VLDB Features
I'm starting work on my next projects for 8.4.
Many applications need to store very large data volumes for
both archival and analysis. The analytic databases are commonly known as
Data Warehouses, though there isn't a common term for large archival
data stores. The line between those use cases is often blurred, and many
people see them as a single use case. My initial interest is in the
large archival data stores.
One of the main issues to be faced is simply data maintenance and
management. Loading, deleting, and vacuuming data all take time. Those
issues relate mainly to the size of the data store rather than any
particular workload, so I'm calling that set of required features "Very
Large Database" (or VLDB) features.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Details of those will be covered in separate mails over the next few weeks
and months. This is just to let everybody know where I'm headed, so
you can see the big picture with me.
I'll be working on other projects as well, many of which I've listed
here: http://developer.postgresql.org/index.php/Simon_Riggs%27_Development_Projects
I expect the list is too long to complete for 8.4, but I'm allowing for
various issues arising during development.
So please keep specific discussion to the other mails as they arrive.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
On a fine day, Tue, 2007-12-11 at 10:53, Josh Berkus wrote:
Simon.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
What do you mean by fault-tolerant here? Just
COPY ... WITH ERRORS TO ...
or something more advanced, like a bulk load which can be continued after a
crash?
--
Hannu
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Simon.
VLDB Features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
Not lost sight of it; I have a design, but I have to prioritise also.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hannu,
COPY ... WITH ERRORS TO ...
Yeah, that's a start.
or something more advanced, like bulkload which can be continued after
crash ?
Well, we could also use a loader which automatically parallelizes, but that
functionality can be done at the middleware level. WITH ERRORS is the
most critical part.
Here's the other VLDB features we're missing:
Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)
Simon, we should start a VLDB-Postgres developer wiki page.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.
-Neil
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Windowing Functions
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
On-disk Bitmap Index (anyone game to finish GP patch?)
I would call those VLDB Data Warehousing features, to differentiate them
from the use of VLDBs for other purposes.
I'd add Materialized View support in the planner, as well as saying it's
more important than parallel query, IMHO. MVs are to DW what indexes are
to OLTP. It's the same as indexes vs. seqscan; you can speed up the seq
scan or you can avoid it. Brute force is cool, but being smarter is even
better.
The reason they don't normally show up high on anybody's feature list is
that the TPC benchmarks specifically disallow them, which as I once
observed is very good support for them being a useful feature in
practice. (Oracle originally brought out MV support as a way of
improving their TPC scores at a time when Teradata was wiping the floor
with them thanks to its parallel query implementation.)
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Tue, 11 Dec 2007, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load. Unfortunately, I don't know anyone
who's working on it.
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg,
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high overhead
if you have bad rows. As little as 1% bad rows will slow down loading by 20%
due to retries.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Hi,
On Wednesday, December 12, 2007, Josh Berkus wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
Because pgloader is implemented in middleware, it carries a very high
overhead if you have bad rows. As little as 1% bad rows will slow down
loading by 20% due to retries.
Not that much, in fact, I'd say.
pgloader allows its user to configure how large a COPY buffer to use (a global
parameter as of now; it could easily be a per-section configuration knob, I
just haven't seen any need for that yet).
It's the 'copy_every' parameter as seen on the man page here:
http://pgloader.projects.postgresql.org/#toc4
pgloader prepares an in-memory buffer of copy_every tuples to give to COPY,
and in case of error it splits the buffer and retries: the classic dichotomy
approach, from the initial implementation by Jan Wieck.
So you can easily balance the error recovery costs against the COPY bulk size.
Note also that the overall loading time with pgloader does not scale simply
with the COPY buffer size; the optimal choice depends on the dataset --- and
on the data massaging pgloader has to do on it --- and I've had the best
results with buffers of 10000 to 15000 tuples so far.
FYI, now that the pgloader topic is on the table, the next items I think I'm
going to develop for it are configurable behaviour for error tuples (e.g.
loading them into another table on a primary key error), and some limited
DDL-partitioning support.
I'm playing with the idea of pgloader being able to read some partitioning
schemes (by parsing the CHECK constraints on inherited tables) and load
directly into the right partitions.
That would of course be done only when configured this way, and if the
constraints are misread it would only result in a lot more rejected rows than
expected; you can still retry using your insert trigger instead of pgloader's
buggy smartness.
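The sort of schema pgloader would have to read is the usual constraint-based
inheritance partitioning; a minimal sketch (table names are illustrative only):
CREATE TABLE sales (sale_date date, amount numeric);
CREATE TABLE sales_2007 (
    CHECK (sale_date >= DATE '2007-01-01' AND sale_date < DATE '2008-01-01')
) INHERITS (sales);
CREATE TABLE sales_2008 (
    CHECK (sale_date >= DATE '2008-01-01' AND sale_date < DATE '2009-01-01')
) INHERITS (sales);
-- pgloader would parse these CHECK constraints and route each input row
-- directly to the child table whose constraint it satisfies.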
Comments welcome, regards,
--
dim
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote:
Simon, we should start a VLDB-Postgres developer wiki page.
http://developer.postgresql.org/index.php/DataWarehousing
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi,
Josh Berkus wrote:
Here's the other VLDB features we're missing:
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Windowing Functions
Isn't Gavin Sherry working on this? Haven't read anything from him lately...
Parallel Index Build (not sure how this works exactly, but it speeds Oracle
up considerably)
Sounds interesting *turns-away-to-google*
On-disk Bitmap Index (anyone game to finish GP patch?)
Anybody have an idea of what's missing there (besides good use cases,
which some people doubt)? Again: Gavin?
Simon, we should start a VLDB-Postgres developer wiki page.
Thanks, Simon, wiki page looks good!
Regards
Markus
Markus,
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
I'd say implementing a separate I/O worker would be the first step towards
this; if we could avoid doing I/O in the same process/thread where we're
doing row parsing it would speed up large scans by 100%. I know Oracle does
this, and their large-table-I/O is 30-40% faster than ours despite having
less efficient storage.
Maybe Greenplum or EnterpriseDB will contribute something. ;-)
Windowing Functions
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Hi Josh,
Josh Berkus wrote:
Sure. Imagine you have a 5TB database on a machine with 8 cores and only one
concurrent user. You'd like to have 1 core doing I/O, and say 4-5 cores
dividing the scan and join processing into 4-5 chunks.
Ah, right, thanks for the enlightenment. Heck, I'm definitely too focused on
replication and distributed databases :-)
However, there's certainly a great deal of intersection between
parallel processing on different machines and parallel processing on
multiple CPUs - especially considering NUMA architectures.
*comes-to-think-again*...
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
Hm.. good for him, I guess!
Regards
Markus
On Wed, Dec 12, 2007 at 08:26:16PM +0100, Markus Schiltknecht wrote:
Isn't Gavin Sherry working on this? Haven't read anything from him
lately...
Me neither. Swallowed by Greenplum and France.
Hm.. good for him, I guess!
Yes, I'm around -- just extremely busy with a big release at Greenplum as
well as other Real Life stuff.
Thanks,
Gavin
Greenplum as well as other Real Life stuff.
For those of us here who have no idea what you are talking about, can
you define what "Real Life" is like?
Joshua D. Drake
--
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'
"Josh Berkus" <josh@agliodbs.com> writes:
Markus,
Parallel Query
Uh.. this only makes sense in a distributed database, no? I've thought
about parallel querying on top of Postgres-R. Does it make sense
implementing some form of parallel querying apart from the distribution
or replication engine?
Yes, but not for the reasons Josh describes.
I'd say implementing a separate I/O worker would be the first step towards
this; if we could avoid doing I/O in the same process/thread where we're
doing row parsing it would speed up large scans by 100%. I know Oracle does
this, and their large-table-I/O is 30-40% faster than ours despite having
less efficient storage.
Oracle is using Direct I/O so they need the reader and writer threads to avoid
blocking on I/O all the time. We count on the OS doing readahead and buffering
our writes so we don't have to. Direct I/O and needing some way to do
asynchronous writes and reads are directly tied.
Where Parallel query is useful is when you have queries that involve a
substantial amount of cpu resources, especially if you have a very fast I/O
system which can saturate the bandwidth to a single cpu.
So for example if you have a merge join which requires sorting both sides of
the query you could easily have subprocesses handle those sorts allowing you
to bring two processors to bear on the problem instead of being limited to a
single processor.
On Oracle Parallel Query goes great with partitioned tables. Their query
planner will almost always turn the partition scans into parallel scans and
use separate processors to scan different partitions.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Hello Gregory,
Gregory Stark wrote:
Oracle is using Direct I/O so they need the reader and writer threads to avoid
blocking on i/o all the time. We count on the OS doing readahead and buffering
our writes so we don't have to. Direct I/O and needing some way to do
asynchronous writes and reads are directly tied.
Yeah, except in cases where we can tell ahead of time that reads will be
non-sequential. Which admittedly doesn't come up too frequently, and can
probably be handled with posix_fadvise - as you are currently testing.
Where Parallel query is useful is when you have queries that involve a
substantial amount of cpu resources, especially if you have a very fast I/O
system which can saturate the bandwidth to a single cpu.
Full ACK, the very same applies to parallel querying on shared-nothing
clusters. Those can help if the bandwidth to all processing cores
together becomes the bottleneck (and the resulting data is relatively
small compared to the input data).
For example, Sun's UltraSparc T2 features only 8 PCIe lanes for those 8
cores, so you end up with 250 MiB/sec per core or about 32 MiB/sec per
thread on average. To be fair: their 10 Gig Ethernet ports don't go via
PCIe, so you get an additional 2x 1 GiB/sec for the complete chip. And
memory bandwidth looks a lot better: Sun claims 60+ GiB/sec, leaving
almost 8 GiB/sec per core or 1 GiB/sec per thread.
If my calculations for Intel are correct, a Quad Xeon with a 1.33 GHz
FSB has around 21 GiB/sec throughput to main memory, giving 5 GiB/sec
per core. (Why are these numbers so hard to find? It looks like Intel
deliberately obfuscates them with FSB MHz or Giga-transactions per sec
and the like.)
Regards
Markus
On a fine day, Tue, 2007-12-11 at 15:41, Neil Conway wrote:
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote:
Just so you don't lose sight of it, one of the biggest VLDB features we're
missing is fault-tolerant bulk load.
I actually had to cook up a version of this for Truviso recently. I'll
take a look at submitting a cleaned-up implementation for 8.4.
How did you do it?
Did you enhance the COPY command or was it something completely new?
--
Hannu
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
How did you do it?
Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
-Neil
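For concreteness, the hypothetical command (no such syntax exists yet; the
table and file names are made up) would be something like:
COPY orders FROM '/data/orders.dat' IGNORE ERRORS;
-- hypothetical: malformed lines are dropped and logged instead of aborting the load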
Neil Conway wrote:
On Fri, 2007-12-14 at 14:48 +0200, Hannu Krosing wrote:
How did you do it?
Did you enhance the COPY command or was it something completely new?
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
Ideally I think you would put the failing input line in another table,
or maybe another file. If a table, it would probably have to be as bytea.
cheers
andrew
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.
If we could somehow only do a subtransaction per failure, things would
be much better, but I don't see how.
regards, tom lane
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
If we could somehow only do a subtransaction per failure, things would
be much better, but I don't see how.
One approach would be to essentially implement the pg_bulkloader
approach inside the backend. That is, begin by doing a subtransaction
for every k rows (with k = 1000, say). If you get any errors, then
either repeat the process with k/2 until you locate the individual
row(s) causing the trouble, or perhaps just immediately switch to k = 1.
Fairly ugly though, and would be quite slow for data sets with a high
proportion of erroneous data.
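For illustration only (this is not pg_bulkloader's actual code, nor the
proposed COPY change), the same split-and-retry idea can be sketched today in
PL/pgSQL, relying on the fact that every EXCEPTION block runs in its own
subtransaction; the staging table raw_lines(n, line), the rejects table, the
target table and the tab-separated format are all assumptions made up for the
example:
CREATE TABLE target (id int PRIMARY KEY, val int);
CREATE TABLE raw_lines (n bigserial PRIMARY KEY, line text);  -- one raw input line per row
CREATE TABLE rejects (n bigint, line text, reason text);
CREATE OR REPLACE FUNCTION load_range(lo bigint, hi bigint) RETURNS void AS $$
BEGIN
    BEGIN
        -- try the whole range inside one subtransaction (the EXCEPTION block)
        INSERT INTO target (id, val)
            SELECT (string_to_array(line, E'\t'))[1]::int,
                   (string_to_array(line, E'\t'))[2]::int
              FROM raw_lines
             WHERE n BETWEEN lo AND hi;
    EXCEPTION WHEN OTHERS THEN
        IF lo >= hi THEN
            -- a single bad row: log it and move on
            INSERT INTO rejects
                SELECT n, line, SQLERRM FROM raw_lines WHERE n = lo;
        ELSE
            -- bisect and retry each half in its own subtransaction
            PERFORM load_range(lo, (lo + hi) / 2);
            PERFORM load_range((lo + hi) / 2 + 1, hi);
        END IF;
    END;
END;
$$ LANGUAGE plpgsql;
SELECT load_range(min(n), max(n)) FROM raw_lines;
In the worst case (at least one bad row in every range) this degenerates to
one subtransaction per row, which is exactly the cost and 4G-row concern
raised above.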
Another approach would be to distinguish between errors that require a
subtransaction to recover to a consistent state, and less serious errors
that don't have this requirement (e.g. invalid input to a data type
input function). If all the errors that we want to tolerate during a
bulk load fall into the latter category, we can do without
subtransactions.
-Neil
Neil Conway <neilc@samurai.com> writes:
One approach would be to essentially implement the pg_bulkloader
approach inside the backend. That is, begin by doing a subtransaction
for every k rows (with k = 1000, say). If you get any errors, then
either repeat the process with k/2 until you locate the individual
row(s) causing the trouble, or perhaps just immediately switch to k = 1.
Fairly ugly though, and would be quite slow for data sets with a high
proportion of erroneous data.
You could make it self-tuning, perhaps: initially, or after an error,
set k = 1, and increase k after a successful set of rows.
Another approach would be to distinguish between errors that require a
subtransaction to recover to a consistent state, and less serious errors
that don't have this requirement (e.g. invalid input to a data type
input function). If all the errors that we want to tolerate during a
bulk load fall into the latter category, we can do without
subtransactions.
I think such an approach is doomed to hopeless unreliability. There is
no concept of an error that doesn't require a transaction abort in the
system now, and that doesn't seem to me like something that can be
successfully bolted on after the fact. Also, there's a lot of
bookkeeping (eg buffer pins) that has to be cleaned up regardless of the
exact nature of the error, and all those mechanisms are hung off
transactions.
regards, tom lane
On Friday 2007-12-14 16:22, Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.
You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
would help with the 4G subtransaction limit. The cost to the ETL process is
that a simple rollback would not be guaranteed to send the process back to its
initial state. There are easy ways to deal with the rollback issue, though.
A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
option is selected then the COPY FROM can run without subtransactions and in
excess of the 4G per transaction limit. NO RETRY should be the default since
it preserves the legacy behavior of COPY FROM.
You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give the
option of sending exceptions to a table since they are presumably malformed,
otherwise they would not be exceptions. (Users should re-process exception
files if they want an "if good then table A, else exceptions to table B" workflow.)
EXCEPTIONS TO and NO RETRY would be mutually exclusive.
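Putting those clauses together, hypothetical usage (none of this syntax
exists) might read:
COPY sales FROM '/data/sales.dat'
    COMMIT EVERY 100000                -- hypothetical: one subtransaction per 100000 rows
    EXCEPTIONS TO '/data/sales.rej';   -- hypothetical: rejected lines written to a file
COPY sales FROM '/data/sales.dat' NO RETRY;  -- hypothetical: today's all-or-nothing behaviour, no 4G limit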
If we could somehow only do a subtransaction per failure, things would
be much better, but I don't see how.
Tom,
I think such an approach is doomed to hopeless unreliability. There is
no concept of an error that doesn't require a transaction abort in the
system now, and that doesn't seem to me like something that can be
successfully bolted on after the fact. Also, there's a lot of
bookkeeping (eg buffer pins) that has to be cleaned up regardless of the
exact nature of the error, and all those mechanisms are hung off
transactions.
There's no way we can do a transactionless load, then? I'm thinking of the
load-into-new-partition which is a single pass/fail operation. Would
ignoring individual row errors for this case still cause these kinds of
problems?
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
There's no way we can do a transactionless load, then? I'm thinking of the
load-into-new-partition which is a single pass/fail operation. Would
ignoring individual row errors for this case still cause these kinds of
problems?
Given that COPY fires triggers and runs CHECK constraints, there is no
part of the system that cannot be exercised during COPY. So I think
supposing that we can just deal with some simplified subset of reality
is mere folly.
regards, tom lane
Hi,
Another approach would be to distinguish between errors that require a
subtransaction to recover to a consistent state, and less serious errors
that don't have this requirement (e.g. invalid input to a data type
input function). If all the errors that we want to tolerate during a
bulk load fall into the latter category, we can do without
subtransactions.
I think errors which occur after we have done a heap_insert of the
tuple generated from the current input row are the ones which would require
a subtransaction to recover. Examples could be unique/primary key
violation errors or FKey/trigger related errors. Any errors which occur
before doing the heap_insert should not require any recovery according to
me.
The overhead of having a subtransaction per row is a very valid concern. But
instead of using a per-insert or a batch-insert subtransaction, I am
thinking that we can start off a subtransaction and continue it until we
encounter a failure. The moment an error is encountered, since we have the
offending (already in heap) tuple around, we can call a simple_heap_delete
on the same and commit (instead of aborting) this subtransaction after doing
some minor cleanup. The current input data row can also be logged into a
bad file. Recall that we only need to handle those errors in which the
simple_heap_insert is successful, but the index insertion or the after-row
insert trigger causes an error. The rest of the load can then go ahead with
the start of a new subtransaction.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
NikhilS <nikkhils@gmail.com> writes:
Any errors which occur before doing the heap_insert should not require
any recovery according to me.
A sufficient (though far from all-encompassing) rejoinder to that is
"triggers and CHECK constraints can do anything".
The overhead of having a subtransaction per row is a very valid concern. But
instead of using a per-insert or a batch-insert subtransaction, I am
thinking that we can start off a subtransaction and continue it until we
encounter a failure.
What of failures that occur only at (sub)transaction commit, such as
foreign key checks?
regards, tom lane
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.
I'd suggest doing everything at block level
- wrap each new block of data in a subtransaction
- apply data to the table block by block (can still work with FSM).
- apply indexes in bulk for each block, unique ones first.
That then gives you a limit of more than 500 trillion rows, which should
be enough for anyone.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On a fine day, Sat, 2007-12-15 at 01:12, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
There's no way we can do a transactionless load, then? I'm thinking of the
load-into-new-partition which is a single pass/fail operation. Would
ignoring individual row errors for this case still cause these kinds of
problems?
Given that COPY fires triggers and runs CHECK constraints, there is no
part of the system that cannot be exercised during COPY. So I think
supposing that we can just deal with some simplified subset of reality
is mere folly.
But can't we _define_ such a subset, where we can do a transactionless
load?
I don't think that most DW/VLDB schemas fire complex triggers or custom
data-modifying functions inside CHECKs.
Then we could just run the remaining simple CHECK constraints ourselves
and, instead of aborting when a check fails, just log the offending rows?
The COPY ... WITH ERRORS TO ... would essentially become a big
conditional RULE through which the incoming data is processed.
--
Hannu
On Saturday 2007-12-15 02:14, Simon Riggs wrote:
On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the
InputFunctionCall, but I guess you'd need a subtransaction to ensure
that you reset your state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.
I'd suggest doing everything at block level
- wrap each new block of data in a subtransaction
- apply data to the table block by block (can still work with FSM).
- apply indexes in bulk for each block, unique ones first.
That then gives you a limit of more than 500 trillion rows, which should
be enough for anyone.
Wouldn't it only give you more than 500T rows in the best case? If it hits a
bad row it has to back off and roll forward one row and one subtransaction at
a time for the failed block. So in the worst case, where there is at least
one exception row per block, I think you would still wind up with only a
capacity of 4G rows.
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
For complicated ETL, I agree that using an external tool makes the most
sense. But I think there is still merit in adding support to COPY for
the simple case of trying to load a data file that has some corrupted,
invalid or duplicate records.
-Neil
On 16/12/2007, Neil Conway <neilc@samurai.com> wrote:
On Tue, 2007-12-11 at 19:11 -0500, Greg Smith wrote:
I'm curious what you feel is missing that pgloader doesn't fill that
requirement: http://pgfoundry.org/projects/pgloader/
For complicated ETL, I agree that using an external tool makes the most
sense. But I think there is still merit in adding support to COPY for
the simple case of trying to load a data file that has some corrupted,
invalid or duplicate records.
-Neil
Any simple enhancement of COPY is welcome. I have lost a lot of time with
repeated imports.
Regards
Pavel
Hannu Krosing <hannu@skype.net> writes:
But can't we _define_ such a subset, where we can do a transactionless
load ?
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
unique indexes and foreign keys, the COPY will tend to go through
just fine, and then you're still stuck trying to weed out bad data
without very good tools for it. The only errors we could really
separate out without subtransaction fencing are extremely trivial
ones like too many or too few fields on a line ... which can be
caught with a sed script.
regards, tom lane
Hi,
On Dec 15, 2007 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
NikhilS <nikkhils@gmail.com> writes:
Any errors which occur before doing the heap_insert should not require
any recovery according to me.
A sufficient (though far from all-encompassing) rejoinder to that is
"triggers and CHECK constraints can do anything".
The overhead of having a subtransaction per row is a very valid concern. But
instead of using a per-insert or a batch-insert subtransaction, I am
thinking that we can start off a subtransaction and continue it until we
encounter a failure. The moment an error is encountered, since we have the
offending (already in heap) tuple around, we can call a
simple_heap_delete on the same and commit (instead of aborting) this
subtransaction.
What of failures that occur only at (sub)transaction commit, such as
foreign key checks?
What if we identify and define a subset where we could do this
subtransaction-based COPY? The following could be supported:
* A subset of triggers and CHECK constraints which do not move the tuple
around. (Identifying this subset might be an issue, though?)
* Primary/unique key indexes
As Hannu mentioned elsewhere in this thread, there should not be very many
instances of complex triggers/CHECKs around. And maybe in those instances
(and also in the foreign key checks case), the behaviour could default to a
subtransaction-per-row or even the existing single-transaction model?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
2007/12/16, Tom Lane <tgl@sss.pgh.pa.us>:
Hannu Krosing <hannu@skype.net> writes:
But can't we _define_ such a subset, where we can do a transactionless
load?
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
unique indexes and foreign keys, the COPY will tend to go through
just fine, and then you're still stuck trying to weed out bad data
without very good tools for it. The only errors we could really
separate out without subtransaction fencing are extremely trivial
ones like too many or too few fields on a line ... which can be
caught with a sed script.
I have a dump file. I would like to load it ASAP.
Constraints will be applied at the end, so any problems can be detected.
I would like it to be as direct as possible and as bulk as possible - just
allocate pages and fill them with the data. Maybe it should be a different
mode - single user or so. Right now I can save some I/O - like turning off
fsync, but that is all :(
I got something like this:
http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results
I have no idea how to load a single file in many threads, but... the point is
that it can be much faster than a single-threaded load - surprisingly - at
least for me.
--
Regards,
Michał Zaborowski (TeXXaS)
On Dec 12, 2007, at 1:26 PM, Markus Schiltknecht wrote:
Josh Berkus wrote:
Sure. Imagine you have a 5TB database on a machine with 8 cores
and only one concurrent user. You'd like to have 1 core doing I/O,
and say 4-5 cores dividing the scan and join processing into
4-5 chunks.
Ah, right, thanks for the enlightenment. Heck, I'm definitely too
focused on replication and distributed databases :-)
However, there's certainly a great deal of intersection between
parallel processing on different machines and parallel processing
on multiple CPUs - especially considering NUMA architectures.
*comes-to-think-again*...
Except that doing something in-machine is often far simpler than
trying to go cross-machine, especially when that something is a
background reader.
Let's walk before we run. :)
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Tom,
Sure ... but you'll find that it's not large enough to be useful.
Once you remove all the interesting consistency checks such as
unique indexes and foreign keys, the COPY will tend to go through
just fine, and then you're still stuck trying to weed out bad data
without very good tools for it. The only errors we could really
separate out without subtransaction fencing are extremely trivial
ones like too many or too few fields on a line ... which can be
caught with a sed script.
Speaking as someone who did a LOT of DW load design only a couple of years ago,
I'll say that the "special case" of no triggers and no constraint checks except
length and type-safety checks actually constitutes about 50% of DW bulk
loading. The only exception to that is unique indexes, which would normally
be included and would be the difficult part.
Also, "special case bulk loading" would in fact give users of other types of
applications a lot more flexibility -- they could always load into a holding
table just to clean up the type safety issues and then merge into the real
table.
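That holding-table pattern already works with plain COPY; a minimal sketch,
assuming a hypothetical orders(order_id integer, qty integer) target and a
tab-separated input file with the right number of columns:
CREATE TABLE orders_stage (order_id text, qty text);  -- all text, so data type errors cannot abort the COPY
COPY orders_stage FROM '/tmp/orders.dat';
INSERT INTO orders (order_id, qty)
    SELECT order_id::integer, qty::integer
      FROM orders_stage
     WHERE order_id ~ '^[0-9]+$' AND qty ~ '^[0-9]+$';  -- keep only rows that will cast cleanly
Rows failing the filter stay behind in orders_stage for inspection; what this
cannot help with is a line with the wrong number of fields, which still aborts
the COPY.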
So I don't agree that the "load into new partition without dependencies" is
too much of a special case to be worth pursuing. It might be a bad idea for
other reasons, but not because it's too obscure.
--Josh
Trent Shipley <trent_shipley@qwest.net> writes:
On Friday 2007-12-14 16:22, Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function raises
an error. The last case is the only thing that would be a bit tricky to
implement, I think: you could use PG_TRY() around the InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.
You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
would help with the 4G subtransaction limit. The cost to the ETL process is
that a simple rollback would not be guaranteed to send the process back to its
initial state. There are easy ways to deal with the rollback issue, though.
A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
option is selected then the COPY FROM can run without subtransactions and in
excess of the 4G per transaction limit. NO RETRY should be the default since
it preserves the legacy behavior of COPY FROM.
You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give the
option of sending exceptions to a table since they are presumably malformed,
otherwise they would not be exceptions. (Users should re-process exception
files if they want an "if good then table A, else exceptions to table B" workflow.)
EXCEPTIONS TO and NO RETRY would be mutually exclusive.
If we could somehow only do a subtransaction per failure, things would
be much better, but I don't see how.
Hello,
Attached is a proof-of-concept patch for this TODO item. There are no
docs yet; I just wanted to know if the approach is sane.
The added syntax is like the following:
COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
The way it's done is by abusing Copy Both mode, and from my limited
testing that seems to just work. The error trapping itself is done
using PG_TRY/PG_CATCH and can only catch formatting or before-insert
trigger errors; no attempt is made to recover from a failed unique
constraint, etc.
Example in action:
postgres=# \d test_copy2
Table "public.test_copy2"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
val | integer |
postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
1
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 1: "1"
2
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 2: "2"
3
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 3: "3"
NOTICE: total exceptions ignored: 3
postgres=# \d test_copy1
Table "public.test_copy1"
Column | Type | Modifiers
--------+---------+-----------
id | integer | not null
postgres=# set client_min_messages to warning;
SET
postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
...
vmstat
zoneinfo
postgres=#
Limited performance testing shows no significant difference between
error-catching and plain code path. For example, timing
copy test_copy1 from program 'seq 1000000' [exceptions to stdout]
shows similar numbers with or without the added "exceptions to" clause.
Now that I'm sending this, I wonder if the original comment about the
need for a subtransaction around every loaded line still holds. Any
example of what would not be properly rolled back by just PG_TRY?
Happy hacking!
--
Alex
Attachment: 0001-poc-copy-from-.-exceptions-to.patch (text/x-diff)
From 50f7ab0a503a0d61776add8a138abf2622fc6c35 Mon Sep 17 00:00:00 2001
From: Alex Shulgin <ash@commandprompt.com>
Date: Fri, 19 Dec 2014 18:21:31 +0300
Subject: [PATCH] POC: COPY FROM ... EXCEPTIONS TO
---
contrib/file_fdw/file_fdw.c | 4 +-
src/backend/commands/copy.c | 251 +++++++++++++++++++++++++++++---
src/backend/parser/gram.y | 26 +++-
src/bin/psql/common.c | 14 +-
src/bin/psql/copy.c | 119 ++++++++++++++-
src/bin/psql/settings.h | 1 +
src/bin/psql/startup.c | 1 +
src/bin/psql/tab-complete.c | 12 +-
src/include/commands/copy.h | 3 +-
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/interfaces/ecpg/preproc/ecpg.addons | 2 +-
12 files changed, 396 insertions(+), 39 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
new file mode 100644
index 5a4d5aa..0df02f7
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
*************** fileBeginForeignScan(ForeignScanState *n
*** 624,629 ****
--- 624,630 ----
cstate = BeginCopyFrom(node->ss.ss_currentRelation,
filename,
false,
+ NULL,
NIL,
options);
*************** fileReScanForeignScan(ForeignScanState *
*** 697,702 ****
--- 698,704 ----
festate->cstate = BeginCopyFrom(node->ss.ss_currentRelation,
festate->filename,
false,
+ NULL,
NIL,
festate->options);
}
*************** file_acquire_sample_rows(Relation onerel
*** 1030,1036 ****
/*
* Create CopyState from FDW options.
*/
! cstate = BeginCopyFrom(onerel, filename, false, NIL, options);
/*
* Use per-tuple memory context to prevent leak of memory used to read
--- 1032,1038 ----
/*
* Create CopyState from FDW options.
*/
! cstate = BeginCopyFrom(onerel, filename, false, NULL, NIL, options);
/*
* Use per-tuple memory context to prevent leak of memory used to read
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 08abe14..4f59c63
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** typedef enum EolType
*** 96,102 ****
typedef struct CopyStateData
{
/* low-level state data */
! CopyDest copy_dest; /* type of copy source/destination */
FILE *copy_file; /* used if copy_dest == COPY_FILE */
StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
* dest == COPY_NEW_FE in COPY FROM */
--- 96,103 ----
typedef struct CopyStateData
{
/* low-level state data */
! CopyDest copy_src; /* type of copy source */
! CopyDest copy_dest; /* type of copy destination */
FILE *copy_file; /* used if copy_dest == COPY_FILE */
StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
* dest == COPY_NEW_FE in COPY FROM */
*************** typedef struct CopyStateData
*** 105,110 ****
--- 106,114 ----
int file_encoding; /* file or remote side's character encoding */
bool need_transcoding; /* file encoding diff from server? */
bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+ bool ignore_exceptions; /* should we trap and ignore exceptions? */
+ FILE *exc_file; /* file stream to write erroring lines to */
+ uint64 exceptions; /* total number of exceptions ignored */
/* parameters from the COPY command */
Relation rel; /* relation to copy to or from */
*************** typedef struct CopyStateData
*** 112,117 ****
--- 116,122 ----
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDIN/STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ char *exc_filename; /* filename for exceptions or NULL for STDOUT */
bool binary; /* binary format? */
bool oids; /* include OIDs? */
bool freeze; /* freeze rows on loading? */
*************** SendCopyBegin(CopyState cstate)
*** 347,352 ****
--- 352,366 ----
int16 format = (cstate->binary ? 1 : 0);
int i;
+ /*
+ * Check if we might need to stream exceptions to the frontend. If
+ * so, this must be a "COPY FROM file/program EXCEPTIONS TO STDOUT".
+ *
+ * We need to create the frontend message buffer now.
+ */
+ if (cstate->ignore_exceptions)
+ cstate->fe_msgbuf = makeStringInfo();
+
pq_beginmessage(&buf, 'H');
pq_sendbyte(&buf, format); /* overall format */
pq_sendint(&buf, natts, 2);
*************** ReceiveCopyBegin(CopyState cstate)
*** 388,404 ****
{
/* new way */
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
int16 format = (cstate->binary ? 1 : 0);
int i;
! pq_beginmessage(&buf, 'G');
pq_sendbyte(&buf, format); /* overall format */
pq_sendint(&buf, natts, 2);
for (i = 0; i < natts; i++)
pq_sendint(&buf, format, 2); /* per-column formats */
pq_endmessage(&buf);
! cstate->copy_dest = COPY_NEW_FE;
cstate->fe_msgbuf = makeStringInfo();
}
else if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 2)
--- 402,428 ----
{
/* new way */
StringInfoData buf;
+ char msgid = 'G'; /* receiving from client only */
int natts = list_length(cstate->attnumlist);
int16 format = (cstate->binary ? 1 : 0);
int i;
! /*
! * Check if we also need to pipe exceptions back to the frontend.
! */
! if (cstate->ignore_exceptions && cstate->exc_filename == NULL)
! {
! msgid = 'W'; /* copying in both directions */
! cstate->copy_dest = COPY_NEW_FE;
! }
!
! pq_beginmessage(&buf, msgid);
pq_sendbyte(&buf, format); /* overall format */
pq_sendint(&buf, natts, 2);
for (i = 0; i < natts; i++)
pq_sendint(&buf, format, 2); /* per-column formats */
pq_endmessage(&buf);
! cstate->copy_src = COPY_NEW_FE;
cstate->fe_msgbuf = makeStringInfo();
}
else if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 2)
*************** ReceiveCopyBegin(CopyState cstate)
*** 409,415 ****
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY BINARY is not supported to stdout or from stdin")));
pq_putemptymessage('G');
! cstate->copy_dest = COPY_OLD_FE;
}
else
{
--- 433,439 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY BINARY is not supported to stdout or from stdin")));
pq_putemptymessage('G');
! cstate->copy_src = COPY_OLD_FE;
}
else
{
*************** ReceiveCopyBegin(CopyState cstate)
*** 419,425 ****
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY BINARY is not supported to stdout or from stdin")));
pq_putemptymessage('D');
! cstate->copy_dest = COPY_OLD_FE;
}
/* We *must* flush here to ensure FE knows it can send. */
pq_flush();
--- 443,449 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY BINARY is not supported to stdout or from stdin")));
pq_putemptymessage('D');
! cstate->copy_src = COPY_OLD_FE;
}
/* We *must* flush here to ensure FE knows it can send. */
pq_flush();
*************** CopySendChar(CopyState cstate, char c)
*** 472,486 ****
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
}
static void
CopySendEndOfRow(CopyState cstate)
{
StringInfo fe_msgbuf = cstate->fe_msgbuf;
switch (cstate->copy_dest)
{
case COPY_FILE:
! if (!cstate->binary)
{
/* Default line termination depends on platform */
#ifndef WIN32
--- 496,560 ----
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
}
+ /*
+ * This should be called from PG_CATCH() after switching to appropriate
+ * MemoryContext.
+ */
+ static void
+ CopySendException(CopyState cstate)
+ {
+ ErrorData *error;
+
+ ++cstate->exceptions;
+
+ /*
+ * When reading from the frontend, we reuse the current line held in the
+ * message buffer to send the exception line back, otherwise we need to
+ * copy the line over from the line buffer.
+ */
+ if (cstate->copy_src == COPY_FILE)
+ CopySendData(cstate, cstate->line_buf.data, cstate->line_buf.len);
+
+ /* this flushes the message buffer */
+ CopySendEndOfRow(cstate);
+
+ error = CopyErrorData();
+ FlushErrorState();
+
+ /* report error as a harmless notice */
+ ereport(NOTICE,
+ (errmsg("%s", error->message)));
+ FreeErrorData(error);
+ }
+
static void
CopySendEndOfRow(CopyState cstate)
{
StringInfo fe_msgbuf = cstate->fe_msgbuf;
+ FILE *file;
+ bool should_add_newline;
+
+ /* determine where are we writing to */
+ if (cstate->ignore_exceptions)
+ {
+ file = cstate->exc_file;
+ /*
+ * We should only add a newline if we're not sending the frontend what
+ * it has just sent us and in any case we shouldn't do this for binary
+ * copy.
+ */
+ should_add_newline = (cstate->copy_src == COPY_FILE && !cstate->binary);
+ }
+ else
+ {
+ file = cstate->copy_file;
+ should_add_newline = !(cstate->binary);
+ }
switch (cstate->copy_dest)
{
case COPY_FILE:
! if (should_add_newline)
{
/* Default line termination depends on platform */
#ifndef WIN32
*************** CopySendEndOfRow(CopyState cstate)
*** 490,498 ****
#endif
}
! if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
! cstate->copy_file) != 1 ||
! ferror(cstate->copy_file))
{
if (cstate->is_program)
{
--- 564,571 ----
#endif
}
! if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, file) != 1 ||
! ferror(file))
{
if (cstate->is_program)
{
*************** CopySendEndOfRow(CopyState cstate)
*** 525,531 ****
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
! if (!cstate->binary)
CopySendChar(cstate, '\n');
if (pq_putbytes(fe_msgbuf->data, fe_msgbuf->len))
--- 598,604 ----
break;
case COPY_OLD_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
! if (should_add_newline)
CopySendChar(cstate, '\n');
if (pq_putbytes(fe_msgbuf->data, fe_msgbuf->len))
*************** CopySendEndOfRow(CopyState cstate)
*** 538,544 ****
break;
case COPY_NEW_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
! if (!cstate->binary)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
--- 611,617 ----
break;
case COPY_NEW_FE:
/* The FE/BE protocol uses \n as newline for all platforms */
! if (should_add_newline)
CopySendChar(cstate, '\n');
/* Dump the accumulated row as one CopyData message */
*************** CopySendEndOfRow(CopyState cstate)
*** 546,552 ****
break;
}
! resetStringInfo(fe_msgbuf);
}
/*
--- 619,630 ----
break;
}
! /*
! * Avoid resetting the buffer we reused to send the exception line back to
! * the frontend.
! */
! if (!cstate->ignore_exceptions || cstate->copy_src == COPY_FILE)
! resetStringInfo(fe_msgbuf);
}
/*
*************** CopyGetData(CopyState cstate, void *data
*** 567,573 ****
{
int bytesread = 0;
! switch (cstate->copy_dest)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
--- 645,651 ----
{
int bytesread = 0;
! switch (cstate->copy_src)
{
case COPY_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
*************** DoCopy(const CopyStmt *stmt, const char
*** 919,930 ****
PreventCommandIfReadOnly("COPY FROM");
cstate = BeginCopyFrom(rel, stmt->filename, stmt->is_program,
! stmt->attlist, stmt->options);
*processed = CopyFrom(cstate); /* copy from file to database */
EndCopyFrom(cstate);
}
else
{
cstate = BeginCopyTo(rel, query, queryString, relid,
stmt->filename, stmt->is_program,
stmt->attlist, stmt->options);
--- 997,1018 ----
PreventCommandIfReadOnly("COPY FROM");
cstate = BeginCopyFrom(rel, stmt->filename, stmt->is_program,
! stmt->exc_filename, stmt->attlist, stmt->options);
*processed = CopyFrom(cstate); /* copy from file to database */
+ if (cstate->exceptions)
+ ereport(NOTICE,
+ (errmsg("total exceptions ignored: " UINT64_FORMAT,
+ cstate->exceptions)));
EndCopyFrom(cstate);
}
else
{
+ if (stmt->exc_filename != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("EXCEPTIONS TO not allowed with COPY ... TO"),
+ errhint("see COPY ... FROM")));
+
cstate = BeginCopyTo(rel, query, queryString, relid,
stmt->filename, stmt->is_program,
stmt->attlist, stmt->options);
*************** BeginCopy(bool is_from,
*** 1561,1566 ****
--- 1649,1655 ----
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
+ cstate->copy_src = COPY_FILE; /* default */
cstate->copy_dest = COPY_FILE; /* default */
MemoryContextSwitchTo(oldcontext);
*************** EndCopy(CopyState cstate)
*** 1608,1613 ****
--- 1697,1708 ----
cstate->filename)));
}
+ if (cstate->exc_filename != NULL && FreeFile(cstate->exc_file))
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m",
+ cstate->exc_filename)));
+
MemoryContextDelete(cstate->copycontext);
pfree(cstate);
}
*************** CopyFrom(CopyState cstate)
*** 2331,2336 ****
--- 2426,2432 ----
{
TupleTableSlot *slot;
bool skip_tuple;
+ bool depleted;
Oid loaded_oid = InvalidOid;
CHECK_FOR_INTERRUPTS();
*************** CopyFrom(CopyState cstate)
*** 2348,2356 ****
/* Switch into its memory context */
MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
! if (!NextCopyFrom(cstate, econtext, values, nulls, &loaded_oid))
break;
/* And now we can form the input tuple. */
tuple = heap_form_tuple(tupDesc, values, nulls);
--- 2444,2475 ----
/* Switch into its memory context */
MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
! skip_tuple = false;
! depleted = false;
!
! PG_TRY();
! {
! if (!NextCopyFrom(cstate, econtext, values, nulls, &loaded_oid))
! /* can't break right here due to PG_TRY using do/while(0) */
! depleted = true;
! }
! PG_CATCH();
! {
! if (!cstate->ignore_exceptions)
! PG_RE_THROW();
!
! skip_tuple = true;
! MemoryContextSwitchTo(oldcontext);
! CopySendException(cstate);
! }
! PG_END_TRY();
!
! if (depleted)
break;
+ if (skip_tuple)
+ continue;
+
/* And now we can form the input tuple. */
tuple = heap_form_tuple(tupDesc, values, nulls);
*************** CopyFrom(CopyState cstate)
*** 2376,2382 ****
if (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_before_row)
{
! slot = ExecBRInsertTriggers(estate, resultRelInfo, slot);
if (slot == NULL) /* "do nothing" */
skip_tuple = true;
--- 2495,2514 ----
if (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_before_row)
{
! PG_TRY();
! {
! slot = ExecBRInsertTriggers(estate, resultRelInfo, slot);
! }
! PG_CATCH();
! {
! if (!cstate->ignore_exceptions)
! PG_RE_THROW();
!
! slot = NULL;
! MemoryContextSwitchTo(oldcontext);
! CopySendException(cstate);
! }
! PG_END_TRY();
if (slot == NULL) /* "do nothing" */
skip_tuple = true;
*************** CopyFrom(CopyState cstate)
*** 2384,2395 ****
tuple = ExecMaterializeSlot(slot);
}
! if (!skip_tuple)
{
! /* Check the constraints of the tuple */
! if (cstate->rel->rd_att->constr)
ExecConstraints(resultRelInfo, slot, estate);
if (useHeapMultiInsert)
{
/* Add this tuple to the tuple buffer */
--- 2516,2545 ----
tuple = ExecMaterializeSlot(slot);
}
! if (skip_tuple)
! continue;
!
! /* Check the constraints of the tuple */
! if (cstate->rel->rd_att->constr)
{
! PG_TRY();
! {
ExecConstraints(resultRelInfo, slot, estate);
+ }
+ PG_CATCH();
+ {
+ if (!cstate->ignore_exceptions)
+ PG_RE_THROW();
+ skip_tuple = true;
+ MemoryContextSwitchTo(oldcontext);
+ CopySendException(cstate);
+ }
+ PG_END_TRY();
+ }
+
+ if (!skip_tuple)
+ {
if (useHeapMultiInsert)
{
/* Add this tuple to the tuple buffer */
*************** CopyState
*** 2573,2583 ****
--- 2723,2736 ----
BeginCopyFrom(Relation rel,
const char *filename,
bool is_program,
+ const char *exc_filename,
List *attnamelist,
List *options)
{
CopyState cstate;
bool pipe = (filename == NULL);
+ bool ignore_exceptions = (exc_filename != NULL);
+ bool exc_pipe = (exc_filename != NULL && *exc_filename == 0);
TupleDesc tupDesc;
Form_pg_attribute *attr;
AttrNumber num_phys_attrs,
*************** BeginCopyFrom(Relation rel,
*** 2686,2691 ****
--- 2839,2900 ----
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
cstate->is_program = is_program;
+ cstate->ignore_exceptions = ignore_exceptions;
+
+ if (ignore_exceptions)
+ {
+ if (exc_pipe)
+ {
+ if (whereToSendOutput == DestRemote)
+ {
+ if (!pipe)
+ SendCopyBegin(cstate);
+ else
+ ; /* handled by ReceiveCopyBegin() call below */
+ }
+ else
+ {
+ cstate->exc_file = stdout;
+ }
+ }
+ else
+ {
+ if (!is_absolute_path(exc_filename))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("relative path not allowed for EXCEPTIONS TO file")));
+
+ if (!pipe)
+ {
+ struct stat stbuf_exc;
+ struct stat stbuf_from;
+
+ /* check if both FROM and EXCEPTIONS TO are the same file */
+ if (stat(exc_filename, &stbuf_exc) == 0 &&
+ stat(filename, &stbuf_from) == 0 &&
+ stbuf_exc.st_dev == stbuf_from.st_dev &&
+ stbuf_exc.st_ino == stbuf_from.st_ino)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_NAME),
+ errmsg("cannot specify the same file for COPY FROM and EXCEPTIONS TO")));
+ }
+ /*
+ * We won't send or receive data via frontend, but we still
+ * need to the buffer for CopySendException() to work with.
+ */
+ cstate->fe_msgbuf = makeStringInfo();
+ }
+
+ cstate->exc_filename = pstrdup(exc_filename);
+ cstate->exc_file = AllocateFile(cstate->exc_filename, PG_BINARY_W);
+ if (cstate->exc_file == NULL)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\" for writing: %m",
+ cstate->exc_filename)));
+ }
+ }
if (pipe)
{
*************** NextCopyFrom(CopyState cstate, ExprConte
*** 3019,3025 ****
*/
char dummy;
! if (cstate->copy_dest != COPY_OLD_FE &&
CopyGetData(cstate, &dummy, 1, 1) > 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
--- 3228,3234 ----
*/
char dummy;
! if (cstate->copy_src != COPY_OLD_FE &&
CopyGetData(cstate, &dummy, 1, 1) > 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
*************** CopyReadLine(CopyState cstate)
*** 3133,3139 ****
* after \. up to the protocol end of copy data. (XXX maybe better
* not to treat \. as special?)
*/
! if (cstate->copy_dest == COPY_NEW_FE)
{
do
{
--- 3342,3348 ----
* after \. up to the protocol end of copy data. (XXX maybe better
* not to treat \. as special?)
*/
! if (cstate->copy_src == COPY_NEW_FE)
{
do
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
new file mode 100644
index 6431601..47f2be2
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** static Node *makeRecursiveViewSelect(cha
*** 307,313 ****
%type <defelt> event_trigger_when_item
%type <chr> enable_trigger
! %type <str> copy_file_name
database_name access_method_clause access_method attr_name
name cursor_name file_name
index_name opt_index_name cluster_index_specification
--- 307,313 ----
%type <defelt> event_trigger_when_item
%type <chr> enable_trigger
! %type <str> copy_file_name opt_copy_exceptions
database_name access_method_clause access_method attr_name
name cursor_name file_name
index_name opt_index_name cluster_index_specification
*************** static Node *makeRecursiveViewSelect(cha
*** 561,567 ****
DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
! EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EVENT EXCEPT
EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
EXTENSION EXTERNAL EXTRACT
--- 561,568 ----
DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
! EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EVENT
! EXCEPT EXCEPTIONS
EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN
EXTENSION EXTERNAL EXTRACT
*************** ClosePortalStmt:
*** 2515,2521 ****
/*****************************************************************************
*
* QUERY :
! * COPY relname [(columnList)] FROM/TO file [WITH] [(options)]
* COPY ( SELECT ... ) TO file [WITH] [(options)]
*
* where 'file' can be one of:
--- 2516,2523 ----
/*****************************************************************************
*
* QUERY :
! * COPY relname [(columnList)] FROM/TO file [EXCEPTIONS TO file]
! * [WITH] [(options)]
* COPY ( SELECT ... ) TO file [WITH] [(options)]
*
* where 'file' can be one of:
*************** ClosePortalStmt:
*** 2534,2540 ****
*****************************************************************************/
CopyStmt: COPY opt_binary qualified_name opt_column_list opt_oids
! copy_from opt_program copy_file_name copy_delimiter opt_with copy_options
{
CopyStmt *n = makeNode(CopyStmt);
n->relation = $3;
--- 2536,2543 ----
*****************************************************************************/
CopyStmt: COPY opt_binary qualified_name opt_column_list opt_oids
! copy_from opt_program copy_file_name opt_copy_exceptions copy_delimiter
! opt_with copy_options
{
CopyStmt *n = makeNode(CopyStmt);
n->relation = $3;
*************** CopyStmt: COPY opt_binary qualified_name
*** 2543,2548 ****
--- 2546,2552 ----
n->is_from = $6;
n->is_program = $7;
n->filename = $8;
+ n->exc_filename = $9;
if (n->is_program && n->filename == NULL)
ereport(ERROR,
*************** CopyStmt: COPY opt_binary qualified_name
*** 2556,2565 ****
n->options = lappend(n->options, $2);
if ($5)
n->options = lappend(n->options, $5);
! if ($9)
! n->options = lappend(n->options, $9);
! if ($11)
! n->options = list_concat(n->options, $11);
$$ = (Node *)n;
}
| COPY select_with_parens TO opt_program copy_file_name opt_with copy_options
--- 2560,2569 ----
n->options = lappend(n->options, $2);
if ($5)
n->options = lappend(n->options, $5);
! if ($10)
! n->options = lappend(n->options, $10);
! if ($12)
! n->options = list_concat(n->options, $12);
$$ = (Node *)n;
}
| COPY select_with_parens TO opt_program copy_file_name opt_with copy_options
*************** copy_file_name:
*** 2604,2609 ****
--- 2608,2618 ----
| STDOUT { $$ = NULL; }
;
+ opt_copy_exceptions:
+ EXCEPTIONS TO copy_file_name { $$ = ($3 ? $3 : ""); }
+ | /* EMPTY */ { $$ = NULL; }
+ ;
+
copy_options: copy_opt_list { $$ = $1; }
| '(' copy_generic_opt_list ')' { $$ = $2; }
;
*************** unreserved_keyword:
*** 13142,13147 ****
--- 13151,13157 ----
| ENUM_P
| ESCAPE
| EVENT
+ | EXCEPTIONS
| EXCLUDE
| EXCLUDING
| EXCLUSIVE
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
new file mode 100644
index 66d80b5..d308286
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*************** AcceptResult(const PGresult *result)
*** 388,393 ****
--- 388,394 ----
case PGRES_EMPTY_QUERY:
case PGRES_COPY_IN:
case PGRES_COPY_OUT:
+ case PGRES_COPY_BOTH:
/* Fine, do nothing */
OK = true;
break;
*************** ProcessResult(PGresult **results)
*** 751,756 ****
--- 752,758 ----
case PGRES_COPY_OUT:
case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
is_copy = true;
break;
*************** ProcessResult(PGresult **results)
*** 777,784 ****
SetCancelConn();
if (result_status == PGRES_COPY_OUT)
{
! if (!copystream)
copystream = pset.queryFout;
success = handleCopyOut(pset.db,
copystream,
&copy_result) && success;
--- 779,793 ----
SetCancelConn();
if (result_status == PGRES_COPY_OUT)
{
! /*
! * If we have the stream for exceptions, then this must be the
! * capture phase: use it.
! */
! if (pset.copyExcStream)
! copystream = pset.copyExcStream;
! else if (!copystream)
copystream = pset.queryFout;
+
success = handleCopyOut(pset.db,
copystream,
&copy_result) && success;
*************** ProcessResult(PGresult **results)
*** 794,800 ****
copy_result = NULL;
}
}
! else
{
if (!copystream)
copystream = pset.cur_cmd_source;
--- 803,809 ----
copy_result = NULL;
}
}
! else /* PGRES_COPY_IN or PGRES_COPY_BOTH */
{
if (!copystream)
copystream = pset.cur_cmd_source;
*************** PrintQueryResults(PGresult *results)
*** 913,918 ****
--- 922,928 ----
case PGRES_COPY_OUT:
case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
/* nothing to do here */
success = true;
break;
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
new file mode 100644
index 010a593..cc36071
*** a/src/bin/psql/copy.c
--- b/src/bin/psql/copy.c
*************** struct copy_options
*** 58,63 ****
--- 58,66 ----
bool program; /* is 'file' a program to popen? */
bool psql_inout; /* true = use psql stdin/stdout */
bool from; /* true = FROM, false = TO */
+ bool exceptions; /* has EXCEPTIONS TO */
+ char *exc_file; /* NULL = stdin/stdout */
+ bool exc_psql_inout; /* true = use psql stdout for exceptions */
};
*************** free_copy_options(struct copy_options *
*** 69,74 ****
--- 72,78 ----
free(ptr->before_tofrom);
free(ptr->after_tofrom);
free(ptr->file);
+ free(ptr->exc_file);
free(ptr);
}
*************** parse_slash_copy(const char *args)
*** 240,250 ****
expand_tilde(&result->file);
}
/* Collect the rest of the line (COPY options) */
token = strtokx(NULL, "", NULL, NULL,
0, false, false, pset.encoding);
if (token)
! result->after_tofrom = pg_strdup(token);
return result;
--- 244,302 ----
expand_tilde(&result->file);
}
+ result->after_tofrom = pg_strdup(""); /* initialize for appending */
+
+ /* check for COPY FROM ... EXCEPTIONS TO */
+ if (result->from)
+ {
+ token = strtokx(NULL, whitespace, NULL, NULL,
+ 0, false, false, pset.encoding);
+ if (token)
+ {
+ if (pg_strcasecmp(token, "exceptions") == 0)
+ {
+ result->exceptions = true;
+
+ token = strtokx(NULL, whitespace, NULL, NULL,
+ 0, false, false, pset.encoding);
+ if (!token || pg_strcasecmp(token, "to") != 0)
+ goto error;
+
+ token = strtokx(NULL, whitespace, ";", "'",
+ 0, false, false, pset.encoding);
+ if (!token)
+ goto error;
+
+ if (pg_strcasecmp(token, "stdin") == 0 ||
+ pg_strcasecmp(token, "stdout") == 0)
+ {
+ result->exc_file = NULL;
+ }
+ else if (pg_strcasecmp(token, "pstdin") == 0 ||
+ pg_strcasecmp(token, "pstdout") == 0)
+ {
+ result->exc_psql_inout = true;
+ result->exc_file = NULL;
+ }
+ else
+ {
+ strip_quotes(token, '\'', 0, pset.encoding);
+ result->exc_file = pg_strdup(token);
+ expand_tilde(&result->exc_file);
+ }
+ }
+ else
+ {
+ xstrcat(&result->after_tofrom, token);
+ }
+ }
+ }
+
/* Collect the rest of the line (COPY options) */
token = strtokx(NULL, "", NULL, NULL,
0, false, false, pset.encoding);
if (token)
! xstrcat(&result->after_tofrom, token);
return result;
*************** do_copy(const char *args)
*** 269,274 ****
--- 321,327 ----
{
PQExpBufferData query;
FILE *copystream;
+ FILE *excstream;
struct copy_options *options;
bool success;
*************** do_copy(const char *args)
*** 278,287 ****
if (!options)
return false;
! /* prepare to read or write the target file */
if (options->file && !options->program)
canonicalize_path(options->file);
if (options->from)
{
if (options->file)
--- 331,346 ----
if (!options)
return false;
! /* prepare to read or write the target file(s) */
if (options->file && !options->program)
canonicalize_path(options->file);
+ if (options->exc_file)
+ canonicalize_path(options->exc_file);
+
+ copystream = NULL;
+ excstream = NULL;
+
if (options->from)
{
if (options->file)
*************** do_copy(const char *args)
*** 294,305 ****
--- 353,394 ----
copystream = popen(options->file, PG_BINARY_R);
}
else
+ {
+ if (options->exc_file)
+ {
+ struct stat stbuf_exc;
+ struct stat stbuf_from;
+
+ /* check if both FROM and EXCEPTIONS TO are the same file */
+ if (stat(options->exc_file, &stbuf_exc) == 0 &&
+ stat(options->file, &stbuf_from) == 0 &&
+ stbuf_exc.st_dev == stbuf_from.st_dev &&
+ stbuf_exc.st_ino == stbuf_from.st_ino)
+ {
+ psql_error("COPY FROM and EXCEPTIONS TO cannot point to the same file\n");
+ free_copy_options(options);
+ return false;
+ }
+ }
copystream = fopen(options->file, PG_BINARY_R);
+ }
}
else if (!options->psql_inout)
copystream = pset.cur_cmd_source;
else
copystream = stdin;
+
+ if (options->exceptions)
+ {
+ if (options->exc_file)
+ {
+ excstream = fopen(options->exc_file, PG_BINARY_W);
+ }
+ else if (!options->exc_psql_inout)
+ excstream = pset.queryFout;
+ else
+ excstream = stdout;
+ }
}
else
{
*************** do_copy(const char *args)
*** 332,337 ****
--- 421,438 ----
else
psql_error("%s: %s\n",
options->file, strerror(errno));
+ if (options->exc_file && excstream)
+ fclose(excstream);
+ free_copy_options(options);
+ return false;
+ }
+
+ if (options->exceptions && !excstream)
+ {
+ psql_error("%s: %s\n",
+ options->exc_file, strerror(errno));
+ if (options->file)
+ fclose(copystream);
free_copy_options(options);
return false;
}
*************** do_copy(const char *args)
*** 353,358 ****
--- 454,461 ----
if (result < 0 || S_ISDIR(st.st_mode))
{
fclose(copystream);
+ if (options->exc_file && excstream)
+ fclose(excstream);
free_copy_options(options);
return false;
}
*************** do_copy(const char *args)
*** 366,378 ****
--- 469,485 ----
appendPQExpBufferStr(&query, " FROM STDIN ");
else
appendPQExpBufferStr(&query, " TO STDOUT ");
+ if (options->exceptions)
+ appendPQExpBufferStr(&query, " EXCEPTIONS TO STDOUT ");
if (options->after_tofrom)
appendPQExpBufferStr(&query, options->after_tofrom);
/* run it like a user command, but with copystream as data source/sink */
pset.copyStream = copystream;
+ pset.copyExcStream = excstream;
success = SendQuery(query.data);
pset.copyStream = NULL;
+ pset.copyExcStream = NULL;
termPQExpBuffer(&query);
if (options->file != NULL)
*************** do_copy(const char *args)
*** 410,415 ****
--- 517,530 ----
}
}
}
+ if (options->exc_file != NULL)
+ {
+ if (fclose(excstream) != 0)
+ {
+ psql_error("%s: %s\n", options->exc_file, strerror(errno));
+ success = false;
+ }
+ }
free_copy_options(options);
return success;
}
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
new file mode 100644
index ef24a4e..36c9300
*** a/src/bin/psql/settings.h
--- b/src/bin/psql/settings.h
*************** typedef struct _psqlSettings
*** 72,77 ****
--- 72,78 ----
bool queryFoutPipe; /* queryFout is from a popen() */
FILE *copyStream; /* Stream to read/write for \copy command */
+ FILE *copyExcStream; /* Stream to read exceptions for \copy command */
printQueryOpt popt;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
new file mode 100644
index 11a159a..f1f65df
*** a/src/bin/psql/startup.c
--- b/src/bin/psql/startup.c
*************** main(int argc, char *argv[])
*** 122,127 ****
--- 122,128 ----
pset.queryFout = stdout;
pset.queryFoutPipe = false;
pset.copyStream = NULL;
+ pset.copyExcStream = NULL;
pset.cur_cmd_source = stdin;
pset.cur_cmd_interactive = false;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 82c926d..361cd80
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*************** psql_completion(const char *text, int st
*** 2169,2175 ****
completion_charp = "";
matches = completion_matches(text, complete_from_files);
}
-
/* Handle COPY|BINARY <sth> FROM|TO filename */
else if ((pg_strcasecmp(prev4_wd, "COPY") == 0 ||
pg_strcasecmp(prev4_wd, "\\copy") == 0 ||
--- 2169,2174 ----
*************** psql_completion(const char *text, int st
*** 2178,2187 ****
pg_strcasecmp(prev2_wd, "TO") == 0))
{
static const char *const list_COPY[] =
! {"BINARY", "OIDS", "DELIMITER", "NULL", "CSV", "ENCODING", NULL};
!
COMPLETE_WITH_LIST(list_COPY);
}
/* Handle COPY|BINARY <sth> FROM|TO filename CSV */
else if (pg_strcasecmp(prev_wd, "CSV") == 0 &&
--- 2177,2193 ----
pg_strcasecmp(prev2_wd, "TO") == 0))
{
static const char *const list_COPY[] =
! {"BINARY", "OIDS", "DELIMITER", "NULL", "CSV", "ENCODING", "EXCEPTIONS TO", NULL};
COMPLETE_WITH_LIST(list_COPY);
}
+ /* If we have [COPY...] FROM <sth> EXCEPTIONS TO, complete with filename */
+ else if (pg_strcasecmp(prev4_wd, "FROM") == 0 &&
+ pg_strcasecmp(prev2_wd, "EXCEPTIONS") == 0 &&
+ pg_strcasecmp(prev_wd, "TO") == 0)
+ {
+ completion_charp = "";
+ matches = completion_matches(text, complete_from_files);
+ }
/* Handle COPY|BINARY <sth> FROM|TO filename CSV */
else if (pg_strcasecmp(prev_wd, "CSV") == 0 &&
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
new file mode 100644
index ba0f1b3..7137dfc
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** extern Oid DoCopy(const CopyStmt *stmt,
*** 26,32 ****
extern void ProcessCopyOptions(CopyState cstate, bool is_from, List *options);
extern CopyState BeginCopyFrom(Relation rel, const char *filename,
! bool is_program, List *attnamelist, List *options);
extern void EndCopyFrom(CopyState cstate);
extern bool NextCopyFrom(CopyState cstate, ExprContext *econtext,
Datum *values, bool *nulls, Oid *tupleOid);
--- 26,33 ----
extern void ProcessCopyOptions(CopyState cstate, bool is_from, List *options);
extern CopyState BeginCopyFrom(Relation rel, const char *filename,
! bool is_program, const char *exc_filename,
! List *attnamelist, List *options);
extern void EndCopyFrom(CopyState cstate);
extern bool NextCopyFrom(CopyState cstate, ExprContext *econtext,
Datum *values, bool *nulls, Oid *tupleOid);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
new file mode 100644
index 9141c30..93f50cb
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
*************** typedef struct CopyStmt
*** 1513,1518 ****
--- 1513,1519 ----
bool is_from; /* TO or FROM */
bool is_program; /* is 'filename' a program to popen? */
char *filename; /* filename, or NULL for STDIN/STDOUT */
+ char *exc_filename; /* filename for exceptions or NULL, empty string for STDOUT */
List *options; /* List of DefElem nodes */
} CopyStmt;
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
new file mode 100644
index e14dc9a..c44f02b
*** a/src/include/parser/kwlist.h
--- b/src/include/parser/kwlist.h
*************** PG_KEYWORD("enum", ENUM_P, UNRESERVED_KE
*** 143,148 ****
--- 143,149 ----
PG_KEYWORD("escape", ESCAPE, UNRESERVED_KEYWORD)
PG_KEYWORD("event", EVENT, UNRESERVED_KEYWORD)
PG_KEYWORD("except", EXCEPT, RESERVED_KEYWORD)
+ PG_KEYWORD("exceptions", EXCEPTIONS, UNRESERVED_KEYWORD)
PG_KEYWORD("exclude", EXCLUDE, UNRESERVED_KEYWORD)
PG_KEYWORD("excluding", EXCLUDING, UNRESERVED_KEYWORD)
PG_KEYWORD("exclusive", EXCLUSIVE, UNRESERVED_KEYWORD)
diff --git a/src/interfaces/ecpg/preproc/ecpg.addons b/src/interfaces/ecpg/preproc/ecpg.addons
new file mode 100644
index b3b36cf..0a415e6
*** a/src/interfaces/ecpg/preproc/ecpg.addons
--- b/src/interfaces/ecpg/preproc/ecpg.addons
*************** ECPG: where_or_current_clauseWHERECURREN
*** 192,198 ****
char *cursor_marker = $4[0] == ':' ? mm_strdup("$0") : $4;
$$ = cat_str(2,mm_strdup("where current of"), cursor_marker);
}
! ECPG: CopyStmtCOPYopt_binaryqualified_nameopt_column_listopt_oidscopy_fromopt_programcopy_file_namecopy_delimiteropt_withcopy_options addon
if (strcmp($6, "from") == 0 &&
(strcmp($7, "stdin") == 0 || strcmp($7, "stdout") == 0))
mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not implemented");
--- 192,198 ----
char *cursor_marker = $4[0] == ':' ? mm_strdup("$0") : $4;
$$ = cat_str(2,mm_strdup("where current of"), cursor_marker);
}
! ECPG: CopyStmtCOPYopt_binaryqualified_nameopt_column_listopt_oidscopy_fromopt_programcopy_file_nameopt_copy_exceptionscopy_delimiteropt_withcopy_options addon
if (strcmp($6, "from") == 0 &&
(strcmp($7, "stdin") == 0 || strcmp($7, "stdout") == 0))
mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not implemented");
--
2.1.0
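For reference, a minimal usage sketch of the syntax this patch adds, covering both the server-side COPY form and the \copy form wired into psql above. The table and file names are invented for illustration, and per the patch a server-side EXCEPTIONS TO file must be given as an absolute path:

-- server-side COPY; rejected input lines are echoed back to the client
COPY test_copy2 FROM '/tmp/input.dat' EXCEPTIONS TO STDOUT;

-- psql \copy; rejected input lines are written to a client-side file
\copy test_copy2 from '/tmp/input.dat' exceptions to '/tmp/rejects.dat'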
2014-12-25 22:23 GMT+01:00 Alex Shulgin <ash@commandprompt.com>:
Trent Shipley <trent_shipley@qwest.net> writes:
On Friday 2007-12-14 16:22, Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
to drop (and log) rows that contain malformed data. That is, rows with
too many or too few columns, rows that result in constraint violations,
and rows containing columns where the data type's input function
raises
an error. The last case is the only thing that would be a bit tricky
to
implement, I think: you could use PG_TRY() around the
InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset your
state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.
You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
would help with the 4G subtransaction limit. The cost to the ETL process is
that a simple rollback would not be guaranteed to send the process back to
its initial state. There are easy ways to deal with the rollback issue,
though.
A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
option is selected then the COPY FROM can run without subtransactions and in
excess of the 4G per transaction limit. NO RETRY should be the default since
it preserves the legacy behavior of COPY FROM.
You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give
the option of sending exceptions to a table since they are presumably
malformed, otherwise they would not be exceptions. (Users should re-process
exception files if they want an "if good then table a, else exception to
table b ..." flow.)
EXCEPTIONS TO and NO RETRY would be mutually exclusive.
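Spelled out, the proposed clauses might look roughly like this. The syntax is purely hypothetical (none of it is actual PostgreSQL grammar), and the table and file names are invented:

COPY big_table FROM '/data/load.dat'
    COMMIT EVERY 100000                  -- proposed: commit after every 100000 rows loaded
    EXCEPTIONS TO '/data/load.rejects';  -- proposed: write rejected raw rows to this file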
If we could somehow only do a subtransaction per failure, things would
be much better, but I don't see how.
Hello,
Attached is a proof of concept patch for this TODO item. There are no
docs yet, I just wanted to know if the approach is sane.
The added syntax is like the following:
COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
The way it's done is by abusing Copy Both mode and, from my limited
testing, that seems to just work. The error trapping itself is done
using PG_TRY/PG_CATCH and can only catch formatting or before-insert
trigger errors; no attempt is made to recover from a failed unique
constraint, etc.
Example in action:
postgres=# \d test_copy2
Table "public.test_copy2"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
val | integer |
postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
1
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 1: "1"
2
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 2: "2"
3
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 3: "3"
NOTICE: total exceptions ignored: 3
postgres=# \d test_copy1
Table "public.test_copy1"
Column | Type | Modifiers
--------+---------+-----------
id | integer | not null
postgres=# set client_min_messages to warning;
SET
postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
...
vmstat
zoneinfo
postgres=#
Limited performance testing shows no significant difference between
error-catching and plain code path. For example, timing
copy test_copy1 from program 'seq 1000000' [exceptions to stdout]
shows similar numbers with or without the added "exceptions to" clause.
Now that I'm sending this I wonder if the original comment about the
need for a subtransaction around every loaded line still holds. Any
example of what would not be properly rolled back by just PG_TRY?
this method is unsafe .. exception handlers usually don't free memory -
there is a risk of memory leaks and resource leaks
you can get the same performance with block subtransactions - when you use a
subtransaction per 1000 rows, the impact of subtransactions is minimal
when a block fails, you can then fall back to row-level subtransactions - that
works well when you expect almost correct data.
Regards
Pavel
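Not part of any patch in this thread, but to make the block-subtransaction idea concrete: roughly the same effect can be approximated today from the client side with savepoints, one chunk per savepoint, retrying a failed chunk row by row. The table and chunk file names below are invented for illustration:

BEGIN;
SAVEPOINT chunk_1;
COPY target_table FROM '/tmp/load/chunk_0001.dat';  -- one chunk per subtransaction
-- on failure: ROLLBACK TO SAVEPOINT chunk_1, then reload chunk_0001 row by row,
-- each row in its own savepoint, so only the bad rows are lost
RELEASE SAVEPOINT chunk_1;
SAVEPOINT chunk_2;
COPY target_table FROM '/tmp/load/chunk_0002.dat';
RELEASE SAVEPOINT chunk_2;
COMMIT;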
Happy hacking!
--
Alex
2014-12-26 11:41 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Two years ago I wrote an extension that did it - but I have not had time to
finish it and push it upstream.
Regards
Pavel
Hello.
I wrote a patch implementing COPY that ignores errors in rows, using block
subtransactions.
Syntax: COPY [table] FROM [file/stdin] WITH IGNORE_ERRORS;
Examples:
CREATE TABLE check_ign_err (n int, m int, k int);
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 2 2 2
3 3 3
\.
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
3 | 3 | 3
(2 rows)
##################################################
TRUNCATE check_ign_err;
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 2
3 3 3
\.
WARNING: COPY check_ign_err, line 2: "2 2"
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
3 | 3 | 3
(2 rows)
##################################################
TRUNCATE check_ign_err;
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 a 2
3 3 3
\.
WARNING: COPY check_ign_err, line 2, column m: "a"
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
3 | 3 | 3
(2 rows)
Regards, Damir
Attachments:
0001-COPY-IGNORE_ERRORS-option.patch (text/x-patch)
From b2dbe11f103655bf845aabc3d7af3697d1441b43 Mon Sep 17 00:00:00 2001
From: Damir Belyalov <dam.bel07@gmail.com>
Date: Fri, 15 Oct 2021 11:55:18 +0300
Subject: [PATCH] COPY IGNORE_ERRORS option
---
doc/src/sgml/ref/copy.sgml | 13 +
src/backend/commands/copy.c | 8 +
src/backend/commands/copyfrom.c | 72 ++++-
src/backend/commands/copyfromparse.c | 13 +-
src/backend/parser/gram.y | 8 +-
src/backend/utils/.gitignore | 1 -
src/backend/utils/errcodes.h | 354 +++++++++++++++++++++++
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 32 ++
src/test/regress/sql/copy2.sql | 26 ++
13 files changed, 525 insertions(+), 8 deletions(-)
create mode 100644 src/backend/utils/errcodes.h
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4624c8f4c9..5ca8ff876d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
@@ -233,6 +234,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drop rows that contain malformed data while copying. That is, rows
+ containing syntax errors in data, rows with too many or too few columns,
+ rows that result in constraint violations, rows containing columns where
+ the data type's input function raises an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 53f4853141..9ddc8c3d96 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -337,6 +337,7 @@ ProcessCopyOptions(ParseState *pstate,
{
bool format_specified = false;
bool freeze_specified = false;
+ bool ignore_errors_specified = false;
bool header_specified = false;
ListCell *option;
@@ -377,6 +378,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f366a818a1..d416222faf 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -852,9 +852,75 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ /*
+ * If the IGNORE_ERRORS option is enabled, COPY skips rows with errors.
+ * NextCopyFrom() directly stores the values/nulls array in the slot.
+ */
+ if (cstate->opts.ignore_errors)
+ {
+ bool break_for = false;
+ bool skip_tuple_ignore_errors = false;
+ MemoryContext ccxt = CurrentMemoryContext;
+ ResourceOwner oldowner = CurrentResourceOwner;
+
+ BeginInternalSubTransaction(NULL);
+ MemoryContextSwitchTo(ccxt);
+
+ PG_TRY();
+ {
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ // can't do break in PG_TRY
+ break_for = true;
+ }
+
+ ReleaseCurrentSubTransaction();
+ MemoryContextSwitchTo(ccxt);
+ CurrentResourceOwner = oldowner;
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(ccxt);
+ ErrorData *errdata = CopyErrorData();
+
+ switch (errdata->sqlerrcode)
+ {
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ skip_tuple_ignore_errors = true;
+ elog(WARNING, "%s", errdata->context);
+
+ RollbackAndReleaseCurrentSubTransaction();
+ MemoryContextSwitchTo(ccxt);
+ CurrentResourceOwner = oldowner;
+
+ ExecClearTuple(myslot);
+ MemSet(myslot->tts_values, 0, cstate->attr_count * sizeof(Datum));
+ MemSet(myslot->tts_isnull, true, cstate->attr_count * sizeof(bool));
+
+ break;
+ default:
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+ }
+ PG_END_TRY();
+
+ if (break_for)
+ break;
+
+ if (skip_tuple_ignore_errors)
+ continue;
+ }
+ else
+ {
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index aac10165ec..702b70861d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -817,6 +817,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
attr_count = list_length(cstate->attnumlist);
+ cstate->attr_count = attr_count;
/* Initialize all values for row to NULL */
MemSet(values, 0, num_phys_attrs * sizeof(Datum));
@@ -893,6 +894,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
string,
typioparams[m],
att->atttypmod);
+
if (string != NULL)
nulls[m] = false;
cstate->cur_attname = NULL;
@@ -1446,10 +1448,17 @@ CopyReadAttributesText(CopyFromState cstate)
if (cstate->max_fields <= 0)
{
if (cstate->line_buf.len != 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ {
+ if (cstate->opts.ignore_errors)
+ ereport(ERROR,
+ (errcode(ERRCODE_FOR_IGNORE_ERRORS),
errmsg("extra data after last expected column")));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
return 0;
+ }
}
resetStringInfo(&cstate->attribute_buf);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 86ce33bd97..5f6863e5e1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -674,7 +674,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3160,6 +3160,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *)makeInteger(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeInteger(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *)makeString($3), @1);
@@ -15659,6 +15663,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -16209,6 +16214,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/backend/utils/.gitignore b/src/backend/utils/.gitignore
index 0685556959..3c679d07ff 100644
--- a/src/backend/utils/.gitignore
+++ b/src/backend/utils/.gitignore
@@ -3,4 +3,3 @@
/fmgrprotos.h
/fmgr-stamp
/probes.h
-/errcodes.h
diff --git a/src/backend/utils/errcodes.h b/src/backend/utils/errcodes.h
new file mode 100644
index 0000000000..b187599bdd
--- /dev/null
+++ b/src/backend/utils/errcodes.h
@@ -0,0 +1,354 @@
+/* autogenerated from src/backend/utils/errcodes.txt, do not edit */
+/* there is deliberately not an #ifndef ERRCODES_H here */
+
+/* Class 00 - Successful Completion */
+#define ERRCODE_SUCCESSFUL_COMPLETION MAKE_SQLSTATE('0','0','0','0','0')
+
+/* Class 01 - Warning */
+#define ERRCODE_WARNING MAKE_SQLSTATE('0','1','0','0','0')
+#define ERRCODE_WARNING_DYNAMIC_RESULT_SETS_RETURNED MAKE_SQLSTATE('0','1','0','0','C')
+#define ERRCODE_WARNING_IMPLICIT_ZERO_BIT_PADDING MAKE_SQLSTATE('0','1','0','0','8')
+#define ERRCODE_WARNING_NULL_VALUE_ELIMINATED_IN_SET_FUNCTION MAKE_SQLSTATE('0','1','0','0','3')
+#define ERRCODE_WARNING_PRIVILEGE_NOT_GRANTED MAKE_SQLSTATE('0','1','0','0','7')
+#define ERRCODE_WARNING_PRIVILEGE_NOT_REVOKED MAKE_SQLSTATE('0','1','0','0','6')
+#define ERRCODE_WARNING_STRING_DATA_RIGHT_TRUNCATION MAKE_SQLSTATE('0','1','0','0','4')
+#define ERRCODE_WARNING_DEPRECATED_FEATURE MAKE_SQLSTATE('0','1','P','0','1')
+
+/* Class 02 - No Data (this is also a warning class per the SQL standard) */
+#define ERRCODE_NO_DATA MAKE_SQLSTATE('0','2','0','0','0')
+#define ERRCODE_NO_ADDITIONAL_DYNAMIC_RESULT_SETS_RETURNED MAKE_SQLSTATE('0','2','0','0','1')
+
+/* Class 03 - SQL Statement Not Yet Complete */
+#define ERRCODE_SQL_STATEMENT_NOT_YET_COMPLETE MAKE_SQLSTATE('0','3','0','0','0')
+
+/* Class 08 - Connection Exception */
+#define ERRCODE_CONNECTION_EXCEPTION MAKE_SQLSTATE('0','8','0','0','0')
+#define ERRCODE_CONNECTION_DOES_NOT_EXIST MAKE_SQLSTATE('0','8','0','0','3')
+#define ERRCODE_CONNECTION_FAILURE MAKE_SQLSTATE('0','8','0','0','6')
+#define ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION MAKE_SQLSTATE('0','8','0','0','1')
+#define ERRCODE_SQLSERVER_REJECTED_ESTABLISHMENT_OF_SQLCONNECTION MAKE_SQLSTATE('0','8','0','0','4')
+#define ERRCODE_TRANSACTION_RESOLUTION_UNKNOWN MAKE_SQLSTATE('0','8','0','0','7')
+#define ERRCODE_PROTOCOL_VIOLATION MAKE_SQLSTATE('0','8','P','0','1')
+
+/* Class 09 - Triggered Action Exception */
+#define ERRCODE_TRIGGERED_ACTION_EXCEPTION MAKE_SQLSTATE('0','9','0','0','0')
+
+/* Class 0A - Feature Not Supported */
+#define ERRCODE_FEATURE_NOT_SUPPORTED MAKE_SQLSTATE('0','A','0','0','0')
+
+/* Class 0B - Invalid Transaction Initiation */
+#define ERRCODE_INVALID_TRANSACTION_INITIATION MAKE_SQLSTATE('0','B','0','0','0')
+
+/* Class 0F - Locator Exception */
+#define ERRCODE_LOCATOR_EXCEPTION MAKE_SQLSTATE('0','F','0','0','0')
+#define ERRCODE_L_E_INVALID_SPECIFICATION MAKE_SQLSTATE('0','F','0','0','1')
+
+/* Class 0L - Invalid Grantor */
+#define ERRCODE_INVALID_GRANTOR MAKE_SQLSTATE('0','L','0','0','0')
+#define ERRCODE_INVALID_GRANT_OPERATION MAKE_SQLSTATE('0','L','P','0','1')
+
+/* Class 0P - Invalid Role Specification */
+#define ERRCODE_INVALID_ROLE_SPECIFICATION MAKE_SQLSTATE('0','P','0','0','0')
+
+/* Class 0Z - Diagnostics Exception */
+#define ERRCODE_DIAGNOSTICS_EXCEPTION MAKE_SQLSTATE('0','Z','0','0','0')
+#define ERRCODE_STACKED_DIAGNOSTICS_ACCESSED_WITHOUT_ACTIVE_HANDLER MAKE_SQLSTATE('0','Z','0','0','2')
+
+/* Class 20 - Case Not Found */
+#define ERRCODE_CASE_NOT_FOUND MAKE_SQLSTATE('2','0','0','0','0')
+
+/* Class 21 - Cardinality Violation */
+#define ERRCODE_CARDINALITY_VIOLATION MAKE_SQLSTATE('2','1','0','0','0')
+
+/* Class 22 - Data Exception */
+#define ERRCODE_DATA_EXCEPTION MAKE_SQLSTATE('2','2','0','0','0')
+#define ERRCODE_ARRAY_ELEMENT_ERROR MAKE_SQLSTATE('2','2','0','2','E')
+#define ERRCODE_ARRAY_SUBSCRIPT_ERROR MAKE_SQLSTATE('2','2','0','2','E')
+#define ERRCODE_CHARACTER_NOT_IN_REPERTOIRE MAKE_SQLSTATE('2','2','0','2','1')
+#define ERRCODE_DATETIME_FIELD_OVERFLOW MAKE_SQLSTATE('2','2','0','0','8')
+#define ERRCODE_DATETIME_VALUE_OUT_OF_RANGE MAKE_SQLSTATE('2','2','0','0','8')
+#define ERRCODE_DIVISION_BY_ZERO MAKE_SQLSTATE('2','2','0','1','2')
+#define ERRCODE_ERROR_IN_ASSIGNMENT MAKE_SQLSTATE('2','2','0','0','5')
+#define ERRCODE_ESCAPE_CHARACTER_CONFLICT MAKE_SQLSTATE('2','2','0','0','B')
+#define ERRCODE_INDICATOR_OVERFLOW MAKE_SQLSTATE('2','2','0','2','2')
+#define ERRCODE_INTERVAL_FIELD_OVERFLOW MAKE_SQLSTATE('2','2','0','1','5')
+#define ERRCODE_INVALID_ARGUMENT_FOR_LOG MAKE_SQLSTATE('2','2','0','1','E')
+#define ERRCODE_INVALID_ARGUMENT_FOR_NTILE MAKE_SQLSTATE('2','2','0','1','4')
+#define ERRCODE_INVALID_ARGUMENT_FOR_NTH_VALUE MAKE_SQLSTATE('2','2','0','1','6')
+#define ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION MAKE_SQLSTATE('2','2','0','1','F')
+#define ERRCODE_INVALID_ARGUMENT_FOR_WIDTH_BUCKET_FUNCTION MAKE_SQLSTATE('2','2','0','1','G')
+#define ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST MAKE_SQLSTATE('2','2','0','1','8')
+#define ERRCODE_INVALID_DATETIME_FORMAT MAKE_SQLSTATE('2','2','0','0','7')
+#define ERRCODE_INVALID_ESCAPE_CHARACTER MAKE_SQLSTATE('2','2','0','1','9')
+#define ERRCODE_INVALID_ESCAPE_OCTET MAKE_SQLSTATE('2','2','0','0','D')
+#define ERRCODE_INVALID_ESCAPE_SEQUENCE MAKE_SQLSTATE('2','2','0','2','5')
+#define ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER MAKE_SQLSTATE('2','2','P','0','6')
+#define ERRCODE_INVALID_INDICATOR_PARAMETER_VALUE MAKE_SQLSTATE('2','2','0','1','0')
+#define ERRCODE_INVALID_PARAMETER_VALUE MAKE_SQLSTATE('2','2','0','2','3')
+#define ERRCODE_INVALID_PRECEDING_OR_FOLLOWING_SIZE MAKE_SQLSTATE('2','2','0','1','3')
+#define ERRCODE_INVALID_REGULAR_EXPRESSION MAKE_SQLSTATE('2','2','0','1','B')
+#define ERRCODE_INVALID_ROW_COUNT_IN_LIMIT_CLAUSE MAKE_SQLSTATE('2','2','0','1','W')
+#define ERRCODE_INVALID_ROW_COUNT_IN_RESULT_OFFSET_CLAUSE MAKE_SQLSTATE('2','2','0','1','X')
+#define ERRCODE_INVALID_TABLESAMPLE_ARGUMENT MAKE_SQLSTATE('2','2','0','2','H')
+#define ERRCODE_INVALID_TABLESAMPLE_REPEAT MAKE_SQLSTATE('2','2','0','2','G')
+#define ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE MAKE_SQLSTATE('2','2','0','0','9')
+#define ERRCODE_INVALID_USE_OF_ESCAPE_CHARACTER MAKE_SQLSTATE('2','2','0','0','C')
+#define ERRCODE_MOST_SPECIFIC_TYPE_MISMATCH MAKE_SQLSTATE('2','2','0','0','G')
+#define ERRCODE_NULL_VALUE_NOT_ALLOWED MAKE_SQLSTATE('2','2','0','0','4')
+#define ERRCODE_NULL_VALUE_NO_INDICATOR_PARAMETER MAKE_SQLSTATE('2','2','0','0','2')
+#define ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE MAKE_SQLSTATE('2','2','0','0','3')
+#define ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED MAKE_SQLSTATE('2','2','0','0','H')
+#define ERRCODE_STRING_DATA_LENGTH_MISMATCH MAKE_SQLSTATE('2','2','0','2','6')
+#define ERRCODE_STRING_DATA_RIGHT_TRUNCATION MAKE_SQLSTATE('2','2','0','0','1')
+#define ERRCODE_SUBSTRING_ERROR MAKE_SQLSTATE('2','2','0','1','1')
+#define ERRCODE_TRIM_ERROR MAKE_SQLSTATE('2','2','0','2','7')
+#define ERRCODE_UNTERMINATED_C_STRING MAKE_SQLSTATE('2','2','0','2','4')
+#define ERRCODE_ZERO_LENGTH_CHARACTER_STRING MAKE_SQLSTATE('2','2','0','0','F')
+#define ERRCODE_FLOATING_POINT_EXCEPTION MAKE_SQLSTATE('2','2','P','0','1')
+#define ERRCODE_INVALID_TEXT_REPRESENTATION MAKE_SQLSTATE('2','2','P','0','2')
+#define ERRCODE_INVALID_BINARY_REPRESENTATION MAKE_SQLSTATE('2','2','P','0','3')
+#define ERRCODE_BAD_COPY_FILE_FORMAT MAKE_SQLSTATE('2','2','P','0','4')
+#define ERRCODE_UNTRANSLATABLE_CHARACTER MAKE_SQLSTATE('2','2','P','0','5')
+#define ERRCODE_NOT_AN_XML_DOCUMENT MAKE_SQLSTATE('2','2','0','0','L')
+#define ERRCODE_INVALID_XML_DOCUMENT MAKE_SQLSTATE('2','2','0','0','M')
+#define ERRCODE_INVALID_XML_CONTENT MAKE_SQLSTATE('2','2','0','0','N')
+#define ERRCODE_INVALID_XML_COMMENT MAKE_SQLSTATE('2','2','0','0','S')
+#define ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION MAKE_SQLSTATE('2','2','0','0','T')
+#define ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE MAKE_SQLSTATE('2','2','0','3','0')
+#define ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION MAKE_SQLSTATE('2','2','0','3','1')
+#define ERRCODE_INVALID_JSON_TEXT MAKE_SQLSTATE('2','2','0','3', '2')
+#define ERRCODE_INVALID_SQL_JSON_SUBSCRIPT MAKE_SQLSTATE('2','2','0','3','3')
+#define ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM MAKE_SQLSTATE('2','2','0','3','4')
+#define ERRCODE_NO_SQL_JSON_ITEM MAKE_SQLSTATE('2','2','0','3','5')
+#define ERRCODE_NON_NUMERIC_SQL_JSON_ITEM MAKE_SQLSTATE('2','2','0','3','6')
+#define ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT MAKE_SQLSTATE('2','2','0','3','7')
+#define ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED MAKE_SQLSTATE('2','2','0','3','8')
+#define ERRCODE_SQL_JSON_ARRAY_NOT_FOUND MAKE_SQLSTATE('2','2','0','3','9')
+#define ERRCODE_SQL_JSON_MEMBER_NOT_FOUND MAKE_SQLSTATE('2','2','0','3','A')
+#define ERRCODE_SQL_JSON_NUMBER_NOT_FOUND MAKE_SQLSTATE('2','2','0','3','B')
+#define ERRCODE_SQL_JSON_OBJECT_NOT_FOUND MAKE_SQLSTATE('2','2','0','3','C')
+#define ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS MAKE_SQLSTATE('2','2','0','3','D')
+#define ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS MAKE_SQLSTATE('2','2','0','3','E')
+#define ERRCODE_SQL_JSON_SCALAR_REQUIRED MAKE_SQLSTATE('2','2','0','3','F')
+#define ERRCODE_FOR_IGNORE_ERRORS MAKE_SQLSTATE('2','2','0','4','0')
+
+/* Class 23 - Integrity Constraint Violation */
+#define ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION MAKE_SQLSTATE('2','3','0','0','0')
+#define ERRCODE_RESTRICT_VIOLATION MAKE_SQLSTATE('2','3','0','0','1')
+#define ERRCODE_NOT_NULL_VIOLATION MAKE_SQLSTATE('2','3','5','0','2')
+#define ERRCODE_FOREIGN_KEY_VIOLATION MAKE_SQLSTATE('2','3','5','0','3')
+#define ERRCODE_UNIQUE_VIOLATION MAKE_SQLSTATE('2','3','5','0','5')
+#define ERRCODE_CHECK_VIOLATION MAKE_SQLSTATE('2','3','5','1','4')
+#define ERRCODE_EXCLUSION_VIOLATION MAKE_SQLSTATE('2','3','P','0','1')
+
+/* Class 24 - Invalid Cursor State */
+#define ERRCODE_INVALID_CURSOR_STATE MAKE_SQLSTATE('2','4','0','0','0')
+
+/* Class 25 - Invalid Transaction State */
+#define ERRCODE_INVALID_TRANSACTION_STATE MAKE_SQLSTATE('2','5','0','0','0')
+#define ERRCODE_ACTIVE_SQL_TRANSACTION MAKE_SQLSTATE('2','5','0','0','1')
+#define ERRCODE_BRANCH_TRANSACTION_ALREADY_ACTIVE MAKE_SQLSTATE('2','5','0','0','2')
+#define ERRCODE_HELD_CURSOR_REQUIRES_SAME_ISOLATION_LEVEL MAKE_SQLSTATE('2','5','0','0','8')
+#define ERRCODE_INAPPROPRIATE_ACCESS_MODE_FOR_BRANCH_TRANSACTION MAKE_SQLSTATE('2','5','0','0','3')
+#define ERRCODE_INAPPROPRIATE_ISOLATION_LEVEL_FOR_BRANCH_TRANSACTION MAKE_SQLSTATE('2','5','0','0','4')
+#define ERRCODE_NO_ACTIVE_SQL_TRANSACTION_FOR_BRANCH_TRANSACTION MAKE_SQLSTATE('2','5','0','0','5')
+#define ERRCODE_READ_ONLY_SQL_TRANSACTION MAKE_SQLSTATE('2','5','0','0','6')
+#define ERRCODE_SCHEMA_AND_DATA_STATEMENT_MIXING_NOT_SUPPORTED MAKE_SQLSTATE('2','5','0','0','7')
+#define ERRCODE_NO_ACTIVE_SQL_TRANSACTION MAKE_SQLSTATE('2','5','P','0','1')
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION MAKE_SQLSTATE('2','5','P','0','2')
+#define ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT MAKE_SQLSTATE('2','5','P','0','3')
+
+/* Class 26 - Invalid SQL Statement Name */
+#define ERRCODE_INVALID_SQL_STATEMENT_NAME MAKE_SQLSTATE('2','6','0','0','0')
+
+/* Class 27 - Triggered Data Change Violation */
+#define ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION MAKE_SQLSTATE('2','7','0','0','0')
+
+/* Class 28 - Invalid Authorization Specification */
+#define ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION MAKE_SQLSTATE('2','8','0','0','0')
+#define ERRCODE_INVALID_PASSWORD MAKE_SQLSTATE('2','8','P','0','1')
+
+/* Class 2B - Dependent Privilege Descriptors Still Exist */
+#define ERRCODE_DEPENDENT_PRIVILEGE_DESCRIPTORS_STILL_EXIST MAKE_SQLSTATE('2','B','0','0','0')
+#define ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST MAKE_SQLSTATE('2','B','P','0','1')
+
+/* Class 2D - Invalid Transaction Termination */
+#define ERRCODE_INVALID_TRANSACTION_TERMINATION MAKE_SQLSTATE('2','D','0','0','0')
+
+/* Class 2F - SQL Routine Exception */
+#define ERRCODE_SQL_ROUTINE_EXCEPTION MAKE_SQLSTATE('2','F','0','0','0')
+#define ERRCODE_S_R_E_FUNCTION_EXECUTED_NO_RETURN_STATEMENT MAKE_SQLSTATE('2','F','0','0','5')
+#define ERRCODE_S_R_E_MODIFYING_SQL_DATA_NOT_PERMITTED MAKE_SQLSTATE('2','F','0','0','2')
+#define ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED MAKE_SQLSTATE('2','F','0','0','3')
+#define ERRCODE_S_R_E_READING_SQL_DATA_NOT_PERMITTED MAKE_SQLSTATE('2','F','0','0','4')
+
+/* Class 34 - Invalid Cursor Name */
+#define ERRCODE_INVALID_CURSOR_NAME MAKE_SQLSTATE('3','4','0','0','0')
+
+/* Class 38 - External Routine Exception */
+#define ERRCODE_EXTERNAL_ROUTINE_EXCEPTION MAKE_SQLSTATE('3','8','0','0','0')
+#define ERRCODE_E_R_E_CONTAINING_SQL_NOT_PERMITTED MAKE_SQLSTATE('3','8','0','0','1')
+#define ERRCODE_E_R_E_MODIFYING_SQL_DATA_NOT_PERMITTED MAKE_SQLSTATE('3','8','0','0','2')
+#define ERRCODE_E_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED MAKE_SQLSTATE('3','8','0','0','3')
+#define ERRCODE_E_R_E_READING_SQL_DATA_NOT_PERMITTED MAKE_SQLSTATE('3','8','0','0','4')
+
+/* Class 39 - External Routine Invocation Exception */
+#define ERRCODE_EXTERNAL_ROUTINE_INVOCATION_EXCEPTION MAKE_SQLSTATE('3','9','0','0','0')
+#define ERRCODE_E_R_I_E_INVALID_SQLSTATE_RETURNED MAKE_SQLSTATE('3','9','0','0','1')
+#define ERRCODE_E_R_I_E_NULL_VALUE_NOT_ALLOWED MAKE_SQLSTATE('3','9','0','0','4')
+#define ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED MAKE_SQLSTATE('3','9','P','0','1')
+#define ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED MAKE_SQLSTATE('3','9','P','0','2')
+#define ERRCODE_E_R_I_E_EVENT_TRIGGER_PROTOCOL_VIOLATED MAKE_SQLSTATE('3','9','P','0','3')
+
+/* Class 3B - Savepoint Exception */
+#define ERRCODE_SAVEPOINT_EXCEPTION MAKE_SQLSTATE('3','B','0','0','0')
+#define ERRCODE_S_E_INVALID_SPECIFICATION MAKE_SQLSTATE('3','B','0','0','1')
+
+/* Class 3D - Invalid Catalog Name */
+#define ERRCODE_INVALID_CATALOG_NAME MAKE_SQLSTATE('3','D','0','0','0')
+
+/* Class 3F - Invalid Schema Name */
+#define ERRCODE_INVALID_SCHEMA_NAME MAKE_SQLSTATE('3','F','0','0','0')
+
+/* Class 40 - Transaction Rollback */
+#define ERRCODE_TRANSACTION_ROLLBACK MAKE_SQLSTATE('4','0','0','0','0')
+#define ERRCODE_T_R_INTEGRITY_CONSTRAINT_VIOLATION MAKE_SQLSTATE('4','0','0','0','2')
+#define ERRCODE_T_R_SERIALIZATION_FAILURE MAKE_SQLSTATE('4','0','0','0','1')
+#define ERRCODE_T_R_STATEMENT_COMPLETION_UNKNOWN MAKE_SQLSTATE('4','0','0','0','3')
+#define ERRCODE_T_R_DEADLOCK_DETECTED MAKE_SQLSTATE('4','0','P','0','1')
+
+/* Class 42 - Syntax Error or Access Rule Violation */
+#define ERRCODE_SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION MAKE_SQLSTATE('4','2','0','0','0')
+#define ERRCODE_SYNTAX_ERROR MAKE_SQLSTATE('4','2','6','0','1')
+#define ERRCODE_INSUFFICIENT_PRIVILEGE MAKE_SQLSTATE('4','2','5','0','1')
+#define ERRCODE_CANNOT_COERCE MAKE_SQLSTATE('4','2','8','4','6')
+#define ERRCODE_GROUPING_ERROR MAKE_SQLSTATE('4','2','8','0','3')
+#define ERRCODE_WINDOWING_ERROR MAKE_SQLSTATE('4','2','P','2','0')
+#define ERRCODE_INVALID_RECURSION MAKE_SQLSTATE('4','2','P','1','9')
+#define ERRCODE_INVALID_FOREIGN_KEY MAKE_SQLSTATE('4','2','8','3','0')
+#define ERRCODE_INVALID_NAME MAKE_SQLSTATE('4','2','6','0','2')
+#define ERRCODE_NAME_TOO_LONG MAKE_SQLSTATE('4','2','6','2','2')
+#define ERRCODE_RESERVED_NAME MAKE_SQLSTATE('4','2','9','3','9')
+#define ERRCODE_DATATYPE_MISMATCH MAKE_SQLSTATE('4','2','8','0','4')
+#define ERRCODE_INDETERMINATE_DATATYPE MAKE_SQLSTATE('4','2','P','1','8')
+#define ERRCODE_COLLATION_MISMATCH MAKE_SQLSTATE('4','2','P','2','1')
+#define ERRCODE_INDETERMINATE_COLLATION MAKE_SQLSTATE('4','2','P','2','2')
+#define ERRCODE_WRONG_OBJECT_TYPE MAKE_SQLSTATE('4','2','8','0','9')
+#define ERRCODE_GENERATED_ALWAYS MAKE_SQLSTATE('4','2','8','C','9')
+#define ERRCODE_UNDEFINED_COLUMN MAKE_SQLSTATE('4','2','7','0','3')
+#define ERRCODE_UNDEFINED_CURSOR MAKE_SQLSTATE('3','4','0','0','0')
+#define ERRCODE_UNDEFINED_DATABASE MAKE_SQLSTATE('3','D','0','0','0')
+#define ERRCODE_UNDEFINED_FUNCTION MAKE_SQLSTATE('4','2','8','8','3')
+#define ERRCODE_UNDEFINED_PSTATEMENT MAKE_SQLSTATE('2','6','0','0','0')
+#define ERRCODE_UNDEFINED_SCHEMA MAKE_SQLSTATE('3','F','0','0','0')
+#define ERRCODE_UNDEFINED_TABLE MAKE_SQLSTATE('4','2','P','0','1')
+#define ERRCODE_UNDEFINED_PARAMETER MAKE_SQLSTATE('4','2','P','0','2')
+#define ERRCODE_UNDEFINED_OBJECT MAKE_SQLSTATE('4','2','7','0','4')
+#define ERRCODE_DUPLICATE_COLUMN MAKE_SQLSTATE('4','2','7','0','1')
+#define ERRCODE_DUPLICATE_CURSOR MAKE_SQLSTATE('4','2','P','0','3')
+#define ERRCODE_DUPLICATE_DATABASE MAKE_SQLSTATE('4','2','P','0','4')
+#define ERRCODE_DUPLICATE_FUNCTION MAKE_SQLSTATE('4','2','7','2','3')
+#define ERRCODE_DUPLICATE_PSTATEMENT MAKE_SQLSTATE('4','2','P','0','5')
+#define ERRCODE_DUPLICATE_SCHEMA MAKE_SQLSTATE('4','2','P','0','6')
+#define ERRCODE_DUPLICATE_TABLE MAKE_SQLSTATE('4','2','P','0','7')
+#define ERRCODE_DUPLICATE_ALIAS MAKE_SQLSTATE('4','2','7','1','2')
+#define ERRCODE_DUPLICATE_OBJECT MAKE_SQLSTATE('4','2','7','1','0')
+#define ERRCODE_AMBIGUOUS_COLUMN MAKE_SQLSTATE('4','2','7','0','2')
+#define ERRCODE_AMBIGUOUS_FUNCTION MAKE_SQLSTATE('4','2','7','2','5')
+#define ERRCODE_AMBIGUOUS_PARAMETER MAKE_SQLSTATE('4','2','P','0','8')
+#define ERRCODE_AMBIGUOUS_ALIAS MAKE_SQLSTATE('4','2','P','0','9')
+#define ERRCODE_INVALID_COLUMN_REFERENCE MAKE_SQLSTATE('4','2','P','1','0')
+#define ERRCODE_INVALID_COLUMN_DEFINITION MAKE_SQLSTATE('4','2','6','1','1')
+#define ERRCODE_INVALID_CURSOR_DEFINITION MAKE_SQLSTATE('4','2','P','1','1')
+#define ERRCODE_INVALID_DATABASE_DEFINITION MAKE_SQLSTATE('4','2','P','1','2')
+#define ERRCODE_INVALID_FUNCTION_DEFINITION MAKE_SQLSTATE('4','2','P','1','3')
+#define ERRCODE_INVALID_PSTATEMENT_DEFINITION MAKE_SQLSTATE('4','2','P','1','4')
+#define ERRCODE_INVALID_SCHEMA_DEFINITION MAKE_SQLSTATE('4','2','P','1','5')
+#define ERRCODE_INVALID_TABLE_DEFINITION MAKE_SQLSTATE('4','2','P','1','6')
+#define ERRCODE_INVALID_OBJECT_DEFINITION MAKE_SQLSTATE('4','2','P','1','7')
+
+/* Class 44 - WITH CHECK OPTION Violation */
+#define ERRCODE_WITH_CHECK_OPTION_VIOLATION MAKE_SQLSTATE('4','4','0','0','0')
+
+/* Class 53 - Insufficient Resources */
+#define ERRCODE_INSUFFICIENT_RESOURCES MAKE_SQLSTATE('5','3','0','0','0')
+#define ERRCODE_DISK_FULL MAKE_SQLSTATE('5','3','1','0','0')
+#define ERRCODE_OUT_OF_MEMORY MAKE_SQLSTATE('5','3','2','0','0')
+#define ERRCODE_TOO_MANY_CONNECTIONS MAKE_SQLSTATE('5','3','3','0','0')
+#define ERRCODE_CONFIGURATION_LIMIT_EXCEEDED MAKE_SQLSTATE('5','3','4','0','0')
+
+/* Class 54 - Program Limit Exceeded */
+#define ERRCODE_PROGRAM_LIMIT_EXCEEDED MAKE_SQLSTATE('5','4','0','0','0')
+#define ERRCODE_STATEMENT_TOO_COMPLEX MAKE_SQLSTATE('5','4','0','0','1')
+#define ERRCODE_TOO_MANY_COLUMNS MAKE_SQLSTATE('5','4','0','1','1')
+#define ERRCODE_TOO_MANY_ARGUMENTS MAKE_SQLSTATE('5','4','0','2','3')
+
+/* Class 55 - Object Not In Prerequisite State */
+#define ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE MAKE_SQLSTATE('5','5','0','0','0')
+#define ERRCODE_OBJECT_IN_USE MAKE_SQLSTATE('5','5','0','0','6')
+#define ERRCODE_CANT_CHANGE_RUNTIME_PARAM MAKE_SQLSTATE('5','5','P','0','2')
+#define ERRCODE_LOCK_NOT_AVAILABLE MAKE_SQLSTATE('5','5','P','0','3')
+#define ERRCODE_UNSAFE_NEW_ENUM_VALUE_USAGE MAKE_SQLSTATE('5','5','P','0','4')
+
+/* Class 57 - Operator Intervention */
+#define ERRCODE_OPERATOR_INTERVENTION MAKE_SQLSTATE('5','7','0','0','0')
+#define ERRCODE_QUERY_CANCELED MAKE_SQLSTATE('5','7','0','1','4')
+#define ERRCODE_ADMIN_SHUTDOWN MAKE_SQLSTATE('5','7','P','0','1')
+#define ERRCODE_CRASH_SHUTDOWN MAKE_SQLSTATE('5','7','P','0','2')
+#define ERRCODE_CANNOT_CONNECT_NOW MAKE_SQLSTATE('5','7','P','0','3')
+#define ERRCODE_DATABASE_DROPPED MAKE_SQLSTATE('5','7','P','0','4')
+#define ERRCODE_IDLE_SESSION_TIMEOUT MAKE_SQLSTATE('5','7','P','0','5')
+
+/* Class 58 - System Error (errors external to PostgreSQL itself) */
+#define ERRCODE_SYSTEM_ERROR MAKE_SQLSTATE('5','8','0','0','0')
+#define ERRCODE_IO_ERROR MAKE_SQLSTATE('5','8','0','3','0')
+#define ERRCODE_UNDEFINED_FILE MAKE_SQLSTATE('5','8','P','0','1')
+#define ERRCODE_DUPLICATE_FILE MAKE_SQLSTATE('5','8','P','0','2')
+
+/* Class 72 - Snapshot Failure */
+#define ERRCODE_SNAPSHOT_TOO_OLD MAKE_SQLSTATE('7','2','0','0','0')
+
+/* Class F0 - Configuration File Error */
+#define ERRCODE_CONFIG_FILE_ERROR MAKE_SQLSTATE('F','0','0','0','0')
+#define ERRCODE_LOCK_FILE_EXISTS MAKE_SQLSTATE('F','0','0','0','1')
+
+/* Class HV - Foreign Data Wrapper Error (SQL/MED) */
+#define ERRCODE_FDW_ERROR MAKE_SQLSTATE('H','V','0','0','0')
+#define ERRCODE_FDW_COLUMN_NAME_NOT_FOUND MAKE_SQLSTATE('H','V','0','0','5')
+#define ERRCODE_FDW_DYNAMIC_PARAMETER_VALUE_NEEDED MAKE_SQLSTATE('H','V','0','0','2')
+#define ERRCODE_FDW_FUNCTION_SEQUENCE_ERROR MAKE_SQLSTATE('H','V','0','1','0')
+#define ERRCODE_FDW_INCONSISTENT_DESCRIPTOR_INFORMATION MAKE_SQLSTATE('H','V','0','2','1')
+#define ERRCODE_FDW_INVALID_ATTRIBUTE_VALUE MAKE_SQLSTATE('H','V','0','2','4')
+#define ERRCODE_FDW_INVALID_COLUMN_NAME MAKE_SQLSTATE('H','V','0','0','7')
+#define ERRCODE_FDW_INVALID_COLUMN_NUMBER MAKE_SQLSTATE('H','V','0','0','8')
+#define ERRCODE_FDW_INVALID_DATA_TYPE MAKE_SQLSTATE('H','V','0','0','4')
+#define ERRCODE_FDW_INVALID_DATA_TYPE_DESCRIPTORS MAKE_SQLSTATE('H','V','0','0','6')
+#define ERRCODE_FDW_INVALID_DESCRIPTOR_FIELD_IDENTIFIER MAKE_SQLSTATE('H','V','0','9','1')
+#define ERRCODE_FDW_INVALID_HANDLE MAKE_SQLSTATE('H','V','0','0','B')
+#define ERRCODE_FDW_INVALID_OPTION_INDEX MAKE_SQLSTATE('H','V','0','0','C')
+#define ERRCODE_FDW_INVALID_OPTION_NAME MAKE_SQLSTATE('H','V','0','0','D')
+#define ERRCODE_FDW_INVALID_STRING_LENGTH_OR_BUFFER_LENGTH MAKE_SQLSTATE('H','V','0','9','0')
+#define ERRCODE_FDW_INVALID_STRING_FORMAT MAKE_SQLSTATE('H','V','0','0','A')
+#define ERRCODE_FDW_INVALID_USE_OF_NULL_POINTER MAKE_SQLSTATE('H','V','0','0','9')
+#define ERRCODE_FDW_TOO_MANY_HANDLES MAKE_SQLSTATE('H','V','0','1','4')
+#define ERRCODE_FDW_OUT_OF_MEMORY MAKE_SQLSTATE('H','V','0','0','1')
+#define ERRCODE_FDW_NO_SCHEMAS MAKE_SQLSTATE('H','V','0','0','P')
+#define ERRCODE_FDW_OPTION_NAME_NOT_FOUND MAKE_SQLSTATE('H','V','0','0','J')
+#define ERRCODE_FDW_REPLY_HANDLE MAKE_SQLSTATE('H','V','0','0','K')
+#define ERRCODE_FDW_SCHEMA_NOT_FOUND MAKE_SQLSTATE('H','V','0','0','Q')
+#define ERRCODE_FDW_TABLE_NOT_FOUND MAKE_SQLSTATE('H','V','0','0','R')
+#define ERRCODE_FDW_UNABLE_TO_CREATE_EXECUTION MAKE_SQLSTATE('H','V','0','0','L')
+#define ERRCODE_FDW_UNABLE_TO_CREATE_REPLY MAKE_SQLSTATE('H','V','0','0','M')
+#define ERRCODE_FDW_UNABLE_TO_ESTABLISH_CONNECTION MAKE_SQLSTATE('H','V','0','0','N')
+
+/* Class P0 - PL/pgSQL Error */
+#define ERRCODE_PLPGSQL_ERROR MAKE_SQLSTATE('P','0','0','0','0')
+#define ERRCODE_RAISE_EXCEPTION MAKE_SQLSTATE('P','0','0','0','1')
+#define ERRCODE_NO_DATA_FOUND MAKE_SQLSTATE('P','0','0','0','2')
+#define ERRCODE_TOO_MANY_ROWS MAKE_SQLSTATE('P','0','0','0','3')
+#define ERRCODE_ASSERT_FAILURE MAKE_SQLSTATE('P','0','0','0','4')
+
+/* Class XX - Internal Error */
+#define ERRCODE_INTERNAL_ERROR MAKE_SQLSTATE('X','X','0','0','0')
+#define ERRCODE_DATA_CORRUPTED MAKE_SQLSTATE('X','X','0','0','1')
+#define ERRCODE_INDEX_CORRUPTED MAKE_SQLSTATE('X','X','0','0','2')
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 2f412ca3db..855cbeb401 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2545,7 +2545,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 264895d278..ade74ce5a1 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -31,6 +31,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
bool header_line; /* CSV header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 4d68d9cceb..93a3b74999 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -68,6 +68,7 @@ typedef struct CopyFromStateData
/* parameters from the COPY command */
Relation rel; /* relation to copy from */
List *attnumlist; /* integer list of attnums to copy */
+ int attr_count; /* length of attnumlist */
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f836acf876..2bb3cccea9 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..4e04efcbba 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,37 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- test IGNORE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int, k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 3 | 3 | 3
+(2 rows)
+
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2"
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 3 | 3 | 3
+(2 rows)
+
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2, column m: "a"
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 3 | 3 | 3
+(2 rows)
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -663,3 +694,4 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..3642c11f91 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,31 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- test IGNORE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int, k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3 3
+\.
+SELECT * FROM check_ign_err;
+
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2
+3 3 3
+\.
+SELECT * FROM check_ign_err;
+
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 a 2
+3 3 3
+\.
+SELECT * FROM check_ign_err;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -468,3 +493,4 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
--
2.25.1
Hi
On Sat, Dec 18, 2021 at 9:55 AM, Damir Belyalov <dam.bel07@gmail.com> wrote:
Hello.
Wrote a patch implementing COPY with ignoring errors in rows using block
subtransactions.
It is great that you are working on this patch. Unfortunately, I am afraid
this simple design is not optimal. Using a subtransaction for every row has
too big an overhead. I think it should use a subtransaction for blocks of
rows (1000 rows), and only when there is an exception should it replay the
inserts in per-row subtransactions. You should check the performance
overhead.
Regards
Pavel
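[Editor's note: to make the suggested two-level strategy concrete, here is a
rough sketch of the control flow. It is not taken from any posted patch;
read_next_row(), insert_row() and the Row type are placeholders standing in
for the real NextCopyFrom()/insertion path, and resource-owner and
memory-context bookkeeping is omitted.]

```c
/*
 * Editor's sketch of the suggested batching strategy; read_next_row(),
 * insert_row() and Row are placeholders, and resource-owner /
 * memory-context bookkeeping is omitted.
 */
#define BATCH_SIZE 1000

for (;;)
{
    Row     batch[BATCH_SIZE];
    int     nread = 0;
    bool    block_failed = false;

    while (nread < BATCH_SIZE && read_next_row(&batch[nread]))
        nread++;
    if (nread == 0)
        break;

    /* First try the whole block in a single subtransaction. */
    BeginInternalSubTransaction(NULL);
    PG_TRY();
    {
        for (int i = 0; i < nread; i++)
            insert_row(&batch[i]);
        ReleaseCurrentSubTransaction();
    }
    PG_CATCH();
    {
        RollbackAndReleaseCurrentSubTransaction();
        FlushErrorState();      /* real code would log the error first */
        block_failed = true;
    }
    PG_END_TRY();

    /* Only a failed block pays the per-row subtransaction cost. */
    if (block_failed)
    {
        for (int i = 0; i < nread; i++)
        {
            BeginInternalSubTransaction(NULL);
            PG_TRY();
            {
                insert_row(&batch[i]);
                ReleaseCurrentSubTransaction();
            }
            PG_CATCH();
            {
                RollbackAndReleaseCurrentSubTransaction();
                FlushErrorState();  /* skip the bad row with a WARNING */
            }
            PG_END_TRY();
        }
    }
}
```

[The point of the design is that the per-row subtransaction overhead is only
paid for blocks that actually contain a bad row.]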
Syntax: COPY [table] FROM [file/stdin] WITH IGNORE_ERRORS;
Examples:
CREATE TABLE check_ign_err (n int, m int, k int);
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 2 2 2
3 3 3
\.
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
3 | 3 | 3
(2 rows)

##################################################
TRUNCATE check_ign_err;
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 2
3 3 3
\.
WARNING: COPY check_ign_err, line 2: "2 2"
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
3 | 3 | 3
(2 rows)

##################################################
TRUNCATE check_ign_err;
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 a 2
3 3 3
\.
WARNING: COPY check_ign_err, line 2, column m: "a"
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
3 | 3 | 3
(2 rows)

Regards, Damir
On Fri, Dec 10, 2021 at 9:48 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2014-12-26 11:41 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2014-12-25 22:23 GMT+01:00 Alex Shulgin <ash@commandprompt.com>:
Trent Shipley <trent_shipley@qwest.net> writes:
On Friday 2007-12-14 16:22, Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
By modifying COPY: COPY IGNORE ERRORS or some such would instruct
COPY
to drop (and log) rows that contain malformed data. That is, rows
with
too many or too few columns, rows that result in constraint
violations,
and rows containing columns where the data type's input function
raises
an error. The last case is the only thing that would be a bit
tricky to
implement, I think: you could use PG_TRY() around the
InputFunctionCall,
but I guess you'd need a subtransaction to ensure that you reset
your
state correctly after catching an error.
Yeah. It's the subtransaction per row that's daunting --- not only
the
cycles spent for that, but the ensuing limitation to 4G rows imported
per COPY.

You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
would help with the 4G subtransaction limit. The cost to the ETL process is
that a simple rollback would not be guaranteed to send the process back to
its initial state. There are easy ways to deal with the rollback issue
though.
A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
option is selected then the COPY FROM can run without subtransactions and
in excess of the 4G per transaction limit. NO RETRY should be the default
since it preserves the legacy behavior of COPY FROM.
You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give
the option of sending exceptions to a table since they are presumably
malformed, otherwise they would not be exceptions. (Users should re-process
exception files if they want an "if good then table a else exception to
table b" ...)

EXCEPTIONS TO and NO RETRY would be mutually exclusive.
If we could somehow only do a subtransaction per failure, things
would
be much better, but I don't see how.
Hello,
Attached is a proof of concept patch for this TODO item. There are no
docs yet; I just wanted to know if the approach is sane.

The added syntax is like the following:
COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
The way it's done is by abusing Copy Both mode, and from my limited
testing that seems to just work. The error trapping itself is done
using PG_TRY/PG_CATCH and can only catch formatting or before-insert
trigger errors; no attempt is made to recover from a failed unique
constraint, etc.

Example in action:
postgres=# \d test_copy2
Table "public.test_copy2"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
val    | integer |

postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
1
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 1: "1"
2
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 2: "2"
3
NOTICE: missing data for column "val"
CONTEXT: COPY test_copy2, line 3: "3"
NOTICE: total exceptions ignored: 3

postgres=# \d test_copy1
Table "public.test_copy1"
Column | Type | Modifiers
--------+---------+-----------
id     | integer | not null

postgres=# set client_min_messages to warning;
SET
postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
...
vmstat
zoneinfo
postgres=#

Limited performance testing shows no significant difference between
error-catching and plain code path. For example, timing

copy test_copy1 from program 'seq 1000000' [exceptions to stdout]
shows similar numbers with or without the added "exceptions to" clause.
Now that I'm sending this I wonder if the original comment about the
need for subtransaction around every loaded line still holds. Any
example of what would not be properly rolled back by just PG_TRY?

This method is unsafe: exception handlers usually don't free memory or
release resources, so there is a risk of memory leaks and resource leaks.

You can get the same performance with block subtransactions: when you use a
subtransaction for 1000 rows, the impact of subtransactions is minimal.
When a block fails, you can then use row-level subtransactions; this works
well when you expect almost correct data.

Two years ago I wrote an extension that did this, but I have not had time
to finish it and push it upstream.

Regards
Pavel
Regards
Pavel
Happy hacking!
--
Alex
Hi!
Improved my patch by adding block subtransactions.
The block size is determined by the REPLAY_BUFFER_SIZE parameter.
I used the idea of a buffer for accumulating tuples in it.
If we read REPLAY_BUFFER_SIZE rows without errors, the subtransaction will
be committed.
If we find an error, the subtransaction will be rolled back and the
buffered tuples will be replayed.
In the patch REPLAY_BUFFER_SIZE equals 3, but it can be changed to any
other number (for example 1000).
There is an idea to create a GUC parameter for it.
Maybe also create a GUC parameter limiting the number of WARNINGs emitted
for rows with errors.
For CIM_MULTI and CIM_MULTI_CONDITIONAL cases the buffer is not needed.
It is needed for the CIM_SINGLE case.
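[Editor's note: as a rough illustration of what "replaying the buffer" means
here, the following sketch loosely follows the attached patch but is not
taken from it verbatim; the helper name replay_buffered_rows is
hypothetical. Good rows are copied into HeapTuples in the outer memory
context while the subtransaction is open, so the copies survive a rollback
and can be deformed back into values/nulls and re-inserted instead of being
re-read from the input.]

```c
/*
 * Editor's sketch (hypothetical helper, not from the posted patch):
 * replay the rows that were buffered before the failing one.  The
 * HeapTuple copies were formed in the outer memory context, so they
 * survive the subtransaction rollback.
 */
static void
replay_buffered_rows(CopyFromState cstate, TupleTableSlot *myslot,
                     HeapTuple *replay_buffer, int saved_tuples)
{
    for (int i = 0; i < saved_tuples; i++)
    {
        heap_deform_tuple(replay_buffer[i],
                          RelationGetDescr(cstate->rel),
                          myslot->tts_values, myslot->tts_isnull);
        ExecStoreVirtualTuple(myslot);
        /* ... then insert the slot as the normal COPY path would ... */
    }
}
```

[Only the inserts are undone by the rollback, not the allocations made in
the outer context, which is why the buffered copies remain valid for the
replay.]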
Tests:
-- CIM_MULTI case
CREATE TABLE check_ign_err (n int, m int, k int);
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
WARNING: COPY check_ign_err, line 3: "3 3"
WARNING: COPY check_ign_err, line 4, column n: "a"
WARNING: COPY check_ign_err, line 5, column m: "b"
WARNING: COPY check_ign_err, line 6, column n: ""
1 1 1
2 2 2 2
3 3
a 4 4
5 b b
7 7 7
\.
SELECT * FROM check_ign_err;
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
WARNING: COPY check_ign_err, line 3: "3 3"
WARNING: COPY check_ign_err, line 4, column n: "a"
WARNING: COPY check_ign_err, line 5, column m: "b"
WARNING: COPY check_ign_err, line 6, column n: ""
n | m | k
---+---+---
1 | 1 | 1
7 | 7 | 7
(2 rows)
##################################################
-- CIM_SINGLE case
-- BEFORE row trigger
CREATE TABLE trig_test(n int, m int);
CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
BEGIN
INSERT INTO trig_test VALUES(NEW.n, NEW.m);
RETURN NEW;
END;
' LANGUAGE plpgsql;
CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
WARNING: COPY check_ign_err, line 3: "3 3"
WARNING: COPY check_ign_err, line 4, column n: "a"
WARNING: COPY check_ign_err, line 5, column m: "b"
WARNING: COPY check_ign_err, line 6, column n: ""
1 1 1
2 2 2 2
3 3
a 4 4
5 b b
7 7 7
\.
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
7 | 7 | 7
(2 rows)
##################################################
-- INSTEAD OF row trigger
CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
BEGIN
INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
RETURN NEW;
END;
' LANGUAGE plpgsql;
CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
WARNING: COPY check_ign_err, line 3: "3 3"
WARNING: COPY check_ign_err, line 4, column n: "a"
WARNING: COPY check_ign_err, line 5, column m: "b"
WARNING: COPY check_ign_err, line 6, column n: ""
SELECT * FROM check_ign_err;
1 1 1
2 2 2 2
3 3
a 4 4
5 b b
7 7 7
\.
SELECT * FROM check_ign_err_view;
n | m | k
---+---+---
1 | 1 | 1
7 | 7 | 7
(2 rows)
##################################################
-- foreign table case in postgres_fdw extension
##################################################
-- volatile function in WHERE clause
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
WARNING: COPY check_ign_err, line 3: "3 3"
WARNING: COPY check_ign_err, line 4, column n: "a"
WARNING: COPY check_ign_err, line 5, column m: "b"
WARNING: COPY check_ign_err, line 6, column n: ""
SELECT * FROM check_ign_err;
1 1 1
2 2 2 2
3 3
a 4 4
5 b b
7 7 7
\.
SELECT * FROM check_ign_err;
n | m | k
---+---+---
1 | 1 | 1
(1 row)
##################################################
-- CIM_MULTI_CONDITIONAL case
-- INSERT triggers for partition tables
CREATE TABLE check_ign_err (n int, m int, k int) PARTITION BY RANGE (n);
CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
FOR VALUES FROM (1) TO (4);
CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
FOR VALUES FROM (4) TO (8);
CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
BEGIN
INSERT INTO trig_test VALUES(NEW.n, NEW.m);
RETURN NEW;
END;
' LANGUAGE plpgsql;
CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
WARNING: COPY check_ign_err, line 2: "2 2 2 2"
WARNING: COPY check_ign_err, line 3: "3 3"
WARNING: COPY check_ign_err, line 4, column n: "a"
WARNING: COPY check_ign_err, line 5, column m: "b"
WARNING: COPY check_ign_err, line 6, column n: ""
SELECT * FROM check_ign_err;
1 1 1
2 2 2 2
3 3
a 4 4
5 b b
7 7 7
\.
n | m | k
---+---+---
1 | 1 | 1
7 | 7 | 7
(2 rows)
Thanks for the feedback.
Regards, Damir
Attachments:
0002-COPY-IGNORE_ERRORS.patch (application/x-patch)
From 6bf2168cd962b3cce666a2cabf082f558eec848c Mon Sep 17 00:00:00 2001
From: Damir Belyalov <dam.bel07@gmail.com>
Date: Fri, 15 Oct 2021 11:55:18 +0300
Subject: [PATCH] COPY IGNORE_ERRORS
---
doc/src/sgml/ref/copy.sgml | 13 +++
src/backend/commands/copy.c | 8 ++
src/backend/commands/copyfrom.c | 138 +++++++++++++++++++++++++++-
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 118 ++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 110 ++++++++++++++++++++++
9 files changed, 395 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8aae711b3b..7d20b1649e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drop rows that contain malformed data while copying. That is, rows
+ containing syntax errors in data, rows with too many or too few columns,
+ rows that result in constraint violations, rows containing columns where
+ the data type's input function raises an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3ac731803b..fead1aba46 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -402,6 +402,7 @@ ProcessCopyOptions(ParseState *pstate,
{
bool format_specified = false;
bool freeze_specified = false;
+ bool ignore_errors_specified = false;
bool header_specified = false;
ListCell *option;
@@ -442,6 +443,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..b994697b9d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -535,6 +535,7 @@ CopyFrom(CopyFromState cstate)
ExprContext *econtext;
TupleTableSlot *singleslot = NULL;
MemoryContext oldcontext = CurrentMemoryContext;
+ ResourceOwner oldowner = CurrentResourceOwner;
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
@@ -549,6 +550,17 @@ CopyFrom(CopyFromState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ /* variables for copy from ignore_errors option */
+#define REPLAY_BUFFER_SIZE 3
+ HeapTuple replay_buffer[REPLAY_BUFFER_SIZE];
+ HeapTuple replay_tuple;
+ int saved_tuples = 0;
+ int replayed_tuples = 0;
+ bool replay_is_active = false;
+ bool begin_subtransaction = true;
+ bool find_error = false;
+ bool last_replaying = false;
+
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
@@ -855,9 +867,129 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ /*
+ * If option IGNORE_ERRORS is enabled, COPY skips rows with errors.
+ * NextCopyFrom() directly stores the values/nulls array in the slot.
+ */
+ if (cstate->opts.ignore_errors)
+ {
+ bool valid_row = true;
+ bool skip_row = false;
+
+ PG_TRY();
+ {
+ if (!replay_is_active)
+ {
+ if (begin_subtransaction)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = oldowner;
+ }
+
+ if (saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+ valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+ if (valid_row)
+ {
+ if (insertMethod == CIM_SINGLE)
+ {
+ MemoryContextSwitchTo(oldcontext);
+
+ replay_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+ replay_buffer[saved_tuples++] = replay_tuple;
+
+ if (find_error)
+ skip_row = true;
+ }
+
+ begin_subtransaction = false;
+ }
+ }
+ else
+ {
+ ReleaseCurrentSubTransaction();
+
+ replay_is_active = true;
+ begin_subtransaction = true;
+ skip_row = true;
+ }
+ }
+ else
+ {
+ if (insertMethod == CIM_SINGLE && find_error && replayed_tuples < saved_tuples)
+ {
+ heap_deform_tuple(replay_buffer[replayed_tuples], RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+ replayed_tuples++;
+ }
+ else
+ {
+ MemSet(replay_buffer, 0, REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ saved_tuples = 0;
+ replayed_tuples = 0;
+
+ replay_is_active = false;
+ find_error = false;
+ skip_row = true;
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ ErrorData *errdata;
+ MemoryContextSwitchTo(oldcontext);
+ errdata = CopyErrorData();
+
+ switch (errdata->sqlerrcode)
+ {
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ RollbackAndReleaseCurrentSubTransaction();
+ elog(WARNING, "%s", errdata->context);
+
+ begin_subtransaction = true;
+ find_error = true;
+ skip_row = true;
+
+ break;
+
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+ }
+ PG_END_TRY();
+
+ if (!valid_row)
+ {
+ if (!last_replaying)
+ {
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = oldowner;
+
+ if (replayed_tuples < saved_tuples)
+ {
+ replay_is_active = true;
+ skip_row = true;
+ last_replaying = true;
+ }
+ else
+ break;
+ }
+ else
+ break;
+ }
+
+ if (skip_row)
+ continue;
+ }
+ else
+ {
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index df5ceea910..3bb7235b34 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -800,7 +800,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3456,6 +3456,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeInteger(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -17814,6 +17818,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -18393,6 +18398,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index e572f585ef..feaf18b043 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2742,7 +2742,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cb0096aeb6..2b696f99bc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae35f03251..2af11bd359 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -201,6 +201,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..74827ecca0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,124 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int, k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+-- CIM_SINGLE case
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err_view, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 3"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column m: "b"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+SELECT * FROM check_ign_err_view;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int, k int) PARTITION BY RANGE (n);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (8);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..7eee78bccd 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,116 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int, k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE case
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err_view;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int, k int) PARTITION BY RANGE (n);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (8);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.25.1
On 2022-07-19 21:40, Damir Belyalov wrote:
Hi!
Improved my patch by adding block subtransactions.
The block size is determined by the REPLAY_BUFFER_SIZE parameter.
I used the idea of a buffer for accumulating tuples in it.
If we read REPLAY_BUFFER_SIZE rows without errors, the subtransaction
will be committed.
If we find an error, the subtransaction will rollback and the buffer
will be replayed containing tuples.
Thanks for working on this!
I tested 0002-COPY-IGNORE_ERRORS.patch and faced an unexpected behavior.
I loaded 10000 rows which contained 1 wrong row.
I expected I could see 9999 rows after COPY, but just saw 999 rows.
Since the number of loaded rows also changed when I changed
MAX_BUFFERED_TUPLES from 1000 to other values, I imagine
MAX_BUFFERED_TUPLES might be influencing this behavior.
```sh
$ cat /tmp/test10000.dat
1 aaa
2 aaa
3 aaa
4 aaa
5 aaa
6 aaa
7 aaa
8 aaa
9 aaa
10 aaa
11 aaa
...
9994 aaa
9995 aaa
9996 aaa
9997 aaa
9998 aaa
9999 aaa
xxx aaa
```
```SQL
=# CREATE TABLE test (id int, data text);
=# COPY test FROM '/tmp/test10000.dat' WITH (IGNORE_ERRORS);
WARNING: COPY test, line 10000, column i: "xxx"
COPY 9999
=# SELECT COUNT(*) FROM test;
count
-------
999
(1 row)
```
BTW I may be overlooking it, but have you submitted this proposal to the
next CommitFest?
https://commitfest.postgresql.org/39/
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Thank you for the feedback.
I improved my patch recently and tested it on different sizes of
MAX_BUFFERED_TUPLES and REPLAY_BUFFER_SIZE.
I loaded 10000 rows which contained 1 wrong row.
I expected I could see 9999 rows after COPY, but just saw 999 rows.
Also I implemented your case and it worked correctly.
BTW I may be overlooking it, but have you submitted this proposal to the
next CommitFest?
Good idea. Haven't done it yet.
Regards,
Damir
Postgres Professional
Attachments:
0003-COPY_IGNORE_ERRORS.patch (text/x-patch)
From fa6b99c129eb890b25f006bb7891a247c8a431a7 Mon Sep 17 00:00:00 2001
From: Damir Belyalov <dam.bel07@gmail.com>
Date: Fri, 15 Oct 2021 11:55:18 +0300
Subject: [PATCH] COPY_IGNORE_ERRORS without GUC with function
safeNextCopyFrom() with struct SafeCopyFromState with refactoring
---
doc/src/sgml/ref/copy.sgml | 13 ++
src/backend/commands/copy.c | 8 ++
src/backend/commands/copyfrom.c | 162 ++++++++++++++++++++++-
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 21 +++
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 123 +++++++++++++++++
src/test/regress/sql/copy2.sql | 110 +++++++++++++++
10 files changed, 445 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8aae711b3b..7d20b1649e 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drop rows that contain malformed data while copying. That is, rows
+ containing syntax errors in data, rows with too many or too few columns,
+ rows that result in constraint violations, rows containing columns where
+ the data type's input function raises an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3ac731803b..fead1aba46 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -402,6 +402,7 @@ ProcessCopyOptions(ParseState *pstate,
{
bool format_specified = false;
bool freeze_specified = false;
+ bool ignore_errors_specified = false;
bool header_specified = false;
ListCell *option;
@@ -442,6 +443,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..285c491ddd 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,9 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static bool safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+
/*
* error context callback for COPY FROM
*
@@ -521,6 +524,125 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Analog of NextCopyFrom(), but ignores rows with errors while copying.
+ */
+static bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+ SafeCopyFromState *safecstate = cstate->safecstate;
+ bool valid_row = true;
+
+ safecstate->skip_row = false;
+
+ PG_TRY();
+ {
+ if (!safecstate->replay_is_active)
+ {
+ if (safecstate->begin_subtransaction)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = safecstate->oldowner;
+
+ safecstate->begin_subtransaction = false;
+ }
+
+ if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+ valid_row = NextCopyFrom(cstate, econtext, values, nulls);
+ if (valid_row)
+ {
+ /* Fill replay_buffer in oldcontext*/
+ MemoryContextSwitchTo(safecstate->oldcontext);
+ safecstate->replay_buffer[safecstate->saved_tuples++] = heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);
+
+ safecstate->skip_row = true;
+ }
+ else if (!safecstate->processed_remaining_tuples)
+ {
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+ if (safecstate->replayed_tuples < safecstate->saved_tuples)
+ {
+ /* Prepare to replay remaining tuples if they exist */
+ safecstate->replay_is_active = true;
+ safecstate->processed_remaining_tuples = true;
+ safecstate->skip_row = true;
+ return true;
+ }
+ }
+ }
+ else
+ {
+ /* Buffer was filled, commit subtransaction and prepare to replay */
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ safecstate->replay_is_active = true;
+ safecstate->begin_subtransaction = true;
+ safecstate->skip_row = true;
+ }
+ }
+ else
+ {
+ if (safecstate->replayed_tuples < safecstate->saved_tuples)
+ {
+ /* Replaying tuple */
+ heap_deform_tuple(safecstate->replay_buffer[safecstate->replayed_tuples++], RelationGetDescr(cstate->rel), values, nulls);
+ }
+ else
+ {
+ /* Clean up replay_buffer */
+ MemSet(safecstate->replay_buffer, 0, REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ safecstate->saved_tuples = safecstate->replayed_tuples = 0;
+
+ safecstate->replay_is_active = false;
+ safecstate->skip_row = true;
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ ErrorData *errdata;
+ MemoryContextSwitchTo(safecstate->oldcontext);
+ errdata = CopyErrorData();
+
+ switch (errdata->sqlerrcode)
+ {
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ safecstate->errors++;
+ if (safecstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+
+ safecstate->begin_subtransaction = true;
+ safecstate->skip_row = true;
+ break;
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+ }
+ PG_END_TRY();
+
+ if (!valid_row)
+ {
+ ereport(WARNING,
+ errmsg("FIND %d ERRORS", safecstate->errors));
+ return false;
+ }
+
+ return true;
+}
+
/*
* Copy FROM file to relation.
*/
@@ -535,6 +657,7 @@ CopyFrom(CopyFromState cstate)
ExprContext *econtext;
TupleTableSlot *singleslot = NULL;
MemoryContext oldcontext = CurrentMemoryContext;
+ ResourceOwner oldowner = CurrentResourceOwner;
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
@@ -819,6 +942,23 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option*/
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->safecstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->safecstate->saved_tuples = 0;
+ cstate->safecstate->replayed_tuples = 0;
+ cstate->safecstate->errors = 0;
+ cstate->safecstate->replay_is_active = false;
+ cstate->safecstate->begin_subtransaction = true;
+ cstate->safecstate->processed_remaining_tuples = false;
+
+ cstate->safecstate->oldowner = oldowner;
+ cstate->safecstate->oldcontext = oldcontext;
+ cstate->safecstate->insertMethod = insertMethod;
+ }
+
for (;;)
{
TupleTableSlot *myslot;
@@ -855,9 +995,25 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ /*
+ * If option IGNORE_ERRORS is enabled, COPY skips rows with errors.
+ * NextCopyFrom() directly stores the values/nulls array in the slot.
+ */
+ if (cstate->safecstate)
+ {
+ bool valid_row = safeNextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+ /* Cannot continue or break in PG_TRY in safeNextCopyFrom() */
+ if (cstate->safecstate->skip_row)
+ continue;
+ if (!valid_row)
+ break;
+ }
+ else
+ {
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index df5ceea910..3bb7235b34 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -800,7 +800,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3456,6 +3456,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeInteger(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -17814,6 +17818,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -18393,6 +18398,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index e572f585ef..feaf18b043 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2742,7 +2742,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cb0096aeb6..2b696f99bc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3df1c5a97c..d9d3af1fb4 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,8 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
+
/*
* Represents the different source cases we need to worry about at
@@ -49,6 +51,24 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/* Struct holding the fields for the ignore_errors option. */
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 1000
+ HeapTuple replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates tuples for replaying it after an error */
+ int saved_tuples; /* # of tuples in replay_buffer */
+ int replayed_tuples; /* # of tuples replayed from the buffer */
+ int errors; /* total # of errors */
+ bool replay_is_active;
+ bool begin_subtransaction;
+ bool processed_remaining_tuples; /* for case of replaying last tuples */
+ bool skip_row;
+
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+ CopyInsertMethod insertMethod;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -71,6 +91,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *safecstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae35f03251..2af11bd359 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -201,6 +201,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..ab1f059a02 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,129 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int, k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: FIND 5 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+-- CIM_SINGLE case
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: FIND 5 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err_view, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 3"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column m: "b"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+WARNING: FIND 5 ERRORS
+SELECT * FROM check_ign_err_view;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: FIND 5 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int, k int) PARTITION BY RANGE (n);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (8);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 2 2 2"
+WARNING: COPY check_ign_err, line 3: "3 3"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column m: "b"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: FIND 5 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+---+---
+ 1 | 1 | 1
+ 7 | 7 | 7
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..7eee78bccd 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,116 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int, k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE case
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err_view;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int, k int) PARTITION BY RANGE (n);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (8);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 1 1
+2 2 2 2
+3 3
+a 4 4
+5 b b
+
+7 7 7
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.25.1
On 2022-08-15 22:23, Damir Belyalov wrote:
I expected I could see 9999 rows after COPY, but just saw 999 rows.
Also I implemented your case and it worked correctly.
Thanks for the new patch!
Here are some comments on it.
+ if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+     valid_row = NextCopyFrom(cstate, econtext, values, nulls);
+     if (valid_row)
+     {
+         /* Fill replay_buffer in oldcontext */
+         MemoryContextSwitchTo(safecstate->oldcontext);
+         safecstate->replay_buffer[safecstate->saved_tuples++] = heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);
+ /* Buffer was filled, commit subtransaction and prepare to replay */
+ ReleaseCurrentSubTransaction();
What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that just safecstate->replay_buffer is filled before
this commit.
As a test, I rewrote this ReleaseCurrentSubTransaction() to
RollbackAndReleaseCurrentSubTransaction() and COPYed over 1000 rows of
data, but same data were loaded.
+#define REPLAY_BUFFER_SIZE 1000
I feel it might be better to have it as a parameter rather than fixed at
1000.
+/*
+ * Analog of NextCopyFrom() but ignore rows with errors while copying.
+ */
+static bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
copyfrom.c.
Since safeNextCopyFrom() is analog of NextCopyFrom() as commented, would
it be natural to put them in the same file?
188 + safecstate->errors++;
189 + if (safecstate->errors <= 100)
190 + ereport(WARNING,
191 + (errcode(errdata->sqlerrcode),
192 + errmsg("%s", errdata->context)));
It would be better to state in the manual that errors exceeding 100 are
not displayed.
Or, it might be acceptable to output all errors that exceed 100.
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 1000
+    HeapTuple replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates tuples for replaying it after an error */
+    int  saved_tuples;     /* # of tuples in replay_buffer */
+    int  replayed_tuples;  /* # of tuples was replayed from buffer */
+    int  errors;           /* total # of errors */
+    bool replay_is_active;
+    bool begin_subtransaction;
+    bool processed_remaining_tuples; /* for case of replaying last tuples */
+    bool skip_row;
It would be helpful to add comments about skip_row, etc.
```
$ git apply ../patch/0003-COPY_IGNORE_ERRORS.patch
../patch/0003-COPY_IGNORE_ERRORS.patch:86: indent with spaces.
Datum *values, bool *nulls);
warning: 1 line adds whitespace errors.
```
There was a warning when applying the patch.
```
=# copy test from '/tmp/10000.data' with (ignore_errors);
WARNING: FIND 0 ERRORS
COPY 1003
```
When there were no errors, this WARNING seems not necessary.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
+ /* Buffer was filled, commit subtransaction and prepare
to replay */
+ ReleaseCurrentSubTransaction();
What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that just safecstate->replay_buffer is filled before
this commit.
All tuples are collected in replay_buffer, which is created in the
"oldcontext" of CopyFrom() (not in the context of a subtransaction). That's why
it wasn't cleaned up when you used RollbackAndReleaseCurrentSubTransaction().
Subtransactions are needed for better safety. There is no error when
copying from a file to the replay_buffer, but an error may occur at the
next stage - when adding a tuple to the table. Also there may be other
errors that are not obvious at first glance.
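To make this concrete, here is a minimal sketch (the table name and data are
made up, assuming the behaviour where constraint violations are also skipped)
of a row that parses fine but only fails at the insert stage:
```
CREATE TABLE t_ign (n int CHECK (n < 8));
-- "8" is valid input for int, so NextCopyFrom() accepts that row; the error
-- only appears later, when the tuple is inserted and the CHECK constraint
-- fires. The subtransaction lets that insert be rolled back while the other
-- rows are kept.
COPY t_ign FROM STDIN WITH (ignore_errors);
1
8
2
\.
SELECT * FROM t_ign;   -- with the patch this should return rows 1 and 2
```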
I feel it might be better to have it as a parameter rather than fixed at
1000.
Yes, I thought about it too. So I created another version with the GUC
parameter - replay_buffer_size. I've attached it. But I think users won't need
to change replay_buffer_size.
Also, replay_buffer does the same thing as the MultiInsert buffer does, and
the MultiInsert buffer size is defined by a constant = 1000.
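For what it's worth, the knob would then be set like any other GUC; a rough
sketch (replay_buffer_size exists only with this patch applied, and big_table
and the file path are placeholders):
```
-- raise the per-subtransaction buffer before a big load
SET replay_buffer_size = 100000;
COPY big_table FROM '/path/to/big.data' WITH (ignore_errors);
RESET replay_buffer_size;
```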
NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
copyfrom.c.
Since safeNextCopyFrom() is analog of NextCopyFrom() as commented, would
it be natural to put them in the same file?
Sure, corrected it.
188 + safecstate->errors++;
189 + if (safecstate->errors <= 100)
190 + ereport(WARNING,
191 + (errcode(errdata->sqlerrcode),
192 + errmsg("%s", errdata->context)));
It would be better to state in the manual that errors exceeding 100 are
not displayed.
Or, it might be acceptable to output all errors that exceed 100.
It'll be too complicated to create a new parameter just for showing the
given number of errors. I think 100 is an optimal size.
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 1000
+    HeapTuple replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates tuples for replaying it after an error */
+    int  saved_tuples;     /* # of tuples in replay_buffer */
+    int  replayed_tuples;  /* # of tuples was replayed from buffer */
+    int  errors;           /* total # of errors */
+    bool replay_is_active;
+    bool begin_subtransaction;
+    bool processed_remaining_tuples; /* for case of replaying last tuples */
+    bool skip_row;
It would be helpful to add comments about skip_row, etc.
Corrected it.
WARNING: FIND 0 ERRORS
When there were no errors, this WARNING seems not necessary.
Corrected it.
I added processing of other errors and constraints to this patch, along with
tests for them.
I had to create another function, safeExecConstraints(), only for processing
constraints, because ExecConstraints() runs after NextCopyFrom() and is not
inside PG_TRY. This complicated the code a bit.
Maybe it would be a good approach to create a new function SafeCopyFrom() and
do all the "safe copying" in PG_TRY, but it would almost duplicate the
CopyFrom() code.
Or maybe create a function just for the for(;;) loop, but that has the same
problem of duplicating code, plus passing a lot of variables (which are
created at the beginning of CopyFrom()) to this function.
Attachments:
0004-COPY_IGNORE_ERRORS.patch (text/x-patch)
From 09befdad45a6b1ae70d6c5abc90d1c2296e56ee1 Mon Sep 17 00:00:00 2001
From: Damir Belyalov <dam.bel07@gmail.com>
Date: Fri, 15 Oct 2021 11:55:18 +0300
Subject: [PATCH] COPY_IGNORE_ERRORS with GUC for replay_buffer size
---
doc/src/sgml/config.sgml | 17 ++
doc/src/sgml/ref/copy.sgml | 19 ++
src/backend/commands/copy.c | 8 +
src/backend/commands/copyfrom.c | 114 +++++++++++-
src/backend/commands/copyfromparse.c | 169 ++++++++++++++++++
src/backend/parser/gram.y | 8 +-
src/backend/utils/misc/guc.c | 11 ++
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 6 +
src/include/commands/copyfrom_internal.h | 19 ++
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 130 ++++++++++++++
src/test/regress/sql/copy2.sql | 116 ++++++++++++
14 files changed, 617 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37fd80388c..69373b8d8c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1961,6 +1961,23 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+ <varlistentry id="guc-logical-decoding-work-mem" xreflabel="replay_buffer_size">
+ <term><varname>replay_buffer_size</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>replay_buffer_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the size of buffer for COPY FROM IGNORE_ERRORS option. This buffer
+ is created when subtransaction begins and accumulates tuples until an error
+ occurs. Then it starts replaying stored tuples. The buffer size is also the
+ size of the subtransaction. Therefore, on large tables, it should be increased
+ in order to avoid exceeding the maximum number of subtransactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
<term><varname>max_stack_depth</varname> (<type>integer</type>)
<indexterm>
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8aae711b3b..7ff6f6dea7 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,24 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drop rows that contain malformed data while copying. These are rows
+ containing syntax errors in data, rows with too many or too few columns,
+ rows that result in constraint violations, rows containing columns where
+ the data type's input function raises an error.
+ Outputs warnings about rows with incorrect data (the number of warnings
+ is not more than 100) and the total number of errors.
+ The option is implemented with subtransactions and a buffer created in
+ them, which accumulates tuples until an error occurs.
+ The size of buffer is the size of subtransaction block.
+ It is a GUC parameter "replay_buffer_size" and equals 1000 by default.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3ac731803b..fead1aba46 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -402,6 +402,7 @@ ProcessCopyOptions(ParseState *pstate,
{
bool format_specified = false;
bool freeze_specified = false;
+ bool ignore_errors_specified = false;
bool header_specified = false;
ListCell *option;
@@ -442,6 +443,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..7e997d15c6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -73,6 +73,11 @@
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/*
+ * GUC parameters
+ */
+int replay_buffer_size;
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -100,12 +105,13 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/* non-export function prototypes */
static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static void safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo, TupleTableSlot *myslot, EState *estate);
+
/*
* error context callback for COPY FROM
*
@@ -521,6 +527,61 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Ignore constraints if IGNORE_ERRORS is enabled
+ */
+static void
+safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo, TupleTableSlot *myslot, EState *estate)
+{
+ SafeCopyFromState *safecstate = cstate->safecstate;
+
+ safecstate->skip_row = false;
+
+ PG_TRY();
+ ExecConstraints(resultRelInfo, myslot, estate);
+ PG_CATCH();
+ {
+ ErrorData *errdata;
+ MemoryContext cxt;
+
+ cxt = MemoryContextSwitchTo(safecstate->oldcontext);
+ errdata = CopyErrorData();
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore Constraint Violation */
+ case ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION:
+ case ERRCODE_RESTRICT_VIOLATION:
+ case ERRCODE_NOT_NULL_VIOLATION:
+ case ERRCODE_FOREIGN_KEY_VIOLATION:
+ case ERRCODE_UNIQUE_VIOLATION:
+ case ERRCODE_CHECK_VIOLATION:
+ case ERRCODE_EXCLUSION_VIOLATION:
+ safecstate->errors++;
+ if (cstate->safecstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+
+ safecstate->begin_subtransaction = true;
+ safecstate->skip_row = true;
+ break;
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(cxt);
+ }
+ PG_END_TRY();
+}
+
/*
* Copy FROM file to relation.
*/
@@ -535,6 +596,7 @@ CopyFrom(CopyFromState cstate)
ExprContext *econtext;
TupleTableSlot *singleslot = NULL;
MemoryContext oldcontext = CurrentMemoryContext;
+ ResourceOwner oldowner = CurrentResourceOwner;
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
@@ -819,9 +881,30 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->safecstate = palloc(sizeof(SafeCopyFromState));
+
+ /* Create replay_buffer in oldcontext */
+ cstate->safecstate->replay_buffer = (HeapTuple *) palloc(replay_buffer_size * sizeof(HeapTuple));
+
+ cstate->safecstate->saved_tuples = 0;
+ cstate->safecstate->replayed_tuples = 0;
+ cstate->safecstate->errors = 0;
+ cstate->safecstate->replay_is_active = false;
+ cstate->safecstate->begin_subtransaction = true;
+ cstate->safecstate->processed_remaining_tuples = false;
+
+ cstate->safecstate->oldowner = oldowner;
+ cstate->safecstate->oldcontext = oldcontext;
+ cstate->safecstate->insertMethod = insertMethod;
+ }
+
for (;;)
{
TupleTableSlot *myslot;
+ bool valid_row;
bool skip_tuple;
CHECK_FOR_INTERRUPTS();
@@ -855,8 +938,21 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ /*
+ * NextCopyFrom() directly store the values/nulls array in the slot.
+ * safeNextCopyFrom() ignores rows with errors if IGNORE_ERRORS is enabled.
+ */
+ if (!cstate->safecstate)
+ valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+ else
+ {
+ valid_row = safeNextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+ if (cstate->safecstate->skip_row)
+ continue;
+ }
+
+ if (!valid_row)
break;
ExecStoreVirtualTuple(myslot);
@@ -1035,7 +1131,17 @@ CopyFrom(CopyFromState cstate)
*/
if (resultRelInfo->ri_FdwRoutine == NULL &&
resultRelInfo->ri_RelationDesc->rd_att->constr)
- ExecConstraints(resultRelInfo, myslot, estate);
+ {
+ if (cstate->opts.ignore_errors)
+ {
+ safeExecConstraints(cstate, resultRelInfo, myslot, estate);
+
+ if (cstate->safecstate->skip_row)
+ continue;
+ }
+ else
+ ExecConstraints(resultRelInfo, myslot, estate);
+ }
/*
* Also check the tuple against the partition constraint, if
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 57813b3458..1aae27d80d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1026,6 +1026,175 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
return true;
}
+/*
+ * Analog of NextCopyFrom() but skips rows with errors while copying.
+ */
+bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+ SafeCopyFromState *safecstate = cstate->safecstate;
+ bool valid_row = true;
+
+ safecstate->skip_row = false;
+
+ PG_TRY();
+ {
+ if (!safecstate->replay_is_active)
+ {
+ if (safecstate->begin_subtransaction)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = safecstate->oldowner;
+
+ safecstate->begin_subtransaction = false;
+ }
+
+ if (safecstate->saved_tuples < replay_buffer_size)
+ {
+ valid_row = NextCopyFrom(cstate, econtext, values, nulls);
+ if (valid_row)
+ {
+ /* Fill replay_buffer in CopyFrom() oldcontext */
+ MemoryContext cxt = MemoryContextSwitchTo(safecstate->oldcontext);
+
+ safecstate->replay_buffer[safecstate->saved_tuples++] = heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);
+ MemoryContextSwitchTo(cxt);
+
+ safecstate->skip_row = true;
+ }
+ else if (!safecstate->processed_remaining_tuples)
+ {
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ if (safecstate->replayed_tuples < safecstate->saved_tuples)
+ {
+ /* Prepare to replay remaining tuples if they exist */
+ safecstate->replay_is_active = true;
+ safecstate->processed_remaining_tuples = true;
+ safecstate->skip_row = true;
+ return true;
+ }
+ }
+ }
+ else
+ {
+ /* Buffer was filled, commit subtransaction and prepare to replay */
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ safecstate->replay_is_active = true;
+ safecstate->begin_subtransaction = true;
+ safecstate->skip_row = true;
+ }
+ }
+ else
+ {
+ if (safecstate->replayed_tuples < safecstate->saved_tuples)
+ /* Replaying tuple */
+ heap_deform_tuple(safecstate->replay_buffer[safecstate->replayed_tuples++], RelationGetDescr(cstate->rel), values, nulls);
+ else
+ {
+ /* Clean up replay_buffer */
+ MemSet(safecstate->replay_buffer, 0, replay_buffer_size * sizeof(HeapTuple));
+ safecstate->saved_tuples = safecstate->replayed_tuples = 0;
+
+ safecstate->replay_is_active = false;
+ safecstate->skip_row = true;
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ ErrorData *errdata;
+ MemoryContext cxt;
+
+ cxt = MemoryContextSwitchTo(safecstate->oldcontext);
+ errdata = CopyErrorData();
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = safecstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore malformed data */
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ safecstate->errors++;
+ if (safecstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+
+ safecstate->begin_subtransaction = true;
+ safecstate->skip_row = true;
+ break;
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(cxt);
+ }
+ PG_END_TRY();
+
+ if (!valid_row)
+ {
+ if (safecstate->errors == 0)
+ ereport(NOTICE,
+ errmsg("FIND %d ERRORS", safecstate->errors));
+ else if (safecstate->errors == 1)
+ ereport(WARNING,
+ errmsg("FIND %d ERROR", safecstate->errors));
+ else
+ ereport(WARNING,
+ errmsg("FIND %d ERRORS", safecstate->errors));
+
+ return false;
+ }
+
+ return true;
+}
+
/*
* Read the next input line and stash it in line_buf.
*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index df5ceea910..3bb7235b34 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -800,7 +800,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3456,6 +3456,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeInteger(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -17814,6 +17818,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -18393,6 +18398,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0328029d43..54209a4a3c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -49,6 +49,7 @@
#include "catalog/pg_parameter_acl.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "commands/prepare.h"
#include "commands/tablespace.h"
#include "commands/trigger.h"
@@ -2527,6 +2528,16 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"replay_buffer_size", PGC_USERSET, RESOURCES_MEM,
+ gettext_noop("Sets the size of replay buffer for COPY FROM IGNORE_ERRORS option"),
+ NULL
+ },
+ &replay_buffer_size,
+ 1000, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
/*
* We use the hopefully-safely-small value of 100kB as the compiled-in
* default for max_stack_depth. InitializeGUCOptions will increase it if
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b4bc06e5f5..f4e777a0a3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -155,6 +155,8 @@
# mmap
# (change requires restart)
#min_dynamic_shared_memory = 0MB # (change requires restart)
+#replay_buffer_size = 1000 # min 1
+ # (change requires restart)
# - Disk -
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index e572f585ef..feaf18b043 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2742,7 +2742,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cb0096aeb6..fc9f559efe 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,6 +19,9 @@
#include "parser/parse_node.h"
#include "tcop/dest.h"
+/* User-settable GUC parameters */
+extern PGDLLIMPORT int replay_buffer_size;
+
/*
* Represents whether a header line should be present, and whether it must
* match the actual names (which implies "true").
@@ -42,6 +45,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
@@ -78,6 +82,8 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
extern void EndCopyFrom(CopyFromState cstate);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Datum *values, bool *nulls);
+extern bool safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
extern bool NextCopyFromRawFields(CopyFromState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3df1c5a97c..4227a7babd 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -49,6 +50,23 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/* Struct that holds fields for the COPY FROM IGNORE_ERRORS option. */
+typedef struct SafeCopyFromState
+{
+ HeapTuple *replay_buffer; /* accumulates tuples for replaying them after an error */
+ int saved_tuples; /* # of tuples in replay_buffer */
+ int replayed_tuples; /* # of tuples was replayed from buffer */
+ int errors; /* total # of errors */
+ bool replay_is_active; /* become active after an error */
+ bool begin_subtransaction; /* if it's true, we can start subtransaction */
+ bool processed_remaining_tuples; /* for case of replaying last tuples */
+ bool skip_row; /* if it's true, we should skip this row */
+
+ MemoryContext oldcontext; /* create replay_buffer in CopyFrom() context */
+ ResourceOwner oldowner; /* CopyFrom() resource owner */
+ CopyInsertMethod insertMethod;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -71,6 +89,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *safecstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae35f03251..2af11bd359 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -201,6 +201,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..093c7958be 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,136 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int check (n < 8), m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column n: "5 {5} 5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: COPY check_ign_err, line 8, column n: "8 {8} 8"
+WARNING: FIND 7 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[]);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column n: "5 {5} 5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: COPY check_ign_err, line 8, column n: "8 {8} 8"
+WARNING: FIND 7 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err_view, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 {3}"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column n: "5 {5} 5555555555"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+WARNING: COPY check_ign_err_view, line 7, column m: "{a, 7}"
+WARNING: COPY check_ign_err_view, line 8, column n: "8 {8} 8"
+WARNING: FIND 7 ERRORS
+SELECT * FROM check_ign_err_view;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column n: "5 {5} 5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: COPY check_ign_err, line 8, column n: "8 {8} 8"
+WARNING: FIND 7 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int check (n < 8), m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (8);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column n: "5 {5} 5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: COPY check_ign_err, line 8, column n: "8 {8} 8"
+WARNING: FIND 7 ERRORS
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..c91122aa1e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int check (n < 8), m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[]);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err_view;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int check (n < 8), m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (8);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.25.1
Hi,
I was looking at 0004-COPY_IGNORE_ERRORS.patch
+ * Ignore constraints if IGNORE_ERRORS is enabled
+ */
+static void
+safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo,
TupleTableSlot *myslot, EState *estate)
I think the existing ExecConstraints() can be expanded by
checking cstate->opts.ignore_errors so that it can selectively
ignore Constraint Violations.
This way you don't need safeExecConstraints().
Cheers
On 2022-08-25 01:54, Damir Belyalov wrote:
+ /* Buffer was filled, commit subtransaction and
prepare to replay */
+ ReleaseCurrentSubTransaction();
What is actually being committed by this
ReleaseCurrentSubTransaction()?
It seems to me that just safecstate->replay_buffer is filled before
this commit.
All tuples are collected in replay_buffer, which is created in the
"oldcontext" of CopyFrom() (not in the context of a subtransaction). That's
why it wasn't cleaned up when you used
RollbackAndReleaseCurrentSubTransaction().
Subtransactions are needed for better safety. There is no error when
copying from a file to the replay_buffer, but an error may occur at
the next stage - when adding a tuple to the table. Also there may be
other errors that are not obvious at first glance.
Thanks for the explanation and the updated patch.
I now understand that the data being COPYed are not the target of
COMMIT.
Although in past discussions [1] it seems the data to be COPYed were also
subject to COMMIT, I understand this patch has adopted another
design.
350 + /* Buffer was filled, commit subtransaction and
prepare to replay */
351 + ReleaseCurrentSubTransaction();
352 + CurrentResourceOwner = safecstate->oldowner;
353 +
354 + safecstate->replay_is_active = true;
BTW in v4 patch, data are loaded into the buffer one by one, and when
the buffer fills up, the data in the buffer are 'replayed' also one by
one, right?
Wouldn't this have more overhead than a normal COPY?
As a test, I COPYed slightly larger data with and without ignore_errors
option.
There might be other reasons, but I found a performance difference.
```
=# copy test from '/tmp/10000000.data' ;
COPY 10000000
Time: 6001.325 ms (00:06.001)
=# copy test from '/tmp/10000000.data' with (ignore_errors);
NOTICE: FIND 0 ERRORS
COPY 10000000
Time: 7711.555 ms (00:07.712)
```
I feel it might be better to have it as a parameter rather than
fixed at
1000.Yes, I thought about it too. So I created another version with the GUC
parameter - replay_buffer_size. Attached it. But I think users won't
need to change replay_buffer_size.
Also replay_buffer does the same thing as MultiInsert buffer does and
MultiInsert buffer defined by const = 1000.
Yeah, when the data being COPYed are not the target of COMMIT, I also
think users won't need to change it.
NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
copyfrom.c.
Since safeNextCopyFrom() is analog of NextCopyFrom() as commented,
would
it be natural to put them in the same file?Sure, corrected it.
Thanks.
188 + safecstate->errors++;
189 + if (safecstate->errors <= 100)
190 + ereport(WARNING,
191 + (errcode(errdata->sqlerrcode),
192 + errmsg("%s", errdata->context)));
It would be better to state in the manual that errors exceeding 100 are
not displayed.
Or, it might be acceptable to output all errors that exceed 100.
It'll be too complicated to create a new parameter just for showing
the given number of errors. I think 100 is an optimal size.
Yeah, I may have misled you, but I also don't think this needs a new
parameter.
I just thought either of the following would be better:
- Put in the documentation that the warnings will not be output beyond the
first 100 errors.
- Output all the warnings.
It would be helpful to add comments about skip_row, etc.
Corrected it.
Thanks.
WARNING: FIND 0 ERRORS
When there were no errors, this WARNING seems not necessary.
Corrected it.
Thanks.
I applied the v4 patch, and when I canceled the COPY, there was a case where I
found myself left in an aborted transaction.
Should this situation be prevented from occurring?
```
=# copy test from '/tmp/10000000.data' with (ignore_errors );
^CCancel request sent
ERROR: canceling statement due to user request
=# truncate test;
ERROR: current transaction is aborted, commands ignored until end of
transaction block
```
[1]: /messages/by-id/1197677930.1536.18.camel@dell.linuxdev.us.dell.com
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Thank you for reviewing.
In the previous patch there was an error when processing constraints. The
patch was fixed, but the code grew and became more complicated
(0005-COPY_IGNORE_ERRORS). I also simplified the logic of
safeNextCopyFrom().
You asked why we need subtransactions, so the answer is in this patch. When
processing a row that does not satisfy constraints or INSTEAD OF triggers,
it is necessary to roll back the subtransaction and return the table to its
original state.
Because of this complexity, I had to abandon the constraint and trigger
processing and handle only errors that occur when reading the file. Attaching a
simplified patch (0006-COPY_IGNORE_ERRORS).
I checked these patches against all the COPY regression tests and they worked correctly.
BTW in v4 patch, data are loaded into the buffer one by one, and when
the buffer fills up, the data in the buffer are 'replayed' also one by
one, right?
Wouldn't this have more overhead than a normal COPY?
The data is loaded into the buffer one by one, but in "replay mode"
ignore_errors works like standard COPY. Tuples are added to the slot and,
depending on the insert method (CIM_SINGLE or CIM_MULTI), are replayed in
the corresponding way.
For the 0006 patch you can imagine that we divide the for(;;) loop into 2
parts. The first part adds tuples to the buffer and the second part
inserts tuples into the table. These parts don't overlap with each other
and are executed sequentially.
The main idea of replay_buffer is that it is needed for the CIM_SINGLE
case. You can exercise the CIM_SINGLE case and see that tuples read before an
error occurs are not added to the table. The logic of the 0005 patch is similar,
but with some differences.
As a test, I COPYed slightly larger data with and without ignore_errors
option.
There might be other reasons, but I found a performance difference.
I tried to reduce the performance difference by cleaning up replay_buffer via
resetting a new memory context created for it - replay_cxt.
```
Before:
Without ignore_errors:
COPY 10000000
Time: 15538,579 ms (00:15,539)
With ignore_errors:
COPY 10000000
Time: 21289,121 ms (00:21,289)
After:
Without ignore_errors:
COPY 10000000
Time: 15318,922 ms (00:15,319)
With ignore_errors:
COPY 10000000
Time: 19868,175 ms (00:19,868)
```
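In case anyone wants to repeat the comparison, this is roughly the recipe (the
test table definition and the file path are placeholders, not my exact setup,
and COPY ... TO a server-side file needs the appropriate privileges):
```
CREATE TABLE test (n int, m int, k int);
COPY (SELECT i, i, i FROM generate_series(1, 10000000) AS i)
    TO '/tmp/10000000.data';
\timing on
TRUNCATE test;
COPY test FROM '/tmp/10000000.data';
TRUNCATE test;
COPY test FROM '/tmp/10000000.data' WITH (ignore_errors);
```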
- Put in the documentation that the warnings will not be output beyond the
first 100 errors.
Yeah, I've pointed it out in the doc.
I applied v4 patch and when canceled the COPY, there was a case I found
myself left in a transaction.
Should this situation be prevented from occurring?
```
=# copy test from '/tmp/10000000.data' with (ignore_errors );
^CCancel request sent
ERROR: canceling statement due to user request
=# truncate test;
ERROR: current transaction is aborted, commands ignored until end of
transaction block
```
I tried to reproduce your error and could not. The result was the same as
with plain COPY FROM.
Attachments:
0005-COPY_IGNORE_ERRORS.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..c99adabcc9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ containing syntax errors in data, rows with too many or too few columns,
+ rows that result in constraint violations, rows containing columns where
+ the data type's input function raises an error.
+ Outputs warnings about rows with incorrect data (the number of warnings
+ is not more than 100) and the total number of errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 49924e476a..f41b25f26a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate,
bool is_from,
List *options)
{
+ bool ignore_errors_specified = false;
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
@@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e8bb168aea..6474d47d29 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,12 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static bool safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+static bool safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo, TupleTableSlot *myslot, EState *estate);
+
+static void addToReplayBuffer(CopyFromState cstate, TupleTableSlot *myslot);
+
/*
* error context callback for COPY FROM
*
@@ -521,6 +527,254 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Analog of NextCopyFrom() but ignores rows with errors while copying.
+ */
+bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+
+ sfcstate->tuple_is_valid = false;
+
+ PG_TRY();
+ {
+ if (sfcstate->begin_subxact)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->begin_subxact = false;
+ }
+
+ if (!sfcstate->replay_is_active)
+ {
+ if (sfcstate->saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+ MemoryContext cxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+ bool valid_row = NextCopyFrom(cstate, econtext, values, nulls);
+
+ CurrentMemoryContext = cxt;
+
+ sfcstate->tuple_is_valid = true;
+
+ if (!valid_row && sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ /* Prepare for replaying remaining tuples if they exist */
+ sfcstate->replay_is_active = true;
+
+ /* If there are insteadof triggers we should rollback subtransaction */
+ if (sfcstate->has_instead_insert_row_trig)
+ {
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->begin_subxact = true;
+ }
+ }
+ else if (!valid_row)
+ {
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ if (sfcstate->errors == 0)
+ ereport(NOTICE,
+ errmsg("%d errors", sfcstate->errors));
+ else if (sfcstate->errors == 1)
+ ereport(WARNING,
+ errmsg("%d error", sfcstate->errors));
+ else
+ ereport(WARNING,
+ errmsg("%d errors", sfcstate->errors));
+
+ return false;
+ }
+ }
+ else
+ {
+ /* Buffer was filled, prepare for replaying */
+ sfcstate->replay_is_active = true;
+
+ /* If there are insteadof triggers we should rollback subtransaction */
+ if (sfcstate->has_instead_insert_row_trig)
+ {
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->begin_subxact = true;
+ }
+ }
+ }
+
+ if (sfcstate->replay_is_active)
+ {
+ if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ /* Replaying the tuple */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+
+ heap_deform_tuple(sfcstate->replay_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel), values, nulls);
+ MemoryContextSwitchTo(cxt);
+
+ sfcstate->tuple_is_valid = true;
+ }
+ else
+ {
+ /* Clean up replay_buffer */
+ MemoryContextReset(sfcstate->replay_cxt);
+ sfcstate->saved_tuples = sfcstate->replayed_tuples = 0;
+
+ cstate->sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->begin_subxact = true;
+ sfcstate->replay_is_active = false;
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore data exceptions */
+ case ERRCODE_CHARACTER_NOT_IN_REPERTOIRE:
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ sfcstate->errors++;
+ if (sfcstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+
+ sfcstate->begin_subxact = true;
+
+ break;
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ return true;
+}
+
+/*
+ * Analog of ExecConstraints(), but ignores rows in constraint violations.
+ */
+bool
+safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo, TupleTableSlot *myslot, EState *estate)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+
+ PG_TRY();
+ ExecConstraints(resultRelInfo, myslot, estate);
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore constraint violations */
+ case ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION:
+ case ERRCODE_RESTRICT_VIOLATION:
+ case ERRCODE_NOT_NULL_VIOLATION:
+ case ERRCODE_FOREIGN_KEY_VIOLATION:
+ case ERRCODE_UNIQUE_VIOLATION:
+ case ERRCODE_CHECK_VIOLATION:
+ case ERRCODE_EXCLUSION_VIOLATION:
+ sfcstate->errors++;
+ if (sfcstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s %s", errdata->message, errdata->detail)));
+
+ sfcstate->begin_subxact = true;
+ sfcstate->tuple_is_valid = false;
+
+ break;
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ if (!sfcstate->tuple_is_valid)
+ return false;
+
+ return true;
+}
+
+void
+addToReplayBuffer(CopyFromState cstate, TupleTableSlot *myslot)
+{
+ MemoryContext cxt = MemoryContextSwitchTo(cstate->sfcstate->replay_cxt);
+ HeapTuple saved_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+
+ cstate->sfcstate->replay_buffer[cstate->sfcstate->saved_tuples++] = saved_tuple;
+ MemoryContextSwitchTo(cxt);
+}
+
/*
* Copy FROM file to relation.
*/
@@ -535,6 +789,7 @@ CopyFrom(CopyFromState cstate)
ExprContext *econtext;
TupleTableSlot *singleslot = NULL;
MemoryContext oldcontext = CurrentMemoryContext;
+ ResourceOwner oldowner = CurrentResourceOwner;
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
@@ -819,6 +1074,27 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->sfcstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->sfcstate->replay_cxt = AllocSetContextCreate(oldcontext,
+ "Replay context",
+ ALLOCSET_DEFAULT_SIZES);
+ cstate->sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ cstate->sfcstate->saved_tuples = 0;
+ cstate->sfcstate->replayed_tuples = 0;
+ cstate->sfcstate->errors = 0;
+ cstate->sfcstate->replay_is_active = false;
+ cstate->sfcstate->begin_subxact = true;
+ cstate->sfcstate->oldowner = oldowner;
+ cstate->sfcstate->oldcontext = oldcontext;
+ if (has_instead_insert_row_trig)
+ cstate->sfcstate->has_instead_insert_row_trig = true;
+ }
+
for (;;)
{
TupleTableSlot *myslot;
@@ -855,19 +1131,25 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ if (cstate->sfcstate)
+ {
+ /* If option IGNORE_ERRORS is enabled, COPY skips rows with errors */
+ if (!safeNextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
+ if (!cstate->sfcstate->tuple_is_valid)
+ continue;
+ }
+ else
+ {
+ /* Directly store the values/nulls array in the slot */
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
+ }
ExecStoreVirtualTuple(myslot);
- /*
- * Constraints and where clause might reference the tableoid column,
- * so (re-)initialize tts_tableOid before evaluating them.
- */
myslot->tts_tableOid = RelationGetRelid(target_resultRelInfo->ri_RelationDesc);
- /* Triggers and stuff need to be invoked in query context. */
MemoryContextSwitchTo(oldcontext);
if (cstate->whereClause)
@@ -1020,6 +1302,13 @@ CopyFrom(CopyFromState cstate)
if (has_instead_insert_row_trig)
{
ExecIRInsertTriggers(estate, resultRelInfo, myslot);
+
+ /* Add tuple to replay_buffer if IGNORE_ERRORS is enabled */
+ if (cstate->sfcstate && !cstate->sfcstate->replay_is_active)
+ {
+ addToReplayBuffer(cstate, myslot);
+ continue;
+ }
}
else
{
@@ -1035,7 +1324,15 @@ CopyFrom(CopyFromState cstate)
*/
if (resultRelInfo->ri_FdwRoutine == NULL &&
resultRelInfo->ri_RelationDesc->rd_att->constr)
- ExecConstraints(resultRelInfo, myslot, estate);
+ {
+ if (cstate->sfcstate)
+ {
+ if (!safeExecConstraints(cstate, resultRelInfo, myslot, estate))
+ continue;
+ }
+ else
+ ExecConstraints(resultRelInfo, myslot, estate);
+ }
/*
* Also check the tuple against the partition constraint, if
@@ -1047,6 +1344,13 @@ CopyFrom(CopyFromState cstate)
(proute == NULL || has_before_insert_row_trig))
ExecPartitionCheck(resultRelInfo, myslot, estate, true);
+ /* Add tuple to replay_buffer if IGNORE_ERRORS is enabled */
+ if (cstate->sfcstate && !cstate->sfcstate->replay_is_active)
+ {
+ addToReplayBuffer(cstate, myslot);
+ continue;
+ }
+
/* Store the slot in the multi-insert buffer, when enabled. */
if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b5ab9d9c9a..b49954c0aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -808,7 +808,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3477,6 +3477,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -17778,6 +17782,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -18357,6 +18362,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62a39779b9..fe590ff7a8 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2748,7 +2748,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cb0096aeb6..2b696f99bc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..5615fa55ef 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -49,6 +50,24 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/* Struct holding fields for the ignore_errors option. */
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 10
+ HeapTuple *replay_buffer; /* accumulates tuples for replaying them after an error */
+ int saved_tuples; /* # of tuples in replay_buffer */
+ int replayed_tuples; /* # of tuples replayed from the buffer */
+ int errors; /* total # of errors */
+ bool replay_is_active; /* if true we replay tuples from buffer */
+ bool begin_subxact; /* if true we can begin subtransaction */
+ bool tuple_is_valid;
+ bool has_instead_insert_row_trig;
+
+ MemoryContext replay_cxt;
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -71,6 +90,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *sfcstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae35f03251..2af11bd359 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -201,6 +201,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..d74575fd40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,134 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int check (n != 6), m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: new row for relation "check_ign_err" violates check constraint "check_ign_err_n_check" Failing row contains (6, {6}, 6).
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+ FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: new row for relation "check_ign_err" violates check constraint "check_ign_err_n_check" Failing row contains (6, {6}, 6).
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+ FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err_view, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 {3}"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err_view, line 7, column m: "{a, 7}"
+WARNING: 5 errors
+SELECT * FROM trig_test;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 6 | {6} | 6
+ 8 | {8} | 8
+(3 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case is in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 5 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int check (n != 6), m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+ FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: new row for relation "check_ign_err_part2" violates check constraint "check_ign_err_n_check" Failing row contains (6, {6}, 6).
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..8d29ceba26 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int check (n != 6), m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+6 {6} 6
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+ FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+6 {6} 6
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+ FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+6 {6} 6
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM trig_test;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case is in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+6 {6} 6
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int check (n != 6), m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+ FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+6 {6} 6
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
Attachment: 0006-COPY_IGNORE_ERRORS.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..22c992e6f6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying: rows containing
+ syntax errors in the data, rows with too many or too few columns, and
+ rows with values that the data type's input function rejects.
+ A warning is emitted for each such row (at most 100 warnings), along
+ with the total number of errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 49924e476a..f41b25f26a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate,
bool is_from,
List *options)
{
+ bool ignore_errors_specified = false;
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
@@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e8bb168aea..39f1dca084 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -535,6 +535,7 @@ CopyFrom(CopyFromState cstate)
ExprContext *econtext;
TupleTableSlot *singleslot = NULL;
MemoryContext oldcontext = CurrentMemoryContext;
+ ResourceOwner oldowner = CurrentResourceOwner;
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
@@ -819,6 +820,26 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->sfcstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->sfcstate->replay_cxt = AllocSetContextCreate(oldcontext,
+ "Replay_context",
+ ALLOCSET_DEFAULT_SIZES);
+ cstate->sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ cstate->sfcstate->saved_tuples = 0;
+ cstate->sfcstate->replayed_tuples = 0;
+ cstate->sfcstate->errors = 0;
+ cstate->sfcstate->replay_is_active = false;
+ cstate->sfcstate->begin_subxact = true;
+
+ cstate->sfcstate->oldowner = oldowner;
+ cstate->sfcstate->oldcontext = oldcontext;
+ }
+
for (;;)
{
TupleTableSlot *myslot;
@@ -855,9 +876,18 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ if (cstate->sfcstate)
+ {
+ /* If option IGNORE_ERRORS is enabled, COPY skips rows with errors */
+ if (!safeNextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
+ if (!cstate->sfcstate->replay_is_active)
+ continue;
+ }
+ else
+ /* Directly store the values/nulls array in the slot */
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
ExecStoreVirtualTuple(myslot);
@@ -1035,7 +1065,21 @@ CopyFrom(CopyFromState cstate)
*/
if (resultRelInfo->ri_FdwRoutine == NULL &&
resultRelInfo->ri_RelationDesc->rd_att->constr)
- ExecConstraints(resultRelInfo, myslot, estate);
+ {
+ if (cstate->opts.ignore_errors)
+ {
+ PG_TRY();
+ ExecConstraints(resultRelInfo, myslot, estate);
+ PG_CATCH();
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = cstate->sfcstate->oldowner;
+
+ PG_RE_THROW();
+ PG_END_TRY();
+ }
+ else
+ ExecConstraints(resultRelInfo, myslot, estate);
+ }
/*
* Also check the tuple against the partition constraint, if
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cf3e865cf..6b92e781e3 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -839,6 +839,169 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return true;
}
+/*
+ * Analog of NextCopyFrom() but ignores rows with errors while copying.
+ */
+bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+
+ PG_TRY();
+ {
+ if (sfcstate->begin_subxact)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->begin_subxact = false;
+ }
+
+ if (!sfcstate->replay_is_active)
+ {
+ if (sfcstate->saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+ MemoryContext cxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+ bool valid_row = NextCopyFrom(cstate, econtext, values, nulls);
+
+ CurrentMemoryContext = cxt;
+
+ if (valid_row)
+ {
+ /* Filling replay_buffer in Replay_context */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+ HeapTuple saved_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);
+
+ sfcstate->replay_buffer[sfcstate->saved_tuples++] = saved_tuple;
+ MemoryContextSwitchTo(cxt);
+ }
+ else if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ /* Prepare for replaying remaining tuples if they exist */
+ sfcstate->replay_is_active = true;
+ else
+ {
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ if (sfcstate->errors == 0)
+ ereport(NOTICE,
+ errmsg("%d errors", sfcstate->errors));
+ else if (sfcstate->errors == 1)
+ ereport(WARNING,
+ errmsg("%d error", sfcstate->errors));
+ else
+ ereport(WARNING,
+ errmsg("%d errors", sfcstate->errors));
+
+ return false;
+ }
+ }
+ else
+ /* Buffer was filled, prepare for replaying */
+ sfcstate->replay_is_active = true;
+ }
+
+ if (sfcstate->replay_is_active)
+ {
+ if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ /* Replaying the tuple */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+
+ heap_deform_tuple(sfcstate->replay_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel), values, nulls);
+ MemoryContextSwitchTo(cxt);
+ }
+ else
+ {
+ /* Clean up replay_buffer */
+ MemoryContextReset(sfcstate->replay_cxt);
+ sfcstate->saved_tuples = sfcstate->replayed_tuples = 0;
+
+ cstate->sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->begin_subxact = true;
+ sfcstate->replay_is_active = false;
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore data exceptions */
+ case ERRCODE_CHARACTER_NOT_IN_REPERTOIRE:
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ sfcstate->errors++;
+ if (sfcstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+
+ sfcstate->begin_subxact = true;
+
+ break;
+ default:
+ PG_RE_THROW();
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ return true;
+}
+
/*
* Read next tuple from file for COPY FROM. Return false if no more tuples.
*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b5ab9d9c9a..b49954c0aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -808,7 +808,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3477,6 +3477,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -17778,6 +17782,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -18357,6 +18362,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62a39779b9..fe590ff7a8 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2748,7 +2748,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cb0096aeb6..006e1024e1 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
@@ -76,6 +77,8 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
const char *filename,
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
extern void EndCopyFrom(CopyFromState cstate);
+extern bool safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Datum *values, bool *nulls);
extern bool NextCopyFromRawFields(CopyFromState cstate,
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..9100d5f247 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -49,6 +50,24 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * Struct holding fields for the ignore_errors option
+ */
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 1000
+ HeapTuple *replay_buffer; /* accumulates tuples for replaying them after an error */
+ int saved_tuples; /* # of tuples in replay_buffer */
+ int replayed_tuples; /* # of tuples replayed from the buffer */
+ int errors; /* total # of errors */
+ bool replay_is_active; /* if true we replay tuples from buffer */
+ bool begin_subxact; /* if true we can begin subtransaction */
+
+ MemoryContext replay_cxt;
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -71,6 +90,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *sfcstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae35f03251..2af11bd359 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -201,6 +201,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..acf4917e64 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,135 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err_view, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 {3}"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+WARNING: COPY check_ign_err_view, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM trig_test;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case is in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..b25b20182e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM trig_test;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case is in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
On 2022-09-21 21:11, Damir Belyalov wrote:
Thanks for updating the patch.
In the previous patch there was an error when processing constraints.
The patch was fixed, but the code grew and became more complicated
(0005-COPY_IGNORE_ERRORS). I also simplified the logic of
safeNextCopyFrom().
You asked why we need subtransactions, so the answer is in this patch.
When processing a row that does not satisfy constraints or INSTEAD OF
triggers, it is necessary to roll back the subtransaction and return
the table to its original state.
Because of that complexity, I had to abandon constraint and trigger
processing and handle only errors that occur while reading the file.
Attaching the simplified patch (0006-COPY_IGNORE_ERRORS).
Do you mean that you have stopped dealing with errors related to constraints
and triggers, and that we should review 0006-COPY_IGNORE_ERRORS?
I tried to reproduce your error and could not. The result was the same
as with plain COPY FROM.
Hmm, I applied the v6 patch, and when I canceled COPY by sending SIGINT
(Ctrl+C), I ran into the situation below.
I tested it on CentOS 8.4.
=# COPY test FROM '/home/tori/pgsql/master/10000000.data' WITH
(IGNORE_ERRORS);
^CCancel request sent
ERROR: canceling statement due to user request
CONTEXT: COPY test, line 628000: "628000 xxx"
=# SELECT 1;
ERROR: current transaction is aborted, commands ignored until end of
transaction block
=# ABORT;
FATAL: UserAbortTransactionBlock: unexpected state STARTED
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
I did the same procedure on COPY FROM without IGNORE_ERRORS and didn't
face this situation.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Do you mean that you have stopped dealing with errors related to constraints
and triggers, and that we should review 0006-COPY_IGNORE_ERRORS?
Yes, this patch is simpler and I think it's worth adding it first.
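To make that distinction concrete, here is a minimal usage sketch (table name
and data are made up; the "Expected" comments are my reading of the 0006/0007
behaviour, not output from the patch): malformed input lines are skipped with
warnings, while a constraint violation still aborts the whole COPY.

CREATE TABLE ign_demo (n int CHECK (n > 0), t text);

-- Malformed input: handled by IGNORE_ERRORS, the bad line is skipped.
COPY ign_demo FROM STDIN WITH IGNORE_ERRORS;
1	one
oops	two
3	three
\.
-- Expected: a WARNING for line 2, then "1 error"; rows 1 and 3 are loaded.

-- Constraint violation: not handled by the simplified patch,
-- so the whole COPY fails and no rows from it are loaded.
COPY ign_demo FROM STDIN WITH IGNORE_ERRORS;
-4	minus four
\.
-- Expected: ERROR, new row violates check constraint "ign_demo_n_check".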
Hmm, I applied the v6 patch, and when I canceled COPY by sending SIGINT
(Ctrl+C), I ran into the situation below.
I tested it on CentOS 8.4.
Thank you for pointing out this error; it really needs to be taken into
account. In the previous 0006 patch, there were two stages in one
subtransaction: filling the buffer and 'replay mode' (reading from the
buffer). Since only NextCopyFrom() is wrapped in PG_TRY(), and the
ERRCODE_QUERY_CANCELED error can occur anywhere, it was impossible to
catch it and roll back the subtransaction.
I changed the 0006 patch to fix this error; now only the 'replay buffer
filling' stage runs inside the subtransaction.
Patch 0005 (the one that processed constraints) still needs to be finalized,
because it requires subtransactions to roll back the effects of constraints
and triggers, so it cannot be fixed yet. One option is to put the for(;;)
loop inside PG_TRY(); that would solve the problem, but the code would
become too complicated.
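For reference, the kind of scenario that forces the subtransaction machinery
in 0005 looks roughly like the sketch below (made-up names, mirroring the
regression tests): a BEFORE INSERT row trigger has already written to another
table by the time the CHECK constraint rejects the row, so skipping the row
also means undoing the trigger's side effect.

CREATE TABLE audit_log (n int);
CREATE TABLE target (n int CHECK (n <> 6));

CREATE FUNCTION log_row () RETURNS TRIGGER AS '
  BEGIN
    INSERT INTO audit_log VALUES (NEW.n);
    RETURN NEW;
  END;
' LANGUAGE plpgsql;

CREATE TRIGGER log_before BEFORE INSERT ON target
  FOR EACH ROW EXECUTE PROCEDURE log_row();

-- With patch 0005, skipping the row with n = 6 must also undo the
-- audit_log insert made by the trigger for that row; that is what the
-- per-row subtransaction and RollbackAndReleaseCurrentSubTransaction()
-- are there for.
COPY target FROM STDIN WITH IGNORE_ERRORS;
5
6
7
\.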
Attachments:
Attachment: 0007-COPY_IGNORE_ERRORS.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..22c992e6f6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying: rows containing
+ syntax errors in the data, rows with too many or too few columns, and
+ rows with values that the data type's input function rejects.
+ A warning is emitted for each such row (at most 100 warnings), along
+ with the total number of errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 49924e476a..f41b25f26a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate,
bool is_from,
List *options)
{
+ bool ignore_errors_specified = false;
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
@@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e8bb168aea..caa3375758 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,9 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static bool FillReplayBuffer(CopyFromState cstate, ExprContext *econtext,
+ TupleTableSlot *myslot);
+
/*
* error context callback for COPY FROM
*
@@ -521,6 +524,177 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Fills replay_buffer for safe copying, enabled by the IGNORE_ERRORS option.
+ */
+bool
+FillReplayBuffer(CopyFromState cstate, ExprContext *econtext, TupleTableSlot *myslot)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+ bool valid_row = true;
+
+ if (!sfcstate->replay_is_active)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ while (sfcstate->saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+ bool tuple_is_valid = false;
+
+ PG_TRY();
+ {
+ MemoryContext cxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+ valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+ if (valid_row)
+ tuple_is_valid = true;
+
+ CurrentMemoryContext = cxt;
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ Assert(IsSubTransaction());
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore data exceptions */
+ case ERRCODE_CHARACTER_NOT_IN_REPERTOIRE:
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ /* If the error can be processed, begin a new subtransaction */
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->errors++;
+ if (sfcstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+ break;
+ default:
+ PG_RE_THROW();
+ break;
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ if (tuple_is_valid)
+ {
+ /* Filling replay_buffer in Replay_context */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+ HeapTuple saved_tuple;
+
+ saved_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+ sfcstate->replay_buffer[sfcstate->saved_tuples++] = saved_tuple;
+
+ MemoryContextSwitchTo(cxt);
+ }
+
+ MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+ ExecClearTuple(myslot);
+
+ if (!valid_row)
+ break;
+ }
+
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ /* End of file or buffer was filled, prepare to replay remaining tuples from buffer */
+ sfcstate->replay_is_active = true;
+ }
+
+ if (sfcstate->replay_is_active)
+ {
+ if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ /* Replaying the tuple */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+
+ heap_deform_tuple(sfcstate->replay_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+ myslot->tts_values, myslot->tts_isnull);
+ MemoryContextSwitchTo(cxt);
+ }
+ else
+ {
+ /* All tuples from buffer were replayed, clean it up */
+ MemoryContextReset(sfcstate->replay_cxt);
+ sfcstate->saved_tuples = sfcstate->replayed_tuples = 0;
+
+ sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ sfcstate->replay_is_active = false;
+
+ if (!valid_row)
+ {
+ /* All tuples were replayed */
+ if (sfcstate->errors == 0)
+ ereport(NOTICE,
+ errmsg("%d errors", sfcstate->errors));
+ else if (sfcstate->errors == 1)
+ ereport(WARNING,
+ errmsg("%d error", sfcstate->errors));
+ else
+ ereport(WARNING,
+ errmsg("%d errors", sfcstate->errors));
+
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
/*
* Copy FROM file to relation.
*/
@@ -855,9 +1029,19 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ if (cstate->sfcstate)
+ {
+ /* If option IGNORE_ERRORS is enabled, COPY skips rows with errors */
+ if (!FillReplayBuffer(cstate, econtext, myslot))
+ break;
+
+ if (!cstate->sfcstate->replay_is_active)
+ continue;
+ }
+ else
+ /* Directly store the values/nulls array in the slot */
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
ExecStoreVirtualTuple(myslot);
@@ -1550,6 +1734,25 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
}
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->sfcstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->sfcstate->replay_cxt = AllocSetContextCreate(oldcontext,
+ "Replay_context",
+ ALLOCSET_DEFAULT_SIZES);
+ cstate->sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ cstate->sfcstate->saved_tuples = 0;
+ cstate->sfcstate->replayed_tuples = 0;
+ cstate->sfcstate->errors = 0;
+ cstate->sfcstate->replay_is_active = false;
+
+ cstate->sfcstate->oldowner = CurrentResourceOwner;
+ cstate->sfcstate->oldcontext = cstate->copycontext;
+ }
+
MemoryContextSwitchTo(oldcontext);
return cstate;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b5ab9d9c9a..b49954c0aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -808,7 +808,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3477,6 +3477,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -17778,6 +17782,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -18357,6 +18362,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62a39779b9..fe590ff7a8 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2748,7 +2748,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cb0096aeb6..2b696f99bc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e37c6032ae..d3f4c8d9df 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -49,6 +50,23 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * Struct holding fields for the ignore_errors option
+ */
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 1000
+ HeapTuple *replay_buffer; /* accumulates tuples for replaying them after an error */
+ int saved_tuples; /* # of tuples in replay_buffer */
+ int replayed_tuples; /* # of tuples replayed from the buffer */
+ int errors; /* total # of errors */
+ bool replay_is_active; /* if true we replay tuples from buffer */
+
+ MemoryContext replay_cxt;
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -71,6 +89,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *sfcstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae35f03251..2af11bd359 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -201,6 +201,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..acf4917e64 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,135 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err_view, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 {3}"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+WARNING: COPY check_ign_err_view, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM trig_test;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case is in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..b25b20182e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM trig_test;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case is in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
Updated the patch due to conflicts when applying to master.
Attachments:
0008-COPY_IGNORE_ERRORS.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..22c992e6f6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ containing syntax errors in the data, rows with too many or too few columns,
+ and rows containing columns where the data type's input function raises an error.
+ Outputs warnings about rows with incorrect data (at most 100 such warnings
+ are printed) and the total number of errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db4c9dbc23..d04753a4c8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate,
bool is_from,
List *options)
{
+ bool ignore_errors_specified = false;
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
@@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a079c70152..fa169d2cf4 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -107,6 +107,9 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static bool FillReplayBuffer(CopyFromState cstate, ExprContext *econtext,
+ TupleTableSlot *myslot);
+
/*
* error context callback for COPY FROM
*
@@ -625,6 +628,177 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Fills replay_buffer for safe copying, enabled by the IGNORE_ERRORS option.
+ */
+bool
+FillReplayBuffer(CopyFromState cstate, ExprContext *econtext, TupleTableSlot *myslot)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+ bool valid_row = true;
+
+ if (!sfcstate->replay_is_active)
+ {
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ while (sfcstate->saved_tuples < REPLAY_BUFFER_SIZE)
+ {
+ bool tuple_is_valid = false;
+
+ PG_TRY();
+ {
+ MemoryContext cxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+ valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+ if (valid_row)
+ tuple_is_valid = true;
+
+ CurrentMemoryContext = cxt;
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ Assert(IsSubTransaction());
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore data exceptions */
+ case ERRCODE_CHARACTER_NOT_IN_REPERTOIRE:
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ /* If the error can be processed, begin a new subtransaction */
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->errors++;
+ if (sfcstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+ break;
+ default:
+ PG_RE_THROW();
+ break;
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ if (tuple_is_valid)
+ {
+ /* Filling replay_buffer in Replay_context */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+ HeapTuple saved_tuple;
+
+ saved_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+ sfcstate->replay_buffer[sfcstate->saved_tuples++] = saved_tuple;
+
+ MemoryContextSwitchTo(cxt);
+ }
+
+ MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+ ExecClearTuple(myslot);
+
+ if (!valid_row)
+ break;
+ }
+
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ /* End of file or buffer was filled, prepare to replay remaining tuples from buffer */
+ sfcstate->replay_is_active = true;
+ }
+
+ if (sfcstate->replay_is_active)
+ {
+ if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ /* Replaying the tuple */
+ MemoryContext cxt = MemoryContextSwitchTo(sfcstate->replay_cxt);
+
+ heap_deform_tuple(sfcstate->replay_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+ myslot->tts_values, myslot->tts_isnull);
+ MemoryContextSwitchTo(cxt);
+ }
+ else
+ {
+ /* All tuples from buffer were replayed, clean it up */
+ MemoryContextReset(sfcstate->replay_cxt);
+ sfcstate->saved_tuples = sfcstate->replayed_tuples = 0;
+
+ sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ sfcstate->replay_is_active = false;
+
+ if (!valid_row)
+ {
+ /* All tuples were replayed */
+ if (sfcstate->errors == 0)
+ ereport(NOTICE,
+ errmsg("%d errors", sfcstate->errors));
+ else if (sfcstate->errors == 1)
+ ereport(WARNING,
+ errmsg("%d error", sfcstate->errors));
+ else
+ ereport(WARNING,
+ errmsg("%d errors", sfcstate->errors));
+
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
/*
* Copy FROM file to relation.
*/
@@ -985,9 +1159,19 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
- break;
+ if (cstate->sfcstate)
+ {
+ /* If option IGNORE_ERRORS is enabled, COPY skips rows with errors */
+ if (!FillReplayBuffer(cstate, econtext, myslot))
+ break;
+
+ if (!cstate->sfcstate->replay_is_active)
+ continue;
+ }
+ else
+ /* Directly store the values/nulls array in the slot */
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ break;
ExecStoreVirtualTuple(myslot);
@@ -1695,6 +1879,25 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
}
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->sfcstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->sfcstate->replay_cxt = AllocSetContextCreate(oldcontext,
+ "Replay_context",
+ ALLOCSET_DEFAULT_SIZES);
+ cstate->sfcstate->replay_buffer = MemoryContextAlloc(cstate->sfcstate->replay_cxt,
+ REPLAY_BUFFER_SIZE * sizeof(HeapTuple));
+ cstate->sfcstate->saved_tuples = 0;
+ cstate->sfcstate->replayed_tuples = 0;
+ cstate->sfcstate->errors = 0;
+ cstate->sfcstate->replay_is_active = false;
+
+ cstate->sfcstate->oldowner = CurrentResourceOwner;
+ cstate->sfcstate->oldcontext = cstate->copycontext;
+ }
+
MemoryContextSwitchTo(oldcontext);
return cstate;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 737bd2d06d..b3a6c9931e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3359,6 +3359,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16756,6 +16760,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17310,6 +17315,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 584d9d5ae6..33d583a94c 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2757,7 +2757,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b77b935005..0bf9641b6e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 8d9cc5accd..3289d96872 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -52,6 +53,23 @@ typedef enum CopyInsertMethod
* ExecForeignBatchInsert only if valid */
} CopyInsertMethod;
+/*
+ * Struct that holds fields for the ignore_errors option
+ */
+typedef struct SafeCopyFromState
+{
+#define REPLAY_BUFFER_SIZE 1000
+ HeapTuple *replay_buffer; /* accumulates tuples for replaying them after an error */
+ int saved_tuples; /* # of tuples in replay_buffer */
+ int replayed_tuples; /* # of tuples replayed from buffer */
+ int errors; /* total # of errors */
+ bool replay_is_active; /* if true we replay tuples from buffer */
+
+ MemoryContext replay_cxt;
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -74,6 +92,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *sfcstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 957ee18d84..ed25a8c86c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..acf4917e64 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,135 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err_view, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 {3}"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+WARNING: COPY check_ign_err_view, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM trig_test;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case is in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: 6 errors
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..b25b20182e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM trig_test;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case is in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
Updated the patch:
- Optimized and simplified logic of IGNORE_ERRORS
- Changed variable names to more understandable ones
- Added an analogue of MAX_BUFFERED_BYTES for safe buffer
Regards,
Damir Belyalov
Postgres Professional
Attachments:
0010-COPY_IGNORE_ERRORS.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..22c992e6f6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ containing syntax errors in the data, rows with too many or too few columns,
+ and rows containing columns where the data type's input function raises an error.
+ Outputs warnings about rows with incorrect data (at most 100 such warnings
+ are printed) and the total number of errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db4c9dbc23..d04753a4c8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate,
bool is_from,
List *options)
{
+ bool ignore_errors_specified = false;
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
@@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a079c70152..846eac022d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -107,6 +107,9 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static bool SafeCopying(CopyFromState cstate, ExprContext *econtext,
+ TupleTableSlot *myslot);
+
/*
* error context callback for COPY FROM
*
@@ -625,6 +628,175 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Safely reads source data, converts it to a tuple and fills the tuple buffer.
+ * Skips data whose conversion failed, as long as the data source for the
+ * next tuple can still be read safely.
+ */
+bool
+SafeCopying(CopyFromState cstate, ExprContext *econtext, TupleTableSlot *myslot)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+ bool valid_row = true;
+
+ /* Standard COPY if IGNORE_ERRORS is disabled */
+ if (!cstate->sfcstate)
+ /* Directly stores the values/nulls array in the slot */
+ return NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+ if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ Assert(sfcstate->saved_tuples > 0);
+
+ /* Prepare to replay the tuple */
+ heap_deform_tuple(sfcstate->safe_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+ myslot->tts_values, myslot->tts_isnull);
+ return true;
+ }
+ else
+ {
+ /* All tuples from buffer were replayed, clean it up */
+ MemoryContextReset(sfcstate->safe_cxt);
+
+ sfcstate->saved_tuples = sfcstate->replayed_tuples = 0;
+ sfcstate->safeBufferBytes = 0;
+ }
+
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ while (sfcstate->saved_tuples < SAFE_BUFFER_SIZE &&
+ sfcstate->safeBufferBytes < MAX_SAFE_BUFFER_BYTES)
+ {
+ bool tuple_is_valid = true;
+
+ PG_TRY();
+ {
+ MemoryContext cxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+ valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+ tuple_is_valid = valid_row;
+
+ if (valid_row)
+ sfcstate->safeBufferBytes += cstate->line_buf.len;
+
+ CurrentMemoryContext = cxt;
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ tuple_is_valid = false;
+
+ Assert(IsSubTransaction());
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore data exceptions */
+ case ERRCODE_CHARACTER_NOT_IN_REPERTOIRE:
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ /* If the error can be processed, begin a new subtransaction */
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->errors++;
+ if (sfcstate->errors <= 100)
+ ereport(WARNING,
+ (errcode(errdata->sqlerrcode),
+ errmsg("%s", errdata->context)));
+ break;
+ default:
+ MemoryContextSwitchTo(ecxt);
+
+ PG_RE_THROW();
+
+ break;
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ if (tuple_is_valid)
+ {
+ /* Add tuple to safe_buffer in Safe_context */
+ HeapTuple saved_tuple;
+
+ MemoryContextSwitchTo(sfcstate->safe_cxt);
+
+ saved_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+ sfcstate->safe_buffer[sfcstate->saved_tuples++] = saved_tuple;
+ }
+
+ ExecClearTuple(myslot);
+
+ if (!valid_row)
+ break;
+ }
+
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ /* Prepare to replay the first tuple from safe_buffer */
+ if (sfcstate->saved_tuples != 0)
+ {
+ heap_deform_tuple(sfcstate->safe_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+ myslot->tts_values, myslot->tts_isnull);
+ return true;
+ }
+
+ /* End of file and nothing to replay? */
+ if (!valid_row && sfcstate->replayed_tuples == sfcstate->saved_tuples)
+ return false;
+
+ return true;
+}
+
/*
* Copy FROM file to relation.
*/
@@ -985,8 +1157,8 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ /* Standard copying with option "safe copying" enabled by IGNORE_ERRORS. */
+ if (!SafeCopying(cstate, econtext, myslot))
break;
ExecStoreVirtualTuple(myslot);
@@ -1270,6 +1442,10 @@ CopyFrom(CopyFromState cstate)
}
}
+ if (cstate->sfcstate && cstate->sfcstate->errors > 0)
+ ereport(WARNING,
+ errmsg("Errors: %d", cstate->sfcstate->errors));
+
/* Flush any remaining buffered tuples */
if (insertMethod != CIM_SINGLE)
{
@@ -1695,6 +1871,23 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
}
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ cstate->sfcstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->sfcstate->safe_cxt = AllocSetContextCreate(oldcontext,
+ "Safe_context",
+ ALLOCSET_DEFAULT_SIZES);
+ cstate->sfcstate->saved_tuples = 0;
+ cstate->sfcstate->replayed_tuples = 0;
+ cstate->sfcstate->safeBufferBytes = 0;
+ cstate->sfcstate->errors = 0;
+
+ cstate->sfcstate->oldowner = CurrentResourceOwner;
+ cstate->sfcstate->oldcontext = cstate->copycontext;
+ }
+
MemoryContextSwitchTo(oldcontext);
return cstate;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 737bd2d06d..b3a6c9931e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3359,6 +3359,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16756,6 +16760,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17310,6 +17315,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 584d9d5ae6..33d583a94c 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2757,7 +2757,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b77b935005..0bf9641b6e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 8d9cc5accd..7c65157866 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -52,6 +53,25 @@ typedef enum CopyInsertMethod
* ExecForeignBatchInsert only if valid */
} CopyInsertMethod;
+/*
+ * Struct that holds fields for the safe copying option enabled by IGNORE_ERRORS.
+ */
+typedef struct SafeCopyFromState
+{
+#define SAFE_BUFFER_SIZE 1000
+#define MAX_SAFE_BUFFER_BYTES 65535
+
+ HeapTuple safe_buffer[SAFE_BUFFER_SIZE]; /* accumulates valid tuples */
+ int saved_tuples; /* # of tuples in safe_buffer */
+ int replayed_tuples; /* # of tuples replayed from buffer */
+ int safeBufferBytes; /* # of bytes from all buffered tuples */
+ int errors; /* total # of errors */
+
+ MemoryContext safe_cxt;
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -74,6 +94,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *sfcstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 957ee18d84..ed25a8c86c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5f3685e9ef..cc6d572cf1 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -649,6 +649,135 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err_view, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err_view, line 3: "3 {3}"
+WARNING: COPY check_ign_err_view, line 4, column n: "a"
+WARNING: COPY check_ign_err_view, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err_view, line 6, column n: ""
+WARNING: COPY check_ign_err_view, line 7, column m: "{a, 7}"
+WARNING: Errors: 6
+SELECT * FROM trig_test;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case is in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: COPY check_ign_err, line 2: "2 {2} 2 2"
+WARNING: COPY check_ign_err, line 3: "3 {3}"
+WARNING: COPY check_ign_err, line 4, column n: "a"
+WARNING: COPY check_ign_err, line 5, column k: "5555555555"
+WARNING: COPY check_ign_err, line 6, column n: ""
+WARNING: COPY check_ign_err, line 7, column m: "{a, 7}"
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af48e..b25b20182e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -454,6 +454,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM trig_test;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case is in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
Hi Damir!
Your work looks like a very promising feature for production systems,
where data often needs to be loaded from external sources.
I've looked over the discussion and would like to make a proposal:
when we load a large batch of records into the database, it does not make
much sense to print the errors to the command output, nor to limit the
error output to any particular number, because if we decided to load the
data anyway, we would want a list (a file) of all the records that were
discarded because of errors, together with the related error information,
so that we can, say, deal with the errors and process those records
later. It looks like a reasonable addition to your patch.
As command output, a limited number of error messages is much less
useful than overall stats: records processed, records loaded, records
discarded, and the total number of errors.
For example, you can look at the Oracle SQL*Loader feature; I hope this
could give some ideas for further improvements.
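For illustration only, a rough sketch of what such a one-line summary
could look like using the existing ereport machinery; the function name
and the processed/loaded/skipped/errors counters are made up here and are
not part of the current patch:

/*
 * Hypothetical sketch, not from the patch: emit a single summary line
 * with overall stats instead of a stream of per-row warnings.  The
 * counters are assumed to be accumulated by the COPY FROM loop.
 */
static void
report_copy_summary(uint64 processed, uint64 loaded, uint64 skipped, uint64 errors)
{
	ereport(NOTICE,
			(errmsg("COPY summary: %llu rows processed, %llu loaded, %llu skipped, %llu errors",
					(unsigned long long) processed,
					(unsigned long long) loaded,
					(unsigned long long) skipped,
					(unsigned long long) errors)));
}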
On Wed, Nov 2, 2022 at 11:46 AM Damir Belyalov <dam.bel07@gmail.com> wrote:
Updated the patch:
- Optimized and simplified logic of IGNORE_ERRORS
- Changed variable names to more understandable ones
- Added an analogue of MAX_BUFFERED_BYTES for safe buffer
Regards,
Damir Belyalov
Postgres Professional
--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Hi!
I have looked at your patch and have a few questions.
110: static bool SafeCopying(CopyFromState cstate, ExprContext *econtext,
111: TupleTableSlot *myslot);
---
636: bool
637: SafeCopying(CopyFromState cstate, ExprContext *econtext,
TupleTableSlot *myslot)
Why is there no static keyword in the definition of the SafeCopying()
function, while it is present in the declaration?
675: MemoryContext cxt =
MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
676:
677: valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values,
myslot->tts_isnull);
678: tuple_is_valid = valid_row;
679:
680: if (valid_row)
681: sfcstate->safeBufferBytes += cstate->line_buf.len;
682:
683: CurrentMemoryContext = cxt;
Why are you using a direct assignment to CurrentMemoryContext instead of
using the MemoryContextSwitchTo function in the SafeCopying() routine?
1160: /* Standard copying with option "safe copying" enabled by
IGNORE_ERRORS. */
1161: if (!SafeCopying(cstate, econtext, myslot))
1162: break;
I checked with GDB that CurrentMemoryContext changes when SafeCopying
returns, and the target context may be different each time you run a
COPY in psql.
1879: cstate->sfcstate->safe_cxt = AllocSetContextCreate(oldcontext,
1880: "Safe_context",
1881: ALLOCSET_DEFAULT_SIZES);
When you initialize the cstate->sfcstate structure, you create a
cstate->sfcstate->safe_cxt memory context that inherits from oldcontext.
Was it intended to use cstate->copycontext as the parent context here?
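For clarity, this is the alternative the question seems to be pointing at
(a sketch only, reusing the field names from the patch, not a line taken
from it):

/*
 * Sketch: parent the safe buffer context on copycontext so it is released
 * together with the rest of the COPY state.
 */
cstate->sfcstate->safe_cxt = AllocSetContextCreate(cstate->copycontext,
                                                   "Safe_context",
                                                   ALLOCSET_DEFAULT_SIZES);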
On Wed, Nov 2, 2022 at 11:46 AM Damir Belyalov <dam.bel07@gmail.com> wrote:
Updated the patch:
- Optimized and simplified logic of IGNORE_ERRORS
- Changed variable names to more understandable ones
- Added an analogue of MAX_BUFFERED_BYTES for safe buffer
Regards,
Damir Belyalov
Postgres Professional
Regards,
Daniil Anisimov
Postgres Professional
Hi, Daniil and Nikita!
Thank you for reviewing.
Why is there no static keyword in the definition of the SafeCopying()
function, while it is present in the declaration?
Corrected this.
675: MemoryContext cxt =
MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
676:
677: valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values,
myslot->tts_isnull);
678: tuple_is_valid = valid_row;
679:
680: if (valid_row)
681: sfcstate->safeBufferBytes += cstate->line_buf.len;
682:
683: CurrentMemoryContext = cxt;
Why are you using a direct assignment to CurrentMemoryContext instead of
using the MemoryContextSwitchTo function in the SafeCopying() routine?
"CurrentMemoryContext = cxt" does the same thing as "MemoryContextSwitchTo(cxt)";
you can see that in the implementation of MemoryContextSwitchTo(). Also
corrected this.
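For reference, MemoryContextSwitchTo() in the PostgreSQL headers is
essentially just this (simplified):

static inline MemoryContext
MemoryContextSwitchTo(MemoryContext context)
{
	MemoryContext old = CurrentMemoryContext;

	CurrentMemoryContext = context;
	return old;
}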
When you initialize the cstate->sfcstate structure, you create a
cstate->sfcstate->safe_cxt memory context that inherits from oldcontext.
Was it intended to use cstate->copycontext as the parent context here?
Good remark, corrected this.
Thanks to Nikita Malakhov for the advice to create a file with the errors.
I decided instead to log errors in the system logfile and not print them
to the terminal. The error output in the logfile is rather simple - only
the error context is logged (maybe it's better to log all of the error
information?).
There are two reasons why logging errors in the logfile is better than
logging them in a separate file (e.g. PGDATA/copy_ignore_errors.txt). The
user is used to looking for errors in the logfile, and creating another
file raises questions like: 'what file name should we create?', 'do we
need file rotation?', 'where should this file be created?' (we can't
create it in PGDATA because of memory constraints).
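Roughly, the idea would be to emit the per-row diagnostics at LOG level,
which normally goes to the server logfile rather than to the client. A
sketch only, reusing the errdata variable from the patch's PG_CATCH block
(the actual patch code may differ):

/*
 * Sketch: send per-row error details to the server log instead of the
 * client terminal.  LOG-level messages normally go to the logfile and
 * are not shown to the client.
 */
ereport(LOG,
		(errcode(errdata->sqlerrcode),
		 errmsg("%s", errdata->context)));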
Regards,
Damir Belyalov
Postgres Professional
Attachments:
0011-COPY_IGNORE_ERRORS.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..50151aec54 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ containing syntax errors in data, rows with too many or too few columns,
+ rows containing columns where the data type's input function raises an error.
+ Logs errors to system logfile and outputs the total number of errors.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..e741ce3e5a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -407,6 +407,7 @@ ProcessCopyOptions(ParseState *pstate,
bool is_from,
List *options)
{
+ bool ignore_errors_specified = false;
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_errors") == 0)
+ {
+ if (ignore_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_errors_specified = true;
+ opts_out->ignore_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..657fa44e0b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -107,6 +107,9 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+static bool SafeCopying(CopyFromState cstate, ExprContext *econtext,
+ TupleTableSlot *myslot);
+
/*
* error context callback for COPY FROM
*
@@ -625,6 +628,173 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Safely reads source data, converts to a tuple and fills tuple buffer.
+ * Skips some data in the case of failed conversion if data source for
+ * a next tuple can be surely read without a danger.
+ */
+static bool
+SafeCopying(CopyFromState cstate, ExprContext *econtext, TupleTableSlot *myslot)
+{
+ SafeCopyFromState *sfcstate = cstate->sfcstate;
+ bool valid_row = true;
+
+ /* Standard COPY if IGNORE_ERRORS is disabled */
+ if (!cstate->sfcstate)
+ /* Directly stores the values/nulls array in the slot */
+ return NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+
+ if (sfcstate->replayed_tuples < sfcstate->saved_tuples)
+ {
+ Assert(sfcstate->saved_tuples > 0);
+
+ /* Prepare to replay the tuple */
+ heap_deform_tuple(sfcstate->safe_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+ myslot->tts_values, myslot->tts_isnull);
+ return true;
+ }
+ else
+ {
+ /* All tuples from buffer were replayed, clean it up */
+ MemoryContextReset(sfcstate->safe_cxt);
+
+ sfcstate->saved_tuples = sfcstate->replayed_tuples = 0;
+ sfcstate->safeBufferBytes = 0;
+ }
+
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ while (sfcstate->saved_tuples < SAFE_BUFFER_SIZE &&
+ sfcstate->safeBufferBytes < MAX_SAFE_BUFFER_BYTES)
+ {
+ bool tuple_is_valid = true;
+
+ PG_TRY();
+ {
+ MemoryContext cxt = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+ valid_row = NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull);
+ tuple_is_valid = valid_row;
+
+ if (valid_row)
+ sfcstate->safeBufferBytes += cstate->line_buf.len;
+
+ MemoryContextSwitchTo(cxt);
+ }
+ PG_CATCH();
+ {
+ MemoryContext ecxt = MemoryContextSwitchTo(sfcstate->oldcontext);
+ ErrorData *errdata = CopyErrorData();
+
+ tuple_is_valid = false;
+
+ Assert(IsSubTransaction());
+
+ RollbackAndReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ switch (errdata->sqlerrcode)
+ {
+ /* Ignore data exceptions */
+ case ERRCODE_CHARACTER_NOT_IN_REPERTOIRE:
+ case ERRCODE_DATA_EXCEPTION:
+ case ERRCODE_ARRAY_ELEMENT_ERROR:
+ case ERRCODE_DATETIME_VALUE_OUT_OF_RANGE:
+ case ERRCODE_INTERVAL_FIELD_OVERFLOW:
+ case ERRCODE_INVALID_CHARACTER_VALUE_FOR_CAST:
+ case ERRCODE_INVALID_DATETIME_FORMAT:
+ case ERRCODE_INVALID_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_ESCAPE_SEQUENCE:
+ case ERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER:
+ case ERRCODE_INVALID_PARAMETER_VALUE:
+ case ERRCODE_INVALID_TABLESAMPLE_ARGUMENT:
+ case ERRCODE_INVALID_TIME_ZONE_DISPLACEMENT_VALUE:
+ case ERRCODE_NULL_VALUE_NOT_ALLOWED:
+ case ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE:
+ case ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED:
+ case ERRCODE_STRING_DATA_LENGTH_MISMATCH:
+ case ERRCODE_STRING_DATA_RIGHT_TRUNCATION:
+ case ERRCODE_INVALID_TEXT_REPRESENTATION:
+ case ERRCODE_INVALID_BINARY_REPRESENTATION:
+ case ERRCODE_BAD_COPY_FILE_FORMAT:
+ case ERRCODE_UNTRANSLATABLE_CHARACTER:
+ case ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE:
+ case ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION:
+ case ERRCODE_INVALID_JSON_TEXT:
+ case ERRCODE_INVALID_SQL_JSON_SUBSCRIPT:
+ case ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM:
+ case ERRCODE_NO_SQL_JSON_ITEM:
+ case ERRCODE_NON_NUMERIC_SQL_JSON_ITEM:
+ case ERRCODE_NON_UNIQUE_KEYS_IN_A_JSON_OBJECT:
+ case ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED:
+ case ERRCODE_SQL_JSON_ARRAY_NOT_FOUND:
+ case ERRCODE_SQL_JSON_MEMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_NUMBER_NOT_FOUND:
+ case ERRCODE_SQL_JSON_OBJECT_NOT_FOUND:
+ case ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS:
+ case ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS:
+ case ERRCODE_SQL_JSON_SCALAR_REQUIRED:
+ case ERRCODE_SQL_JSON_ITEM_CANNOT_BE_CAST_TO_TARGET_TYPE:
+ /* If the error can be processed, begin a new subtransaction */
+ BeginInternalSubTransaction(NULL);
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ sfcstate->errors++;
+
+ ereport(LOG, (errmsg("%s", errdata->context),
+ errhidecontext(true), errhidestmt(true)));
+
+ break;
+ default:
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ break;
+ }
+
+ FlushErrorState();
+ FreeErrorData(errdata);
+ errdata = NULL;
+
+ MemoryContextSwitchTo(ecxt);
+ }
+ PG_END_TRY();
+
+ if (tuple_is_valid)
+ {
+ /* Add tuple to safe_buffer in Safe_context */
+ HeapTuple saved_tuple;
+
+ MemoryContextSwitchTo(sfcstate->safe_cxt);
+
+ saved_tuple = heap_form_tuple(RelationGetDescr(cstate->rel), myslot->tts_values, myslot->tts_isnull);
+ sfcstate->safe_buffer[sfcstate->saved_tuples++] = saved_tuple;
+ }
+
+ ExecClearTuple(myslot);
+
+ if (!valid_row)
+ break;
+ }
+
+ ReleaseCurrentSubTransaction();
+ CurrentResourceOwner = sfcstate->oldowner;
+
+ /* Prepare to replay the first tuple from safe_buffer */
+ if (sfcstate->saved_tuples != 0)
+ {
+ heap_deform_tuple(sfcstate->safe_buffer[sfcstate->replayed_tuples++], RelationGetDescr(cstate->rel),
+ myslot->tts_values, myslot->tts_isnull);
+ return true;
+ }
+
+ /* End of file and nothing to replay? */
+ if (!valid_row && sfcstate->replayed_tuples == sfcstate->saved_tuples)
+ return false;
+
+ return true;
+}
+
/*
* Copy FROM file to relation.
*/
@@ -991,8 +1161,8 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
- /* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ /* Standard copying with option "safe copying" enabled by IGNORE_ERRORS. */
+ if (!SafeCopying(cstate, econtext, myslot))
break;
ExecStoreVirtualTuple(myslot);
@@ -1276,6 +1446,11 @@ CopyFrom(CopyFromState cstate)
}
}
+ if (cstate->sfcstate && cstate->sfcstate->errors > 0)
+ ereport(WARNING,
+ errmsg("Errors: %d", cstate->sfcstate->errors),
+ errhidecontext(true), errhidestmt(true));
+
/* Flush any remaining buffered tuples */
if (insertMethod != CIM_SINGLE)
{
@@ -1704,6 +1879,25 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
}
+ /* Initialize safeCopyFromState for IGNORE_ERRORS option */
+ if (cstate->opts.ignore_errors)
+ {
+ MemoryContextSwitchTo(cstate->copycontext);
+
+ cstate->sfcstate = palloc(sizeof(SafeCopyFromState));
+
+ cstate->sfcstate->safe_cxt = AllocSetContextCreate(cstate->copycontext,
+ "COPY_safe_context",
+ ALLOCSET_DEFAULT_SIZES);
+ cstate->sfcstate->saved_tuples = 0;
+ cstate->sfcstate->replayed_tuples = 0;
+ cstate->sfcstate->safeBufferBytes = 0;
+ cstate->sfcstate->errors = 0;
+
+ cstate->sfcstate->oldowner = CurrentResourceOwner;
+ cstate->sfcstate->oldcontext = cstate->copycontext;
+ }
+
MemoryContextSwitchTo(oldcontext);
return cstate;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..7d3e0c6ba8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -701,7 +701,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3378,6 +3378,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_ERRORS
+ {
+ $$ = makeDefElem("ignore_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16821,6 +16825,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17375,6 +17380,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5e1882eaea..63d56018dd 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2857,7 +2857,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8e5f6ff148..c3796c9d37 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_errors; /* ignore rows with errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 7b1c4327bd..e90aa47076 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "utils/resowner.h"
/*
* Represents the different source cases we need to worry about at
@@ -52,6 +53,25 @@ typedef enum CopyInsertMethod
* ExecForeignBatchInsert only if valid */
} CopyInsertMethod;
+/*
+ * Struct that holds fields for the safe copying option enabled by IGNORE_ERRORS.
+ */
+typedef struct SafeCopyFromState
+{
+#define SAFE_BUFFER_SIZE 1000
+#define MAX_SAFE_BUFFER_BYTES 65535
+
+ HeapTuple safe_buffer[SAFE_BUFFER_SIZE]; /* accumulates valid tuples */
+ int saved_tuples; /* # of tuples in safe_buffer */
+ int replayed_tuples; /* # of tuples were replayed from buffer */
+ int safeBufferBytes; /* # of bytes from all buffered tuples */
+ int errors; /* total # of errors */
+
+ MemoryContext safe_cxt;
+ MemoryContext oldcontext;
+ ResourceOwner oldowner;
+} SafeCopyFromState;
+
/*
* This struct contains all the state variables used throughout a COPY FROM
* operation.
@@ -74,6 +94,7 @@ typedef struct CopyFromStateData
char *filename; /* filename, or NULL for STDIN */
bool is_program; /* is 'filename' a program to popen? */
copy_data_source_cb data_source_cb; /* function for reading data */
+ SafeCopyFromState *sfcstate; /* struct for ignore_errors option */
CopyFormatOptions opts;
bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..4f27d49567 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_errors", IGNORE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 090ef6c7a8..27512d8e41 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -666,6 +666,105 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before on check_ign_err;
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: Errors: 6
+SELECT * FROM trig_test;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+-- foreign table case is in postgres_fdw extension
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+(1 row)
+
+DROP TABLE check_ign_err;
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+WARNING: Errors: 6
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 8 | {8} | 8
+(2 rows)
+
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b0de82c3aa..c77cfaaf4d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -464,6 +464,122 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_ERRORS option
+-- CIM_MULTI case
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+
+-- CIM_SINGLE cases
+-- BEFORE row trigger
+TRUNCATE check_ign_err;
+CREATE TABLE trig_test(n int, m int[], k int);
+CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before on check_ign_err;
+
+-- INSTEAD OF row trigger
+TRUNCATE check_ign_err;
+TRUNCATE trig_test;
+CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err;
+CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m, NEW.k);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of();
+COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM trig_test;
+DROP TRIGGER trig_instead_of ON check_ign_err_view;
+DROP VIEW check_ign_err_view;
+
+-- foreign table case is in postgres_fdw extension
+
+-- volatile function in WHERE clause
+TRUNCATE check_ign_err;
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS
+ WHERE n = floor(random()*(1-1+1))+1; /* finds values equal 1 */
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TABLE check_ign_err;
+
+-- CIM_MULTI_CONDITIONAL case
+-- INSERT triggers for partition tables
+TRUNCATE trig_test;
+CREATE TABLE check_ign_err (n int, m int[], k int)
+ PARTITION BY RANGE (k);
+CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err
+ FOR VALUES FROM (1) TO (4);
+CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err
+ FOR VALUES FROM (4) TO (9);
+CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS '
+ BEGIN
+ INSERT INTO trig_test VALUES(NEW.n, NEW.m);
+ RETURN NEW;
+ END;
+' LANGUAGE plpgsql;
+CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err
+FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part();
+COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n < 9;
+1 {1} 1
+2 {2} 2 2
+3 {3}
+a {4} 4
+5 {5} 5555555555
+
+7 {a, 7} 7
+8 {8} 8
+\.
+SELECT * FROM check_ign_err;
+DROP TRIGGER trig_before_part on check_ign_err;
+DROP TABLE trig_test;
+DROP TABLE check_ign_err CASCADE;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
Hi,
On 2023-02-03 13:27:24 +0300, Damir Belyalov wrote:
@@ -625,6 +628,173 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
miinfo->bufferedBytes += tuplen;
}
+/*
+ * Safely reads source data, converts to a tuple and fills tuple buffer.
+ * Skips some data in the case of failed conversion if data source for
+ * a next tuple can be surely read without a danger.
+ */
+static bool
+SafeCopying(CopyFromState cstate, ExprContext *econtext, TupleTableSlot *myslot)
+	BeginInternalSubTransaction(NULL);
+	CurrentResourceOwner = sfcstate->oldowner;
I don't think this is the right approach. Creating a subtransaction for
each row will cause substantial performance issues.
We now can call data type input functions without throwing errors, see
InputFunctionCallSafe(). Use that to avoid throwing an error instead of
catching it.
Greetings,
Andres Freund
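For context, the soft-error mechanism referred to above
(InputFunctionCallSafe() and ErrorSaveContext, added in PostgreSQL 16) is used
roughly as in the sketch below. The helper name parse_attribute_softly is made
up for illustration; only the API calls are real, and details may differ:

static bool
parse_attribute_softly(FmgrInfo *in_func, char *string,
					   Oid typioparam, int32 typmod, Datum *value)
{
	/* T_ErrorSaveContext tags the node so input functions report errors softly */
	ErrorSaveContext escontext = {T_ErrorSaveContext};

	escontext.details_wanted = true;

	/* returns false instead of throwing on a "soft" data-type input error */
	if (!InputFunctionCallSafe(in_func, string, typioparam, typmod,
							   (Node *) &escontext, value))
	{
		ereport(LOG, errmsg("%s", escontext.error_data->message));
		return false;
	}
	return true;
}

The later patches in this thread (IGNORE_DATATYPE_ERRORS) follow essentially
this pattern inside NextCopyFrom().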
Hi, Andres!
Thank you for reviewing.
I don't think this is the right approach. Creating a subtransaction for
each row will cause substantial performance issues.
Subtransactions aren't created for each row. A block of 1000 rows
(SAFE_BUFFER_SIZE) is handled in one subtransaction, and that size can be
changed. There is also a limit on the number of buffered bytes
(MAX_SAFE_BUFFER_BYTES) in safe_buffer:
while (sfcstate->saved_tuples < SAFE_BUFFER_SIZE &&
sfcstate->safeBufferBytes < MAX_SAFE_BUFFER_BYTES)
We now can call data type input functions without throwing errors, see
InputFunctionCallSafe(). Use that to avoid throwing an error instead of
catching it.
InputFunctionCallSafe() is good for detecting errors from input functions,
but there are errors from NextCopyFrom() that cannot be detected with
InputFunctionCallSafe(), e.g. "wrong number of columns in row". Do you
suggest processing input-function errors separately from other errors?
Right now all errors are handled in one "switch" statement in PG_CATCH, so
this change could complicate the code.
Regards,
Damir Belyalov
Postgres Professional
Damir Belyalov <dam.bel07@gmail.com> writes:
I don't think this is the right approach. Creating a subtransaction for
each row will cause substantial performance issues.
Subtransactions aren't created for each row. The block of rows in one
subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed.
I think that at this point, any patch that involves adding subtransactions
to COPY is dead on arrival; whether it's batched or not is irrelevant.
(It's not like batching has no downsides.)
InputFunctionCallSafe() is good for detecting errors from input-functions
but there are such errors from NextCopyFrom () that can not be detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in row''.
If you want to deal with those, then there's more work to be done to make
those bits non-error-throwing. But there's a very finite amount of code
involved and no obvious reason why it couldn't be done. The major problem
here has always been the indefinite amount of code implicated by calling
datatype input functions, and we have now created a plausible answer to
that problem.
regards, tom lane
Hi,
On February 5, 2023 9:12:17 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Damir Belyalov <dam.bel07@gmail.com> writes:
I don't think this is the right approach. Creating a subtransaction for
each row will cause substantial performance issues.
Subtransactions aren't created for each row. The block of rows in one
subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed.
I think that at this point, any patch that involves adding subtransactions
to COPY is dead on arrival; whether it's batched or not is irrelevant.
(It's not like batching has no downsides.)
Indeed.
InputFunctionCallSafe() is good for detecting errors from input-functions
but there are such errors from NextCopyFrom () that can not be detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in row".
If you want to deal with those, then there's more work to be done to make
those bits non-error-throwing. But there's a very finite amount of code
involved and no obvious reason why it couldn't be done. The major problem
here has always been the indefinite amount of code implicated by calling
datatype input functions, and we have now created a plausible answer to
that problem.
I'm not even sure it makes sense to avoid that kind of error. An invalid column count or such is something quite different than failing some data type input routine, or failing a constraint.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes:
On February 5, 2023 9:12:17 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Damir Belyalov <dam.bel07@gmail.com> writes:
InputFunctionCallSafe() is good for detecting errors from input-functions
but there are such errors from NextCopyFrom () that can not be detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in row''.
If you want to deal with those, then there's more work to be done to make
those bits non-error-throwing. But there's a very finite amount of code
involved and no obvious reason why it couldn't be done.
I'm not even sure it makes sense to avoid that kind of error. And
invalid column count or such is something quite different than failing
some data type input routine, or failing a constraint.
I think it could be reasonable to put COPY's overall-line-format
requirements on the same level as datatype input format violations.
I agree that trying to trap every kind of error is a bad idea,
for largely the same reason that the soft-input-errors patches
only trap certain kinds of errors: it's too hard to tell whether
an error is an "internal" error that it's scary to continue past.
regards, tom lane
On 2023-02-06 15:00, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On February 5, 2023 9:12:17 PM PST, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
Damir Belyalov <dam.bel07@gmail.com> writes:
InputFunctionCallSafe() is good for detecting errors from
input-functions
but there are such errors from NextCopyFrom () that can not be
detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in
row".
If you want to deal with those, then there's more work to be done to make
those bits non-error-throwing. But there's a very finite amount of
code
involved and no obvious reason why it couldn't be done.
I'm not even sure it makes sense to avoid that kind of error. And
invalid column count or such is something quite different than failing
some data type input routine, or failing a constraint.
I think it could be reasonable to put COPY's overall-line-format
requirements on the same level as datatype input format violations.
I agree that trying to trap every kind of error is a bad idea,
for largely the same reason that the soft-input-errors patches
only trap certain kinds of errors: it's too hard to tell whether
an error is an "internal" error that it's scary to continue past.
Is it a bad idea to limit the scope of the ignored errors to the 'soft'
errors reported by InputFunctionCallSafe()?
I think it could still be useful for some use cases.
diff --git a/src/test/regress/sql/copy2.sql
b/src/test/regress/sql/copy2.sql
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
diff --git a/src/test/regress/expected/copy2.out
b/src/test/regress/expected/copy2.out
index 090ef6c7a8..08e8056fc1 100644
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v1-0001-Add-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-diff)
From 16877d4cdd64db5f85bed9cd559e618d8211e598 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Mon, 27 Feb 2023 12:02:16 +0900
Subject: [PATCH v1] Add COPY option IGNORE_DATATYPE_ERRORS
---
src/backend/commands/copy.c | 8 ++++++++
src/backend/commands/copyfrom.c | 11 +++++++++++
src/backend/commands/copyfromparse.c | 12 ++++++++++--
src/backend/parser/gram.y | 8 +++++++-
src/bin/psql/tab-complete.c | 3 ++-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 2 ++
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 14 ++++++++++++++
src/test/regress/sql/copy2.sql | 12 ++++++++++++
10 files changed, 68 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..2f1cfb3f4d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified= false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified= true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..24eec6a27d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -959,6 +959,7 @@ CopyFrom(CopyFromState cstate)
{
TupleTableSlot *myslot;
bool skip_tuple;
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
CHECK_FOR_INTERRUPTS();
@@ -991,10 +992,20 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ /* Soft error occurred, skip this tuple */
+ if(cstate->escontext.error_occurred)
+ continue;
+
ExecStoreVirtualTuple(myslot);
/*
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 91b564c2bc..12b1780fd6 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -938,10 +939,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attname = NameStr(att->attname);
cstate->cur_attval = string;
- values[m] = InputFunctionCall(&in_functions[m],
+ if (!InputFunctionCallSafe(&in_functions[m],
string,
typioparams[m],
- att->atttypmod);
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ ereport(WARNING,
+ errmsg("%s", cstate->escontext.error_data->message));
+ return true;
+ }
if (string != NULL)
nulls[m] = false;
cstate->cur_attname = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..d79d293c0d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -701,7 +701,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_DATATYPE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3378,6 +3378,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_DATATYPE_ERRORS
+ {
+ $$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16821,6 +16825,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17375,6 +17380,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5e1882eaea..a363351d3b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2857,7 +2857,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8e5f6ff148..a7eb0f8883 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 7b1c4327bd..d74c633481 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,7 @@ typedef struct CopyFromStateData
AttrNumber num_defaults;
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
int *defmap; /* array of default att numbers */
ExprState **defexprs; /* array of default att expressions */
bool volatile_defexprs; /* is any of defexprs volatile? */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..1d7f9efbc0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_datatype_errors", IGNORE_DATATYPE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 090ef6c7a8..08e8056fc1 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -666,6 +666,20 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b0de82c3aa..380adfce96 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -464,6 +464,18 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.25.1
Hello
Tested the patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITIONAL. As
expected, it works.
Also added a description to copy.sgml and made a review of the patch.
I added an 'ignored_errors' integer counter that is output after the COPY
finishes.
All errors were added to the system logfile with full detailed context.
Maybe it's better to log only the error message.
Regards, Damir Belyalov
Postgres Professional
Attachments:
v2-0001-Add-COPY-option-IGNORE_DATATYPE_ERRORS.patch (application/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..706b929947 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,17 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+ Outputs warnings about rows with incorrect data to system logfile.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..0334894014 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified= false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..ecaa750568 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -955,10 +955,14 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ if (cstate->opts.ignore_datatype_errors)
+ cstate->ignored_errors = 0;
+
for (;;)
{
TupleTableSlot *myslot;
bool skip_tuple;
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
CHECK_FOR_INTERRUPTS();
@@ -991,9 +995,26 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors && cstate->ignored_errors > 0)
+ ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
break;
+ }
+
+ /* Soft error occurred, skip this tuple */
+ if (cstate->escontext.error_occurred)
+ {
+ ExecClearTuple(myslot);
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 91b564c2bc..9c36b0dc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -938,10 +939,23 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attname = NameStr(att->attname);
cstate->cur_attval = string;
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+
+ /* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_errors++;
+
+ ereport(LOG,
+ errmsg("%s", cstate->escontext.error_data->message));
+
+ return true;
+ }
+
if (string != NULL)
nulls[m] = false;
cstate->cur_attname = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..d79d293c0d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -701,7 +701,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_DATATYPE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3378,6 +3378,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_DATATYPE_ERRORS
+ {
+ $$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16821,6 +16825,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17375,6 +17380,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5e1882eaea..a363351d3b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2857,7 +2857,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8e5f6ff148..a7eb0f8883 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 7b1c4327bd..4724fca195 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
AttrNumber num_defaults;
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ uint64 ignored_errors; /* total number of ignored errors */
int *defmap; /* array of default att numbers */
ExprState **defexprs; /* array of default att expressions */
bool volatile_defexprs; /* is any of defexprs volatile? */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..1d7f9efbc0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_datatype_errors", IGNORE_DATATYPE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 090ef6c7a8..b4dadbf7a9 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -666,6 +666,17 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+WARNING: Errors: 4
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b0de82c3aa..380adfce96 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -464,6 +464,18 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
On 28 Feb 2023, at 15:28, Damir Belyalov <dam.bel07@gmail.com> wrote:
Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. As expected it works.
Also added a description to copy.sgml and made a review on patch.
I added 'ignored_errors' integer parameter that should be output after the option is finished.
All errors were added to the system logfile with full detailed context. Maybe it's better to log only error message.
FWIW, Greenplum has a similar construct (but which also logs the errors in the
db) where data type errors are skipped as long as the number of errors doesn't
exceed a reject limit. If the reject limit is reached then the COPY fails:
LOG ERRORS [ SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
IIRC the gist of this was to catch when the user copies the wrong input data or
plain has a broken file. Rather than finding out after copying n rows which
are likely to be garbage, the process can be restarted.
This version of the patch has a compiler error in the error message:
copyfrom.c: In function ‘CopyFrom’:
copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’} [-Werror=format=]
1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
| ^~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
| |
| uint64 {aka long long unsigned int}
On that note though, it seems to me that this error message leaves a bit to be
desired with regards to the level of detail.
--
Daniel Gustafsson
On 2023-03-06 23:03, Daniel Gustafsson wrote:
On 28 Feb 2023, at 15:28, Damir Belyalov <dam.bel07@gmail.com> wrote:
Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION.
As expected it works.
Also added a description to copy.sgml and made a review on patch.
Thanks for your tests and improvements!
I added 'ignored_errors' integer parameter that should be output after
the option is finished.
All errors were added to the system logfile with full detailed
context. Maybe it's better to log only error message.
Certainly.
FWIW, Greenplum has a similar construct (but which also logs the errors
in the
db) where data type errors are skipped as long as the number of errors
don't
exceed a reject limit. If the reject limit is reached then the COPY
fails:
LOG ERRORS [ SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
IIRC the gist of this was to catch then the user copies the wrong input
data or
plain has a broken file. Rather than finding out after copying n rows
which
are likely to be garbage the process can be restarted.
This version of the patch has a compiler error in the error message:
copyfrom.c: In function ‘CopyFrom’:
copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long
int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’}
[-Werror=format=]
1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
| ^~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
| |
| uint64 {aka long
long unsigned int}On that note though, it seems to me that this error message leaves a
bit to be
desired with regards to the level of detail.
+1.
I felt just logging "Error: %ld" would make people wonder about the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
FWIW, Greenplum has a similar construct (but which also logs the errors
in the
db) where data type errors are skipped as long as the number of errors
don't
exceed a reject limit. If the reject limit is reached then the COPY
fails:LOG ERRORS [ SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
IIRC the gist of this was to catch then the user copies the wrong input
data or
plain has a broken file. Rather than finding out after copying n rows
which
are likely to be garbage the process can be restarted.
I think this is a matter for discussion. The same question applies here:
"Should errors be logged to separate files or to the system logfile?"
IMO it's better for users to log a short, detailed error message to the system
logfile and not output errors to the terminal.
This version of the patch has a compiler error in the error message:
Yes, corrected it. Changed "ignored_errors" to int64 because "processed"
(used for counting copy rows) is int64.
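The portable way to handle this in errmsg() is to cast the 64-bit counter to
long long and use %lld, which is what the v3 patch below does:

	/* avoid the -Wformat warning by casting the int64 counter explicitly */
	ereport(WARNING,
			errmsg("Errors were found: %lld",
				   (long long) cstate->ignored_errors));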
I felt just logging "Error: %ld" would make people wonder the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.
Thanks. For more clarity, changed the message to: "Errors were found: %".
Regards, Damir Belyalov
Postgres Professional
Attachments:
v3-0001-Add-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..706b929947 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -233,6 +234,17 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+ Outputs warnings about rows with incorrect data to system logfile.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..0334894014 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified= false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 29cd1cf4a6..facfc44def 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -949,10 +949,14 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ if (cstate->opts.ignore_datatype_errors)
+ cstate->ignored_errors = 0;
+
for (;;)
{
TupleTableSlot *myslot;
bool skip_tuple;
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
CHECK_FOR_INTERRUPTS();
@@ -985,9 +989,26 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors && cstate->ignored_errors > 0)
+ ereport(WARNING, errmsg("Errors were found: %lld", (long long) cstate->ignored_errors));
break;
+ }
+
+ /* Soft error occurred, skip this tuple */
+ if (cstate->escontext.error_occurred)
+ {
+ ExecClearTuple(myslot);
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 91b564c2bc..9c36b0dc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -938,10 +939,23 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
cstate->cur_attname = NameStr(att->attname);
cstate->cur_attval = string;
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+
+ /* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_errors++;
+
+ ereport(LOG,
+ errmsg("%s", cstate->escontext.error_data->message));
+
+ return true;
+ }
+
if (string != NULL)
nulls[m] = false;
cstate->cur_attname = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..d79d293c0d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -701,7 +701,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_DATATYPE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3378,6 +3378,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_DATATYPE_ERRORS
+ {
+ $$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16821,6 +16825,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17375,6 +17380,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 8f12af799b..0f290cd6ff 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2857,7 +2857,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8e5f6ff148..a7eb0f8883 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 7b1c4327bd..b9ce636f7b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
AttrNumber num_defaults;
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ int64 ignored_errors; /* total number of ignored errors */
int *defmap; /* array of default att numbers */
ExprState **defexprs; /* array of default att expressions */
bool volatile_defexprs; /* is any of defexprs volatile? */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..1d7f9efbc0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_datatype_errors", IGNORE_DATATYPE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 090ef6c7a8..525e3bc454 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -666,6 +666,17 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+WARNING: Errors were found: 4
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b0de82c3aa..380adfce96 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -464,6 +464,18 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
On 7 Mar 2023, at 09:35, Damir Belyalov <dam.bel07@gmail.com> wrote:
I felt just logging "Error: %ld" would make people wonder the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.

Thanks. For more clarity, changed the message to: "Errors were found: %".
I'm not convinced that this adds enough clarity to assist the user. We also
shouldn't use "error" in a WARNING log since the user has explicitly asked to
skip rows on error, so it's not an error per se. How about something like:
ereport(WARNING,
(errmsg("%ld rows were skipped due to data type incompatibility", cstate->ignored_errors),
errhint("Skipped rows can be inspected in the database log for reprocessing.")));
--
Daniel Gustafsson
On 2023-03-07 18:09, Daniel Gustafsson wrote:
On 7 Mar 2023, at 09:35, Damir Belyalov <dam.bel07@gmail.com> wrote:
I felt just logging "Error: %ld" would make people wonder the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.

Thanks. For more clarity, changed the message to: "Errors were found: %".

I'm not convinced that this adds enough clarity to assist the user. We also
shouldn't use "error" in a WARNING log since the user has explicitly asked to
skip rows on error, so it's not an error per se.
+1
How about something like:
ereport(WARNING,
(errmsg("%ld rows were skipped due to data type
incompatibility", cstate->ignored_errors),
errhint("Skipped rows can be inspected in the database log
for reprocessing.")));
Since skipped rows cannot be inspected in the log when
log_error_verbosity is set to terse,
it might be better to omit this errhint.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
On 2023-03-17 21:23, torikoshia wrote:
On 2023-03-07 18:09, Daniel Gustafsson wrote:
On 7 Mar 2023, at 09:35, Damir Belyalov <dam.bel07@gmail.com> wrote:
I felt just logging "Error: %ld" would make people wonder the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.

Thanks. For more clarity, changed the message to: "Errors were found: %".

I'm not convinced that this adds enough clarity to assist the user. We also
shouldn't use "error" in a WARNING log since the user has explicitly asked to
skip rows on error, so it's not an error per se.

+1

How about something like:

ereport(WARNING,
        (errmsg("%ld rows were skipped due to data type incompatibility", cstate->ignored_errors),
         errhint("Skipped rows can be inspected in the database log for reprocessing.")));

Since skipped rows cannot be inspected in the log when
log_error_verbosity is set to terse,
it might be better to omit this errhint.
Removed errhint.
Modified some code since v3 could no longer be applied to HEAD.
Also modified the v3 patch as below:
65 + if (cstate->opts.ignore_datatype_errors)
66 + cstate->ignored_errors = 0;
67 +
This seems unnecessary since cstate is initialized by palloc0() in
BeginCopyFrom().
134 +            ereport(LOG,
135 +                    errmsg("%s", cstate->escontext.error_data->message));
136 +
137 +            return true;
Since LOG means 'Reports information of interest to administrators'
according to the manual[1], data type errors should not be logged at LOG
level. I put it back to WARNING.

[1]: https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v4-0001-Add-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-diff)
From 6764d7e0f21ca266d7426cb922fd00e5138ec857 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Wed, 22 Mar 2023 22:00:15 +0900
Subject: [PATCH v4] Add new COPY option IGNORE_DATATYPE_ERRORS
Add new COPY option IGNORE_DATATYPE_ERRORS.
Currently entire COPY fails even when there is one unexpected
data regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors and skips them and
COPY data which don't contain problem.
This patch uses the soft error handling infrastructure, which
is introduced by d9f7f5d32f20.
Author: Damir Belyalov, Atsushi Torikoshi
---
doc/src/sgml/ref/copy.sgml | 12 ++++++++++++
src/backend/commands/copy.c | 8 ++++++++
src/backend/commands/copyfrom.c | 20 ++++++++++++++++++++
src/backend/commands/copyfromparse.c | 19 +++++++++++++++----
src/backend/parser/gram.y | 8 +++++++-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 +++
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 15 +++++++++++++++
src/test/regress/sql/copy2.sql | 12 ++++++++++++
10 files changed, 94 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e591ed2e6..168b1c05d9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -234,6 +235,17 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+     Outputs warnings about rows with incorrect data to the system logfile.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..02d911abbe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 80bca79cd0..85c47f54b2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -953,6 +953,7 @@ CopyFrom(CopyFromState cstate)
{
TupleTableSlot *myslot;
bool skip_tuple;
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
CHECK_FOR_INTERRUPTS();
@@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors && cstate->ignored_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->ignored_errors));
break;
+ }
+
+ /* Soft error occurred, skip this tuple */
+ if (cstate->escontext.error_occurred)
+ {
+ ExecClearTuple(myslot);
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 3853902a16..b06c44e298 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_errors++;
+
+ ereport(WARNING,
+ errmsg("%s", cstate->escontext.error_data->message));
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index efe88ccf9d..22bf63b42b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -701,7 +701,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+ IDENTITY_P IF_P IGNORE_DATATYPE_ERRORS ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -3378,6 +3378,10 @@ copy_opt_item:
{
$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
}
+ | IGNORE_DATATYPE_ERRORS
+ {
+ $$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
+ }
| DELIMITER opt_as Sconst
{
$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
@@ -16827,6 +16831,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
@@ -17382,6 +17387,7 @@ bare_label_keyword:
| HOLD
| IDENTITY_P
| IF_P
+ | IGNORE_DATATYPE_ERRORS
| ILIKE
| IMMEDIATE
| IMMUTABLE
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 33175868f6..c2e55ac21f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ac2c16f8b8..1164c71631 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ int64 ignored_errors; /* total number of ignored errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 753e9ee174..5ea159e879 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -196,6 +196,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore_datatype_errors", IGNORE_DATATYPE_ERRORS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 8e33eee719..a6bf3d66a4 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -666,6 +666,21 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index d759635068..c934029314 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -464,6 +464,18 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
base-commit: d69c404c4cc5985d8ae5b5ed38bed3400b317f82
--
2.25.1
Hi,
Tom, see below - I wonder if we should provide one more piece of infrastructure
around the saved error stuff...
Have you measured whether this has negative performance effects when *NOT*
using the new option?
As-is this does not work with FORMAT BINARY - and converting the binary input
functions to support soft errors won't happen for 16. So I think you need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.
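(For illustration, a minimal sketch of how such an incompatibility check would
surface to the user; the exact error wording below is an assumption, not
necessarily what the patch emits:)

    -- assumed behaviour: the two options are rejected together
    COPY x FROM STDIN WITH (FORMAT binary, IGNORE_DATATYPE_ERRORS);
    -- ERROR:  cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode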
On 2023-03-22 22:34:20 +0900, torikoshia wrote:
@@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}
I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.
@@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
 		}
 		else
-			values[m] = InputFunctionCall(&in_functions[m],
-										  string,
-										  typioparams[m],
-										  att->atttypmod);
+			/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+			if (!InputFunctionCallSafe(&in_functions[m],
+									   string,
+									   typioparams[m],
+									   att->atttypmod,
+									   (Node *) &cstate->escontext,
+									   &values[m]))
+			{
+				cstate->ignored_errors++;
+
+				ereport(WARNING,
+						errmsg("%s", cstate->escontext.error_data->message));
That isn't right - you lose all the details of the message. As is you'd also
leak the error context.
I think the best bet for now is to do something like
/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);
I wonder if we ought to provide a wrapper for this? It could e.g. know to
mention the original elevel and such?
I don't think NextCopyFrom() is the right place to emit this warning - it
e.g. is also called from file_fdw.c, which might want to do something else
with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.
@@ -3378,6 +3378,10 @@ copy_opt_item:
 			{
 				$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
 			}
+			| IGNORE_DATATYPE_ERRORS
+			{
+				$$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
+			}
 			| DELIMITER opt_as Sconst
 			{
 				$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);
I think we shouldn't add a new keyword for this, but only support this via
/* new COPY option syntax */
copy_generic_opt_list:
copy_generic_opt_elem
Further increasing the size of the grammar with random keywords when we have
more generic ways to represent them seems unnecessary.
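(To make that concrete: with the generic option list, the option is accepted
only through the parenthesized WITH ( ... ) syntax and needs no dedicated
grammar keyword. An illustrative sketch:)

    -- both spellings go through copy_generic_opt_list / defGetBoolean()
    COPY check_ign_err FROM STDIN WITH (ignore_datatype_errors);
    COPY check_ign_err FROM STDIN WITH (ignore_datatype_errors true);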
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1	{1}	1
+a	{2}	2
+3	{3}	3333333333
+4	{a, 4}	4
+
+5	{5}	5
+\.
+SELECT * FROM check_ign_err;
+
I suggest adding a few more tests:
- COPY with a datatype error that can't be handled as a soft error
- test documenting that COPY FORMAT BINARY is incompatible with IGNORE_DATATYPE_ERRORS
- a soft error showing the error context - although that will require some
care to avoid the function name + line in the output
Greetings,
Andres Freund
On 2023-03-23 02:50, Andres Freund wrote:
Hi,
Tom, see below - I wonder if we should provide one more piece of
infrastructure around the saved error stuff...

Have you measured whether this has negative performance effects when *NOT*
using the new option?

As-is this does not work with FORMAT BINARY - and converting the binary input
functions to support soft errors won't happen for 16. So I think you need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.

On 2023-03-22 22:34:20 +0900, torikoshia wrote:

@@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
 		ExecClearTuple(myslot);
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}

I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.

@@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
 		}
 		else
-			values[m] = InputFunctionCall(&in_functions[m],
-										  string,
-										  typioparams[m],
-										  att->atttypmod);
+			/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+			if (!InputFunctionCallSafe(&in_functions[m],
+									   string,
+									   typioparams[m],
+									   att->atttypmod,
+									   (Node *) &cstate->escontext,
+									   &values[m]))
+			{
+				cstate->ignored_errors++;
+
+				ereport(WARNING,
+						errmsg("%s", cstate->escontext.error_data->message));

That isn't right - you lose all the details of the message. As is you'd also
leak the error context.

I think the best bet for now is to do something like

/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);
Thanks for your review!
I'll try to fix it this way for the time being.
I wonder if we ought to provide a wrapper for this? It could e.g. know
to
mention the original elevel and such?
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
On 2023-03-23 02:50, Andres Freund wrote:
Thanks again for your review.
Attached v5 patch.
Have you measured whether this has negative performance effects when
*NOT*
using the new option?
I loaded 10000000 rows of pgbench_accounts on my laptop and compared the
elapsed time.
GUCs changed from the default are logging_collector = on,
log_error_verbosity = verbose.
Three runs were made under each condition and the median of them is
listed below:
- patch NOT applied(36f40ce2dc66): 35299ms
- patch applied, without IGNORE_DATATYPE_ERRORS: 34409ms
- patch applied, with IGNORE_DATATYPE_ERRORS: 35510ms
It seems there is no significant degradation.
Also tested the elapsed time when loading data containing some data type
errors with IGNORE_DATATYPE_ERRORS:
- data with 100 error rows: 35269ms
- data with 1000 error rows: 34577ms
- data with 5000000 error rows: 48925ms
The case with 5000000 error rows takes much longer, but that seems to be
mostly logging time.
Here are the test results with log_min_messages and client_min_messages
set to 'error':
- data with 5000000 data type errors: 23972ms
- data with 0 data type errors: 34320ms
Conversely, when there are many data type errors, loading takes less time.
This seems like a reasonable result since more rows are skipped and never
inserted.
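(For reference, a rough sketch of the kind of measurement described above;
the table name, file path and \timing usage are assumptions, not the exact
script that was run:)

    -- assumed setup: pgbench_accounts already populated with 10000000 rows
    \timing on
    COPY pgbench_accounts TO '/tmp/pgbench_accounts.dat';
    CREATE TABLE accounts_copy (LIKE pgbench_accounts);
    COPY accounts_copy FROM '/tmp/pgbench_accounts.dat';            -- baseline
    TRUNCATE accounts_copy;
    COPY accounts_copy FROM '/tmp/pgbench_accounts.dat' WITH (IGNORE_DATATYPE_ERRORS);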
As-is this does not work with FORMAT BINARY - and converting the binary
input
functions to support soft errors won't happen for 16. So I think you
need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.
Added the option check.
On 2023-03-22 22:34:20 +0900, torikoshia wrote:
@@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}

I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.
Pulled this out of the loop and added code to reset escontext.
@@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
 		}
 		else
-			values[m] = InputFunctionCall(&in_functions[m],
-										  string,
-										  typioparams[m],
-										  att->atttypmod);
+			/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+			if (!InputFunctionCallSafe(&in_functions[m],
+									   string,
+									   typioparams[m],
+									   att->atttypmod,
+									   (Node *) &cstate->escontext,
+									   &values[m]))
+			{
+				cstate->ignored_errors++;
+
+				ereport(WARNING,
+						errmsg("%s", cstate->escontext.error_data->message));

That isn't right - you lose all the details of the message. As is you'd also
leak the error context.

I think the best bet for now is to do something like

/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);
As I mentioned in a previous email, I added the above code for now.
I wonder if we ought to provide a wrapper for this? It could e.g. know to
mention the original elevel and such?

I don't think NextCopyFrom() is the right place to emit this warning - it
e.g. is also called from file_fdw.c, which might want to do something else
with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.
Agreed.
@@ -3378,6 +3378,10 @@ copy_opt_item:
 			{
 				$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
 			}
+			| IGNORE_DATATYPE_ERRORS
+			{
+				$$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1);
+			}
 			| DELIMITER opt_as Sconst
 			{
 				$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);

I think we shouldn't add a new keyword for this, but only support this via

/* new COPY option syntax */
copy_generic_opt_list:
	copy_generic_opt_elem

Further increasing the size of the grammar with random keywords when we have
more generic ways to represent them seems unnecessary.
Agreed.
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
+1	{1}	1
+a	{2}	2
+3	{3}	3333333333
+4	{a, 4}	4
+
+5	{5}	5
+\.
+SELECT * FROM check_ign_err;
+

I suggest adding a few more tests:
- COPY with a datatype error that can't be handled as a soft error
Added a test for the case of missing columns.
However, it's not a data type error, so it may not be what you expected.
- test documenting that COPY FORMAT BINARY is incompatible with
IGNORE_DATATYPE_ERRORS
Added it.
- a soft error showing the error context - although that will require
some
care to avoid the function name + line in the output
I assume you mean a test to check the server log, but I haven't come up
with a way to do it.
Adding a TAP test might do it, but I think it would be overkill to add
one just for this.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v5-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-diff)
From 6b646d96a9c2a310836693452deb2128636d1beb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Mon, 27 Mar 2023 22:15:49 +0900
Subject: [PATCH v5] Add new COPY option IGNORE_DATATYPE_ERRORS
Currently entire COPY fails when there exists unexpected
data regarding data type or range.
In some cases, it would be useful to skip copying such data
and continue copying and IGNORE_DATATYPE_ERRORS does this.
This patch uses the soft error handling infrastructure, which
is introduced by d9f7f5d32f20.
Author: Damir Belyalov, Atsushi Torikoshi
---
doc/src/sgml/ref/copy.sgml | 13 ++++++++++
src/backend/commands/copy.c | 15 ++++++++++-
src/backend/commands/copyfrom.c | 32 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 17 ++++++++++---
src/bin/psql/tab-complete.c | 3 ++-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 +++
src/test/regress/expected/copy2.out | 22 ++++++++++++++++
src/test/regress/sql/copy2.sql | 19 ++++++++++++++
9 files changed, 119 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e591ed2e6..cea56d65eb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORMAT <replaceable class="parameter">format_name</replaceable>
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> | MATCH ]
@@ -234,6 +235,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+ This option is not allowed when using binary format. Note that this
+ is only supported in current <command>COPY</command> syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>DELIMITER</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..aa50cc1ee1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -576,7 +584,7 @@ ProcessCopyOptions(ParseState *pstate,
}
/*
- * Check for incompatible options (must do these two before inserting
+ * Check for incompatible options (must do these before inserting
* defaults)
*/
if (opts_out->binary && opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->ignore_datatype_errors)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 80bca79cd0..9e23cd45d5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -652,6 +652,7 @@ CopyFrom(CopyFromState cstate)
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
@@ -752,6 +753,13 @@ CopyFrom(CopyFromState cstate)
ti_options |= TABLE_INSERT_FROZEN;
}
+ /* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/*
* We need a ResultRelInfo so we can use the regular executor's
* index-entry-making machinery. (There used to be a huge amount of code
@@ -987,7 +995,31 @@ CopyFrom(CopyFromState cstate)
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors && cstate->ignored_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->ignored_errors));
break;
+ }
+
+ /* Soft error occurred, skip this tuple and clean up the escontext */
+ if (cstate->escontext.error_occurred)
+ {
+ ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+ /* adjust elevel so we don't jump out */
+ cstate->escontext.error_data->elevel = WARNING;
+ /* despite the name, this won't raise an error since elevel is WARNING now */
+ ThrowErrorData(cstate->escontext.error_data);
+
+ ExecClearTuple(myslot);
+
+ new_escontext.details_wanted = true;
+ cstate->escontext = new_escontext;
+
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 3853902a16..6b0447782d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -956,10 +957,18 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_errors++;
+
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 42e87b9e49..4b443d4ea7 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2857,7 +2857,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 33175868f6..c2e55ac21f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ac2c16f8b8..1164c71631 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ int64 ignored_errors; /* total number of ignored errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 8e33eee719..99f70f44ed 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, ignore_datatype_errors);
+ERROR: cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY force quote available only in CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -666,6 +668,25 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -680,6 +701,7 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index d759635068..e88f27bbfd 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,6 +70,7 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, ignore_datatype_errors);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
@@ -464,6 +465,23 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -478,6 +496,7 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
--
-- COPY FROM ... DEFAULT
base-commit: 36f40ce2dc66f1a36d6a12f7a0352e1c5bf1063e
--
2.25.1
Hi!
I made the specified changes and my patch turned out the same as yours. The
performance measurements were the same too.
The only thing left to do is to figure out how not to add IGNORE_DATATYPE_ERRORS
as a keyword. See how this is done for parameters such as FORCE_NOT_NULL,
FORCE_NULL, FORCE_QUOTE: they are not in kwlist.h and are not keywords
in gram.y.
Regards,
Damir Belyalov
Postgres Professional
On 2023-03-27 23:28, Damir Belyalov wrote:
Hi!
I made the specified changes and my patch turned out the same as
yours. The performance measurements were the same too.
Thanks for your review and measurements.
The only thing left to do is how not to add IGNORE_DATATYPE_ERRORS as
a keyword. See how this is done for parameters such as FORCE_NOT_NULL,
FORCE_NULL, FORCE_QUOTE. They are not in kwlist.h and are not as
keywords in gram.y.
I might misunderstand something, but I believe the v5 patch uses
copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a
keyword.
It modifies neither kwlist.h nor gram.y.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
I might misunderstand something, but I believe the v5 patch uses
copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a
keyword.
It modifies neither kwlist.h nor gram.y.
Sorry, didn't notice that. I think everything is alright now.
Regards,
Damir Belyalov
Postgres Professional
Hi!
Thank you, Damir, for your patch. It is very interesting to review!
It seems to me that the variable names are not consistent everywhere.
I noticed that you used the ignore_datatype_errors_specified variable in
copy.c, but the option has a short name, ignore_datatype_errors. Also you
used the short variable name in the CopyFormatOptions structure.
The name ignore_datatype_errors_specified seems very long to me; maybe use
the short version (ignore_datatype_errors) in copy.c too?
Besides, I noticed that you used the ignored_errors variable in the
CopyFromStateData structure, and its name is strikingly similar to
ignore_datatype_errors, but they have different meanings.
Maybe it would be better to rename it to ignored_errors_counter?
I tested the latest version
(v5-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch) with the bytea
data type and with transaction cases. Eventually, I didn't find any problems
there.
I've described my steps in more detail below, in case I missed something.
First of all, I ran COPY with the IGNORE_DATATYPE_ERRORS parameter inside
a transaction block.

File t2.csv contains:

id,b
769,\
1,\6e
2,\x5
5,\x

Test:
CREATE TABLE t (id INT , b BYTEA) ;
postgres=# BEGIN;
copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv',
IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);
SAVEPOINT my_savepoint;
BEGIN
WARNING: invalid input syntax for type bytea
WARNING: invalid input syntax for type bytea
WARNING: invalid hexadecimal data: odd number of digits
WARNING: 3 rows were skipped due to data type incompatibility
COPY 1
SAVEPOINT
postgres=*# copy t FROM '/home/alena/postgres/t2.csv' WITH (format
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);
WARNING: invalid input syntax for type bytea
WARNING: invalid input syntax for type bytea
WARNING: invalid hexadecimal data: odd number of digits
WARNING: 3 rows were skipped due to data type incompatibility
COPY 1
postgres=*# ROLLBACK TO my_savepoint;
ROLLBACK
postgres=*# select * from t;
id | b
----+----
5 | \x
(1 row)
postgres=*# copy t FROM '/home/alena/postgres/t2.csv' WITH (format
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);
WARNING: invalid input syntax for type bytea
WARNING: invalid input syntax for type bytea
WARNING: invalid hexadecimal data: odd number of digits
WARNING: 3 rows were skipped due to data type incompatibility
COPY 1
postgres=*# select * from t;
id | b
----+----
5 | \x
5 | \x
(2 rows)
postgres=*# commit;
COMMIT
I tried a similar test with the transaction block moved into a function:
CREATE FUNCTION public.log2()
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
AS $function$
BEGIN;
copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv',
IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);
SAVEPOINT my_savepoint;
END;
$function$;
postgres=# delete from t;
postgres=# select 1 as t from log2();
WARNING: invalid input syntax for type bytea
WARNING: invalid input syntax for type bytea
WARNING: invalid hexadecimal data: odd number of digits
WARNING: 3 rows were skipped due to data type incompatibility
t
---
1
(1 row)
Secondly, I checked COPY with the bytea data type.

t1.csv contains:
id,b
769,\x2d
1,\x6e
2,\x5c
5,\x
And I ran it:
postgres=# delete from t;
DELETE 4
postgres=# copy t FROM '/home/alena/postgres/t2.csv' WITH (format
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);
WARNING: invalid input syntax for type bytea
WARNING: invalid input syntax for type bytea
WARNING: invalid hexadecimal data: odd number of digits
WARNING: 3 rows were skipped due to data type incompatibility
COPY 1
postgres=# select * from t;
id | b
----+----
5 | \x
(1 row)
--
---
Alena Rybakina
Postgres Professional
On 2023-05-07 05:05, Alena Rybakina wrote:
Thanks for your review and comments!
I noticed that you used the ignore_datatype_errors_specified variable in
copy.c, but the option has a short name, ignore_datatype_errors. Also you
used the short variable name in the CopyFormatOptions structure.
You may already understand this, but these variable names are given in
imitation of the FREEZE and BINARY cases:
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype
errors */
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified = false;
The name ignore_datatype_errors_specified seems very long to me; maybe use
the short version (ignore_datatype_errors) in copy.c too?
I think it would be sane to align the names with the FREEZE and BINARY
options.
I agree the name is too long; we once used the name 'ignore_errors'.
However, the current implementation does not ignore all errors, just data
type errors, so I renamed it.
There may be a better name, but I haven't come up with one.
Besides, I noticed that you used the ignored_errors variable in the
CopyFromStateData structure, and its name is strikingly similar to
ignore_datatype_errors, but they have different meanings.
Maybe it would be better to rename it to ignored_errors_counter?
From a quick look at the PostgreSQL source code, there are few variable
names with "_counter"; that suffix seems to be used for function names.
Something like "ignored_errors_count" might be better.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Since the v5 patch no longer applies, I've updated the patch.
On 2023-03-23 02:50, Andres Freund wrote:
I suggest adding a few more tests:
- COPY with a datatype error that can't be handled as a soft error
I didn't know a proper way to test this, but I found that the widget data
type's input function widget_in(), defined in regress.c, raises a hard
error, so the attached patch adds a test using it.
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v6-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-diff)
From de00c1555e0ee4a61565346946f4f3a4e851252c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Mon, 21 Aug 2023 20:30:29 +0900
Subject: [PATCH v6] Add new COPY option IGNORE_DATATYPE_ERRORS
Currently entire COPY fails even when there is one unexpected data
regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors and skips them and COPY
data which don't contain problem.
This patch uses the soft error handling infrastructure, which is
introduced by d9f7f5d32f20.
Author: Damir Belyalov, Atsushi Torikoshi
---
doc/src/sgml/ref/copy.sgml | 13 +++++++++
src/backend/commands/copy.c | 13 +++++++++
src/backend/commands/copyfrom.c | 37 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 19 +++++++++---
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 ++
src/test/regress/expected/copy2.out | 28 ++++++++++++++++++
src/test/regress/sql/copy2.sql | 27 +++++++++++++++++
9 files changed, 139 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -370,6 +371,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+ This option is not allowed when using binary format. Note that this
+ is only supported in current <command>COPY</command> syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->ignore_datatype_errors)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index b47cb5c66d..853adb8414 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
ti_options |= TABLE_INSERT_FROZEN;
}
+ /* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/*
* We need a ResultRelInfo so we can use the regular executor's
* index-entry-making machinery. (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors &&
+ cstate->ignored_error_count > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->ignored_error_count));
break;
+ }
+
+ /* Soft error occurred, skip this tuple and log the reason */
+ if (cstate->escontext.error_occurred)
+ {
+ ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+ /* Adjust elevel so we don't jump out */
+ cstate->escontext.error_data->elevel = WARNING;
+
+ /*
+ * Despite the name, this won't raise an error since elevel is
+ * WARNING now.
+ */
+ ThrowErrorData(cstate->escontext.error_data);
+
+ ExecClearTuple(myslot);
+
+ new_escontext.details_wanted = true;
+ cstate->escontext = new_escontext;
+
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 232768a6e1..e44b555f48 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /*
+ * If IGNORE_DATATYPE_ERRORS is enabled, skip rows with
+ * datatype errors. */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_error_count++;
+
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 779fdc90cb..2fba51f648 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2869,7 +2869,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 33175868f6..c2e55ac21f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ac2c16f8b8..d5801df06c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ int64 ignored_error_count; /* total number of ignored errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index faf1a4d1b0..ac9c99f083 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, ignore_datatype_errors);
+ERROR: cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY force quote available only in CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -666,6 +668,30 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -680,6 +706,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index d759635068..7c6a7abe42 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,6 +70,7 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, ignore_datatype_errors);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
@@ -464,6 +465,30 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -478,6 +503,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
2.39.2
Since the v5 patch no longer applied, I have updated the patch.
Thank you for updating the patch. I made a small review of it and corrected
some formatting.
- COPY with a datatype error that can't be handled as a soft error
I didn't know the proper way to test this, but I found that the data type
widget's input function widget_in() is defined in regress.c to raise a hard
error, so the attached patch adds a test using it.
This test seems a bit odd because of the "widget" type. The hard error is
already thrown by the previous test with missing data. It would also be
interesting to list all the cases in which a hard error can be thrown.
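For illustration, a minimal sketch of the split as exercised by the
regression tests in the attached patch (not an exhaustive list of hard-error
cases):

-- per-column datatype failures are soft and are skipped with a WARNING
COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
-- structural problems still abort the whole COPY:
--   ERROR:  missing data for column "k"
-- as does a type whose input function deliberately stays hard-error:
--   ERROR:  invalid input syntax for type widget: "1"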
Regards,
Damir Belyalov
Postgres Professional
Attachments:
v7-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-patch)
From 0e1193e00bb5ee810a015a2baaf7c79e395a54c7 Mon Sep 17 00:00:00 2001
From: Damir Belyalov <d.belyalov@postgrespro.ru>
Date: Fri, 15 Sep 2023 11:14:57 +0300
Subject: [PATCH] ignore errors
---
doc/src/sgml/ref/copy.sgml | 13 +++++++++
src/backend/commands/copy.c | 13 +++++++++
src/backend/commands/copyfrom.c | 37 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 20 ++++++++++---
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 ++
src/test/regress/expected/copy2.out | 28 ++++++++++++++++++
src/test/regress/sql/copy2.sql | 26 +++++++++++++++++
9 files changed, 139 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -370,6 +371,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+ This option is not allowed when using binary format. Note that this
+ is only supported in current <command>COPY</command> syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->ignore_datatype_errors)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..b18aea6376 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
ti_options |= TABLE_INSERT_FROZEN;
}
+ /* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/*
* We need a ResultRelInfo so we can use the regular executor's
* index-entry-making machinery. (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors &&
+ cstate->ignored_errors_count > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->ignored_errors_count));
break;
+ }
+
+ /* Soft error occured, skip this tuple and log the reason */
+ if (cstate->escontext.error_occurred)
+ {
+ ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+ /* Adjust elevel so we don't jump out */
+ cstate->escontext.error_data->elevel = WARNING;
+
+ /*
+ * Despite the name, this won't raise an error since elevel is
+ * WARNING now.
+ */
+ ThrowErrorData(cstate->escontext.error_data);
+
+ ExecClearTuple(myslot);
+
+ new_escontext.details_wanted = true;
+ cstate->escontext = new_escontext;
+
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f553734582..cf4dad1106 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -956,10 +957,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /*
+ * If IGNORE_DATATYPE_ERRORS is enabled, skip rows with
+ * datatype errors.
+ */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_errors_count++;
+
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 779fdc90cb..2fba51f648 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2869,7 +2869,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 33175868f6..c2e55ac21f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ac2c16f8b8..e5bdae2d25 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ int64 ignored_errors_count; /* total number of ignored errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index faf1a4d1b0..ac9c99f083 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, ignore_datatype_errors);
+ERROR: cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY force quote available only in CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -666,6 +668,30 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -680,6 +706,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index d759635068..e8c2c1aca3 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,6 +70,7 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, ignore_datatype_errors);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
@@ -464,6 +465,29 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -478,6 +502,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
2.34.1
On 2023-09-15 19:02, Damir Belyalov wrote:
Since the v5 patch no longer applied, I have updated the patch.
Thank you for updating the patch. I made a small review of it and corrected
some formatting.
Thanks for your review and update!
I have no objections to the modifications of the code and comments.
Although the v7 patch doesn't include a commit message, I think keeping one
is helpful for reviewers.
- COPY with a datatype error that can't be handled as a soft error
I didn't know the proper way to test this, but I found that the data type
widget's input function widget_in() is defined in regress.c to raise a hard
error, so the attached patch adds a test using it.
This test seems a bit odd because of the "widget" type. The hard error is
already thrown by the previous test with missing data. It would also be
interesting to list all the cases in which a hard error can be thrown.
Although the missing-data error is a hard error, the suggestion from Andres
was to add a datatype error test:
- COPY with a datatype error that can't be handled as a soft error
As described in widget_in(), widget is intentionally left emitting a hard
error for testing purposes:
* Note: DON'T convert this error to "soft" style (errsave/ereturn).  We
* want this data type to stay permanently in the hard-error world so that
* it can be used for testing that such cases still work reasonably.
From this point of view, I think this is an intended way of using widget.
OTOH widget is declared in create_type.sql and I'm not sure it's ok to
use it in another test copy2.sql.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Although the v7 patch doesn't include a commit message, I think keeping one
is helpful for reviewers.
Sure, didn't notice it. Added the commit message to the updated patch.
* Note: DON'T convert this error to "soft" style (errsave/ereturn).  We
* want this data type to stay permanently in the hard-error world so that
* it can be used for testing that such cases still work reasonably.
From this point of view, I think this is an intended way of using widget.
I agree, it's a good approach for checking datatype errors, because
that's what was intended.
OTOH widget is declared in create_type.sql and I'm not sure it's ok to
use it in another test copy2.sql.
I think future regression tests that use the 'widget' type need not live
only in create_type.sql, so it's not a problem for one test to use a type or
function defined in another regression test. For example, the table 'onek'
is used in many regression tests.
Regards,
Damir Belyalov
Postgres Professional
Attachments:
v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch (text/x-patch)
From 0e1193e00bb5ee810a015a2baaf7c79e395a54c7 Mon Sep 17 00:00:00 2001
From: Damir Belyalov <dam.bel07@gmail.com>
Date: Fri, 15 Sep 2023 11:14:57 +0300
Subject: [PATCH v7] Add new COPY option IGNORE_DATATYPE_ERRORS
Currently entire COPY fails even when there is one unexpected data
regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors and skips them and COPY
data which don't contain problem.
This patch uses the soft error handling infrastructure, which is
introduced by d9f7f5d32f20.
Author: Damir Belyalov, Atsushi Torikoshi
---
doc/src/sgml/ref/copy.sgml | 13 +++++++++
src/backend/commands/copy.c | 13 +++++++++
src/backend/commands/copyfrom.c | 37 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 20 ++++++++++---
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 ++
src/test/regress/expected/copy2.out | 28 ++++++++++++++++++
src/test/regress/sql/copy2.sql | 26 +++++++++++++++++
9 files changed, 139 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
+ IGNORE_DATATYPE_ERRORS [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -370,6 +371,18 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+ <listitem>
+ <para>
+ Drops rows that contain malformed data while copying. These are rows
+ with columns where the data type's input-function raises an error.
+ This option is not allowed when using binary format. Note that this
+ is only supported in current <command>COPY</command> syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool ignore_datatype_errors_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+ {
+ if (ignore_datatype_errors_specified)
+ errorConflictingDefElem(defel, pstate);
+ ignore_datatype_errors_specified = true;
+ opts_out->ignore_datatype_errors = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->ignore_datatype_errors)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..b18aea6376 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
ti_options |= TABLE_INSERT_FROZEN;
}
+ /* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+ if (cstate->opts.ignore_datatype_errors)
+ {
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/*
* We need a ResultRelInfo so we can use the regular executor's
* index-entry-making machinery. (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
/* Directly store the values/nulls array in the slot */
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ {
+ if (cstate->opts.ignore_datatype_errors &&
+ cstate->ignored_errors_count > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->ignored_errors_count));
break;
+ }
+
+ /* Soft error occured, skip this tuple and log the reason */
+ if (cstate->escontext.error_occurred)
+ {
+ ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+ /* Adjust elevel so we don't jump out */
+ cstate->escontext.error_data->elevel = WARNING;
+
+ /*
+ * Despite the name, this won't raise an error since elevel is
+ * WARNING now.
+ */
+ ThrowErrorData(cstate->escontext.error_data);
+
+ ExecClearTuple(myslot);
+
+ new_escontext.details_wanted = true;
+ cstate->escontext = new_escontext;
+
+ continue;
+ }
ExecStoreVirtualTuple(myslot);
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f553734582..cf4dad1106 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -956,10 +957,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /*
+ * If IGNORE_DATATYPE_ERRORS is enabled, skip rows with
+ * datatype errors.
+ */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->ignored_errors_count++;
+
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 779fdc90cb..2fba51f648 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2869,7 +2869,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "IGNORE_DATATYPE_ERRORS");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 33175868f6..c2e55ac21f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
* -1 if not specified */
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
+ bool ignore_datatype_errors; /* ignore rows with datatype errors */
bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ac2c16f8b8..e5bdae2d25 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ int64 ignored_errors_count; /* total number of ignored errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index faf1a4d1b0..ac9c99f083 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, ignore_datatype_errors);
+ERROR: cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY force quote available only in CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -666,6 +668,30 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+WARNING: invalid input syntax for type integer: "a"
+WARNING: value "3333333333" is out of range for type integer
+WARNING: invalid input syntax for type integer: "a"
+WARNING: invalid input syntax for type integer: ""
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -680,6 +706,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index d759635068..e8c2c1aca3 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,6 +70,7 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, ignore_datatype_errors);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
@@ -464,6 +465,29 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for IGNORE_DATATYPE_ERRORS option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (IGNORE_DATATYPE_ERRORS);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -478,6 +502,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
2.34.1
Damir <dam.bel07@gmail.com> writes:
[ v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch ]
Sorry for being so late to the party, but ... I don't think this
is a well-designed feature as it stands. Simply dropping failed rows
seems like an unusable definition for any application that has
pretensions of robustness. "But", you say, "we're emitting WARNING
messages about it". That's *useless*. For most applications WARNING
messages just go into the bit bucket, or worse they cause memory leaks
(because the app never reads them). An app that tried to read them
would have to cope with all sorts of fun such as translated messages.
Furthermore, as best I can tell from the provided test cases, the
messages completely lack basic context such as which field or line
the problem occurred in. An app trying to use this to understand
which input lines had failed would not get far.
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
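Purely as an illustration of that shape (all names here are invented, not a
proposal for exact syntax):

CREATE TABLE my_copy_errors (
    lineno   bigint,   -- input line number
    line     text,     -- raw text of the rejected line
    fields   text[],   -- broken-down field values, if parsing got that far
    message  text      -- error message text
);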
Also it'd be a good idea to have a vision of how the feature
could be extended to cope with lower-level errors, such as
lines that have the wrong number of columns or other problems
with line-level syntax. I don't say we need to cope with that
immediately, but it's going to be something people will want
to add, I think.
regards, tom lane
On 8 Nov 2023, at 19:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
I agree that the errors should be easily visible to the user in some way. The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.
As a data point, Greenplum has this feature with additional SQL syntax to
control it:
COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;
LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
operation errors out. I'm not at all advocating that we should mimic this,
just wanted to add a reference to postgres derivative where this has been
implemented.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
On 8 Nov 2023, at 19:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
I agree that the errors should be easily visible to the user in some way. The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.
I agree it's interesting, but we need to get it right the first time.
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
where the "items" are keywords identifying the information item
we will insert into each successive column of the target table.
This design allows the user to decide which items are of use
to them. I envision items like
LINENO bigint COPY line number, counting from 1
LINE text raw text of line (after encoding conversion)
FIELDS text[] separated, de-escaped string fields (the data
that was or would be fed to input functions)
FIELD text name of troublesome field, if field-specific
MESSAGE text error message text
DETAIL text error message detail, if any
SQLSTATE text error SQLSTATE code
Some of these would have to be populated as NULL if we didn't get
that far in processing the line. In the worst case, which is
encoding conversion failure, I think we couldn't populate any of
the data items except LINENO.
Not sure if we need to insist that the target table columns be
exactly the data types I show above. It'd be nice to allow
the LINENO target to be plain int, perhaps. OTOH, do we really
want to have to deal with issues like conversion failures while
trying to report an error?
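To make the straw man concrete, usage might look roughly like this (the
syntax and names are illustrative only):

CREATE TABLE orders_errors (lineno bigint, line text, message text, sqlstate text);

COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv)
    ERRORS TO orders_errors (LINENO, LINE, MESSAGE, SQLSTATE);
-- each rejected input line would become one row in orders_errors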
As a data point, Greenplum has this feature with additional SQL syntax to
control it:
COPY .. LOG ERRORS SEGMENT REJECT LIMIT xyz ROWS;
LOG ERRORS instructs the database to log the faulty rows and SEGMENT REJECT
LIMIT xyz ROWS sets the limit of how many rows can be faulty before the
operation errors out. I'm not at all advocating that we should mimic this,
just wanted to add a reference to postgres derivative where this has been
implemented.
Hm. A "reject limit" might be a useful add-on, but I wouldn't advocate
including it in the initial patch.
regards, tom lane
Hi,
On 2023-11-08 13:18:39 -0500, Tom Lane wrote:
Damir <dam.bel07@gmail.com> writes:
[ v7-0002-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch ]
Sorry for being so late to the party, but ... I don't think this
is a well-designed feature as it stands. Simply dropping failed rows
seems like an unusable definition for any application that has
pretensions of robustness.
Not everything needs to be a robust application though. I've definitely cursed
at postgres for lacking this.
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
If we go in that direction, we should make it possible to *not* use such a
table as well, for some uses it'd be pointless.
Another way of reporting errors could be for copy to return invalid input back
to the client, via the copy protocol. That would allow the client to handle
failing rows and also to abort if the number of errors or the type of errors
gets to be too big.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2023-11-08 13:18:39 -0500, Tom Lane wrote:
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line.
If we go in that direction, we should make it possible to *not* use such a
table as well, for some uses it'd be pointless.
Why? You can always just drop the errors table if you don't want it.
But I fail to see the use-case for ignoring errors altogether.
Another way of reporting errors could be for copy to return invalid input back
to the client, via the copy protocol.
Color me skeptical. There are approximately zero clients in the
world today that could handle simultaneous return of data during
a COPY. Certainly neither libpq nor psql are within hailing
distance of being able to support that. Maybe in some far
future it could be made to work --- but if you want it in the v1
patch, you just moved the goalposts into the next county.
regards, tom lane
Hi,
On 2023-11-08 19:00:01 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2023-11-08 13:18:39 -0500, Tom Lane wrote:
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line.
If we go in that direction, we should make it possible to *not* use such a
table as well, for some uses it'd be pointless.
Why? You can always just drop the errors table if you don't want it.
I think it'll often just end up littering the database, particularly if the
callers don't care about a few errors.
But I fail to see the use-case for ignoring errors altogether.
My experience is that there's often a few errors due to bad encoding, missing
escaping etc that you don't care sufficiently about when importing large
quantities of data.
Greetings,
Andres Freund
Hello everyone!
Thanks for coming back to this patch.
I had already thought about storing errors in a table, a separate file, or
the logfile, and it seems to me that the best way is to output errors to the
logfile. For the user it is more convenient to look for errors where they
are usually generated, the logfile, and anyone who wants to intercept them
can easily do so with a few commands.
The analogues of this feature in other DBMSs usually keep additional files
for storing errors, but those features have too many options (see the links
below).
I also think it is best to keep this feature simple for the first version
and avoid extras such as additional files and other options.
IMHO, for more complicated table-loading operations there is pgloader:
https://github.com/dimitri/pgloader
Links to analogues of COPY IGNORE_DATATYPE_ERRORS:
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
Regards,
Damir Belyalov
Postgres Professional
Tom Lane <tgl@sss.pgh.pa.us> writes:
Daniel Gustafsson <daniel@yesql.se> writes:
On 8 Nov 2023, at 19:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
I agree that the errors should be easily visible to the user in some way. The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.
I agree it's interesting, but we need to get it right the first time.
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
where the "items" are keywords identifying the information item
we will insert into each successive column of the target table.
This design allows the user to decide which items are of use
to them. I envision items like
While I'm pretty happy with the overall design, especially the 'ERRORS TO
other_table_name' part, I'm a bit confused about why we need to write the
code for (item [, item [, ...]]); not only does it require more coding, it
also requires the user to make more decisions.
Would anything be wrong with making all of them the default?
LINENO bigint COPY line number, counting from 1
LINE text raw text of line (after encoding conversion)
FIELDS text[] separated, de-escaped string fields (the data
that was or would be fed to input functions)
FIELD text name of troublesome field, if field-specific
MESSAGE text error message text
DETAIL text error message detail, if any
SQLSTATE text error SQLSTATE code
--
Best Regards
Andy Fan
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
I tried to implement the patch using a table and came across a number of
questions.
Which table should we use for this feature: a system catalog table, a file,
or a new regular table?
In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user. It
is also not clear how access rights to this table are inherited and
granted.
--
Regards,
Damir Belyalov
Postgres Professional
Hi!
On 14.11.2023 13:10, Damir Belyalov wrote:
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
I tried to implement the patch using a table and came across a number
of questions.
Which table should we use for this feature: a system catalog table, a file,
or a new regular table?
In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user.
It is also not clear how access rights to this table are inherited and
granted.
Maybe we can add a GUC or a parameter that makes COPY output such errors
during execution, and check whether the user has sufficient rights to set
such a parameter?
That is, I propose giving the user the choice to run COPY with or without
saving errors, and at the same time immediately checking whether the
error-output option is even available to them.
--
Regards,
Alena Rybakina
Damir Belyalov <dam.bel07@gmail.com> writes:
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
I tried to implement the patch using a table and came across a number of questions.
Which table should we use for this feature: a system catalog table, a file,
or a new regular table?
I think a system catalog should not be an option in the first place, since
it requires extra work; see the calls of IsCatalogRelation in heapam.c.
I would prefer to create a new normal heap relation rather than a file,
since a heap relation probably has better APIs.
In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another
user. It is also not clear how access rights to this
table are inherited and granted.
How about creating the table so that only the current user can read/write
it, or with the same access rights as the relation we are copying into?
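A rough sketch of the first idea, assuming the error table is an ordinary
relation created next to the target (names are illustrative):

CREATE TABLE orders_error (lineno bigint, line text, err_message text);
-- a newly created table is only accessible to its owner by default,
-- but an explicit REVOKE makes the intent visible
REVOKE ALL ON orders_error FROM PUBLIC;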
--
Best Regards
Andy Fan
On Thu, Nov 9, 2023 at 4:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 8 Nov 2023, at 19:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think an actually usable feature of this sort would involve
copying all the failed lines to some alternate output medium,
perhaps a second table with a TEXT column to receive the original
data line. (Or maybe an array of text that could receive the
broken-down field values?) Maybe we could dump the message info,
line number, field name etc into additional columns.
I agree that the errors should be easily visible to the user in some way. The
feature is for sure interesting, especially in data warehouse type jobs where
dirty data is often ingested.
I agree it's interesting, but we need to get it right the first time.
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
where the "items" are keywords identifying the information item
we will insert into each successive column of the target table.
This design allows the user to decide which items are of use
to them. I envision items like
LINENO bigint COPY line number, counting from 1
LINE text raw text of line (after encoding conversion)
FIELDS text[] separated, de-escaped string fields (the data
that was or would be fed to input functions)
FIELD text name of troublesome field, if field-specific
MESSAGE text error message text
DETAIL text error message detail, if any
SQLSTATE text error SQLSTATE code
Just
SAVE ERRORS
and automatically create a table to hold the errors (validating
auto-generated table-name uniqueness and CREATE privilege).
The table would have the items listed above; if no error occurs, the table
gets dropped.
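A sketch of how that could look from the user's side (the option spelling
and the auto-generated table name are hypothetical, just to show the flow):

COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv, SAVE ERRORS);
-- if any rows were rejected, inspect the auto-created table:
SELECT lineno, line, message FROM orders_errors;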
On 14/11/2023 17:10, Damir Belyalov wrote:
Here is a very straw-man-level sketch of what I think might work.
The option to COPY FROM looks something like
ERRORS TO other_table_name (item [, item [, ...]])
I tried to implement the patch using a table and came across a number of
questions.
Which table should we use for this feature: a system catalog table, a file,
or a new regular table?
In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user.
It is also not clear how access rights to this table are inherited and
granted.
Previous reviews have given helpful ideas about storing errors in a new
table.
It should be trivial code: use the current table name plus an 'err' suffix,
as we already do in the case of conflicting auto-generated index names.
The errors table must inherit any access policies from the table into which
we are copying.
--
regards,
Andrei Lepikhov
Postgres Professional
hi.
Here is my implementation based on the previous discussions.
It adds a new COPY FROM option, save_error.
save_error only works in non-BINARY mode.
save_error was easier for me to implement; with "save error" (two words) I
worry that gram.y will not work.
save_error also works together with other options like csv mode, force_null,
and force_not_null.
The overall logic is:
if save_error is specified then
    if the error-holding table does not exist, create one
    if the error-holding table exists, set error_firsttime to false
if save_error is not specified, behave as on the master branch
if errors happen, insert the error info into the error-holding table
if no errors occurred and error_firsttime is true, drop the table
if no errors occurred and error_firsttime is false, raise a notice:
    All the past error holding saved at %s.%s
Error-holding table:
its schema will be the same as the COPY destination table's schema.
its name will be the COPY destination name concatenated with "_error".
error_holding table definition:
CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
The following field is not implemented:
FIELDS text[], separated, de-escaped string fields (the data that was
or would be fed to input functions)
because consider the following case:
create type test as (a int, b text);
create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
copy copy_comp from stdin with (default '\D');
1 \D '2022-07-04'
\.
table copy_comp;
I feel it would be hard to reconstruct the text[] value `(11,test)` from the
textual '\D' via SPI.
--------------------------------------
demo:
create table copy_default_error_save (
id integer,
text_value text not null default 'test',
ts_value timestamp without time zone not null default '2022-07-05'
);
copy copy_default_error_save from stdin with (save_error, default '\D');
k value '2022-07-04'
z \D '2022-07-03ASKL'
s \D \D
\.
NOTICE: 3 rows were skipped because of error. skipped row saved to
table public.copy_default_error_save_error
select * from copy_default_error_save_error;
 lineno | line                  | field    | source           | err_message                                                  | err_detail | errorcode
--------+-----------------------+----------+------------------+--------------------------------------------------------------+------------+-----------
      1 | k value '2022-07-04'  | id       | k                | invalid input syntax for type integer: "k"                   |            | 22P02
      2 | z \D '2022-07-03ASKL' | id       | z                | invalid input syntax for type integer: "z"                   |            | 22P02
      2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'"  |            | 22007
      3 | s \D \D               | id       | s                | invalid input syntax for type integer: "s"                   |            | 22P02
(4 rows)
The documentation is not great yet.
COPY FROM (save_error) will not be as fast as COPY FROM (save_error false).
With save_error we can only use InputFunctionCallSafe, which I believe is
not as fast as InputFunctionCall.
If any conversion error happens, we need to call the SPI interface, which
adds more overhead; also, we can currently only insert error rows one by
one (maybe we can insert into the error table with a multi-row VALUES list;
I will try that later).
The main code is about constructing the SPI query, plus tests and test
output.
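For the batching idea, a sketch of the statement the SPI code could build
instead of one INSERT per error (column list taken from the error-holding
table definition above; values taken from the demo):

INSERT INTO public.copy_default_error_save_error
    (lineno, line, field, source, err_message, err_detail, errorcode)
VALUES
    (1, 'k value ''2022-07-04''', 'id', 'k',
     'invalid input syntax for type integer: "k"', NULL, '22P02'),
    (2, 'z \D ''2022-07-03ASKL''', 'id', 'z',
     'invalid input syntax for type integer: "z"', NULL, '22P02');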
Attachments:
v8-0001-Add-a-new-COPY-option-SAVE_ERROR.patch (text/x-patch)
From 7aeb55cb0c8b1b36fd5c468fee0b07d4c13d1a7d Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Sun, 3 Dec 2023 22:58:40 +0800
Subject: [PATCH v8 1/1] Add a new COPY option: SAVE_ERROR. Only works for COPY
FROM, non-BINARY mode.
Currently NextCopyFrom can have 3 errors reported.
* extra data after last expected column
* missing data for column \"%s\"
* main function InputFunctionCall inside error.
Currently, we only deal with InputFunctionCall errors only.
instead of throw error while copying, save_error will save errors to a table automatically.
We check the table definition via column name and column data type.
if table already exists and meets the condition then errors will save to that table.
While copying, if error never happened, error save table will be dropped at the ending of COPY.
If the error saving table already exists,
meaning at least once COPY FROM errors had happened,
then all the future error will save to that table.
---
contrib/file_fdw/file_fdw.c | 4 +-
doc/src/sgml/ref/copy.sgml | 88 +++++++++++++
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 151 ++++++++++++++++++++++-
src/backend/commands/copyfromparse.c | 89 ++++++++++++-
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 7 ++
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 132 ++++++++++++++++++++
src/test/regress/sql/copy2.sql | 99 +++++++++++++++
12 files changed, 585 insertions(+), 12 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2189be8a..2d3eb34f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -751,7 +751,7 @@ fileIterateForeignScan(ForeignScanState *node)
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
+ slot->tts_values, slot->tts_isnull, NULL);
if (found)
ExecStoreVirtualTuple(slot);
@@ -1183,7 +1183,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
MemoryContextReset(tupcontext);
MemoryContextSwitchTo(tupcontext);
- found = NextCopyFrom(cstate, NULL, values, nulls);
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
MemoryContextSwitchTo(oldcontext);
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..06096fa6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,17 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+ Specifies malformed data make data type conversion failure while copying will automatically report error information to a regualar table.
+ This option is not allowed when using <literal>binary</literal> format. Note that this
+ is only supported in current <command>COPY FROM</command> syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -572,6 +584,13 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+ if <literal>SAVE_ERROR</literal> spceicfied, error actually happened then
+ <productname>PostgreSQL</productname> will create one table for you, if no error happened
+ error_table not exist, nothing will happed.
+
+ </para>
+
</refsect1>
<refsect1>
@@ -962,6 +981,75 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title>Error Save Table </title>
+ <para>
+ If <literal>SAVE_ERROR</literal> spceicfied, all the data type conversion fail while copying will automatically saved in a regular table.
+ <xref linkend="copy-errorsave-table"/> shows the error save table name, data type, and description.
+ </para>
+
+ <table id="copy-errorsave-table">
+
+ <title>COPY ERROR SAVE TABLE </title>
+
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry> Raw content of error occuring line</entry>
+ </row>
+
+ <row>
+ <entry> <literal>field</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry> Field name of the error occuring </entry>
+ </row>
+
+ <row>
+ <entry> <literal>source</literal> </entry>
+ <entry><type>text</type></entry>
+       <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message text </entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry> Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+       <entry>The error code for the copying error</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..acd5b623 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -38,6 +38,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -652,10 +653,12 @@ CopyFrom(CopyFromState cstate)
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ StringInfo err_save_buf;
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -952,6 +955,7 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ err_save_buf = makeStringInfo();
for (;;)
{
TupleTableSlot *myslot;
@@ -989,8 +993,54 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
/* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull, err_save_buf))
+ {
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->error_nsp && cstate->error_rel);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%ld rows were skipped because of error."
+ " skipped row saved to table %s.%s",
+ cstate->error_rows_cnt,
+ cstate->error_nsp, cstate->error_rel));
+ }
+ else
+ {
+ StringInfoData querybuf;
+ if (cstate->error_firsttime)
+ {
+ ereport(NOTICE,
+ errmsg("No error happened."
+ "Error holding table %s.%s will be droped",
+ cstate->error_nsp, cstate->error_rel));
+ initStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "DROP TABLE IF EXISTS %s.%s CASCADE ",
+ cstate->error_nsp, cstate->error_rel);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+ }
+ else
+ ereport(NOTICE,
+ errmsg("No error happened. "
+ "All the past error holding saved at %s.%s ",
+ cstate->error_nsp, cstate->error_rel));
+ }
+ }
break;
+ }
+
+ /* Soft error occurred, skip this tuple */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
ExecStoreVirtualTuple(myslot);
@@ -1444,6 +1494,103 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ char *err_nsp;
+ char error_rel[NAMEDATALEN];
+ StringInfoData querybuf;
+ bool isnull;
+ bool error_table_ok;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ snprintf(error_rel, sizeof(error_rel), "%s",
+ RelationGetRelationName(cstate->rel));
+ strlcat(error_rel,"_error", NAMEDATALEN);
+ err_nsp = get_namespace_name(RelationGetNamespace(cstate->rel));
+
+ initStringInfo(&querybuf);
+ /* The build query is used to validate:
+ * . err_nsp.error_rel table exists
+ * . column list(order by attnum, begin from ctid) =
+ * {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
+ * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}
+ * We need ctid system column when
+ * save_error table already exists and have zero column.
+ *
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,lineno,line,field,source,err_message,err_detail,errorcode}') AND "
+ "(array_agg(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+
+ appendStringInfo(&querybuf,
+ "relname = $$%s$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ error_rel, err_nsp);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ error_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ /* no err_nsp.error_rel table then crete one. for holding error. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.%s (LINENO BIGINT, LINE TEXT, "
+ "FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT, "
+ "ERR_DETAIL TEXT, ERRORCODE TEXT)",
+ err_nsp,error_rel);
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ cstate->error_firsttime = true;
+ elog(DEBUG1, "%s.%s created ", err_nsp, error_rel);
+ }
+ else if (error_table_ok)
+ /* error save table already exists. Set error_firsttime to false */
+ cstate->error_firsttime = false;
+ else if(!error_table_ok)
+ ereport(ERROR,
+ (errmsg("Error save table %s.%s already exists. "
+ "Cannot use it for COPY FROM error saving",
+ err_nsp, error_rel)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* these info need, no error will drop err_nsp.error_rel table */
+ cstate->error_rel = pstrdup(error_rel);
+ cstate->error_nsp = err_nsp;
+ }
+ else
+ {
+ /* set to NULL */
+ cstate->error_rel = NULL;
+ cstate->error_nsp = NULL;
+ cstate->escontext = NULL;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* default: assume conversion is ok */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..5b5471af 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -66,10 +66,12 @@
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -852,7 +854,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
*/
bool
NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls)
+ Datum *values, bool *nulls, StringInfo err_save_buf)
{
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
@@ -885,6 +887,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ /* reset to false for next new line if SAVE_ERROR specified */
+ if (cstate->opts.save_error)
+ {
+ cstate->line_error_occured = false;
+ }
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
ereport(ERROR,
@@ -956,15 +963,87 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ * InputFunctionCall is faster than InputFunctionCallSafe,
+ * so both code paths are kept.
+ */
+ if(!cstate->opts.save_error)
+ {
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ }
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char errcode[12];
+ char *err_detail;
+ snprintf(errcode, sizeof(errcode),
+ "%s",
+ unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ resetStringInfo(err_save_buf);
+ /* error table first column is bigint, the rest are text. */
+ appendStringInfo(err_save_buf,
+ "insert into %s.%s(lineno,line,field, "
+ "source, err_message, errorcode,err_detail) "
+ "select $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$, $$%s$$, ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->cur_lineno, cstate->line_buf.data,
+ cstate->cur_attname, string,
+ cstate->escontext->error_data->message,
+ errcode);
+
+ if (!err_detail)
+ appendStringInfo(err_save_buf, "NULL::text");
+ else
+ appendStringInfo(err_save_buf,"$$%s$$", err_detail);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+ if (SPI_processed != 1)
+ elog(FATAL, "not a singleton result");
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* line error occurred, set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89..747bd88a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -755,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVE_ERROR SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3448,6 +3448,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17328,6 +17332,7 @@ unreserved_keyword:
| ROUTINES
| ROWS
| RULE
+ | SAVE_ERROR
| SAVEPOINT
| SCALAR
| SCHEMA
@@ -17936,6 +17941,7 @@ bare_label_keyword:
| ROW
| ROWS
| RULE
+ | SAVE_ERROR
| SAVEPOINT
| SCALAR
| SCHEMA
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..cfed5d7f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to another table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
@@ -82,7 +83,7 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
extern void EndCopyFrom(CopyFromState cstate);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls);
+ Datum *values, bool *nulls, StringInfo err_save_buf);
extern bool NextCopyFromRawFields(CopyFromState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..b1c02b2f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,12 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ int64 error_rows_cnt; /* total number of rows that have errors */
+ const char *error_rel; /* the error row save table name */
+ const char *error_nsp; /* the error row table's namespace */
+ bool line_error_occured; /* does this line conversion error happened */
+ bool error_firsttime; /* first time create error save table */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa..d0988a4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..0906cc40 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,113 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+--- on copy success, the error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+NOTICE: No error happened.Error holding table public.save_error_csv_error will be droped
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+ expected_zero
+---------------
+ 0
+(1 row)
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+create table save_error_csv_error();
+--should fail. since error save table already exists.
+--error save table name = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: Error save table public.save_error_csv_error already exists. Cannot use it for COPY FROM error saving
+DROP TABLE save_error_csv_error;
+BEGIN;
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+SELECT *, b is null as b_null, b = '' as empty FROM save_error_csv;
+ a | b | c | d | b_null | empty
+---+---+------+------+--------+-------
+ 2 | | NULL | NULL | f | t
+(1 row)
+
+SELECT count(*) as expect_one FROM pg_class WHERE relname = 'save_error_csv_error';
+ expect_one
+------------
+ 1
+(1 row)
+
+ROLLBACK;
+DROP TABLE save_error_csv;
+--error TABLE should already be dropped.
+SELECT 1 as expect_zero FROM pg_class WHERE relname = 'save_error_csv_error';
+ expect_zero
+-------------
+(0 rows)
+
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 8 rows were skipped because of error. skipped row saved to table public.check_ign_err_error
+--special case: this will work, but the error TABLE should not be dropped.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+NOTICE: No error happened. All the past error holding saved at public.check_ign_err_error
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+ lineno | line | field | source | err_message | err_detail | errorcode
+--------+--------------------------------------------+-------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ 2 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | " | |
+ 3 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ 4 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ 5 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ 6 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ 6 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ 7 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ 7 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ 7 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ 8 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ 8 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ 9 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ 9 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ 9 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(14 rows)
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY check_ign_err FROM STDIN WITH (save_error, save_error o...
+ ^
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+--(type textrange was already made in test_setup.sql)
+--test using textrange
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 4 rows were skipped because of error. skipped row saved to table public.textrange_input_error
+SELECT * FROM textrange_input_error;
+ lineno | line | field | source | err_message | err_detail | errorcode
+--------+----------------------------+-------+----------+-------------------------------------------------------------------+------------------------------------------+-----------
+ 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ 2 | (",a),(",",a),()",a) | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ 2 | (",a),(",",a),()",a) | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ 2 | (",a),(",",a),()",a) | c | a) | malformed range literal: "a)" | Missing left parenthesis or bracket. | 22P02
+ 3 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ 3 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ 3 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ 4 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ 4 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ 4 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +929,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of error. skipped row saved to table public.copy_default_error_save_error
+select count(*) as expect_zero from copy_default_error_save;
+ expect_zero
+-------------
+ 0
+(1 row)
+
+select * from copy_default_error_save_error;
+ lineno | line | field | source | err_message | err_detail | errorcode
+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..3f8137cf 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,89 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+
+--- on copy success, the error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+\.
+
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+create table save_error_csv_error();
+--should fail. since error save table already exists.
+--error save table name = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+DROP TABLE save_error_csv_error;
+
+BEGIN;
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as empty FROM save_error_csv;
+SELECT count(*) as expect_one FROM pg_class WHERE relname = 'save_error_csv_error';
+ROLLBACK;
+
+DROP TABLE save_error_csv;
+
+--error TABLE should already be dropped.
+SELECT 1 as expect_zero FROM pg_class WHERE relname = 'save_error_csv_error';
+
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+--special case: this will work, but the error TABLE should not be dropped.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+,,,
+\.
+
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+
+--(type textrange was already made in test_setup.sql)
+--test using textrange
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a)
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT * FROM textrange_input_error;
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
+
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +692,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+select count(*) as expect_zero from copy_default_error_save;
+select * from copy_default_error_save_error;
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
Hi!
Thank you for your contribution to this thread.
On 04.12.2023 05:23, jian he wrote:
hi.
here is my implementation based on previous discussions: add a new COPY FROM flag, save_error.
save_error only works in non-BINARY mode.
save_error is easier for me to implement; with "save error" (two words) I
worry that gram.y will not work.
save_error also works with other flags like {csv mode, force_null, force_not_null}.
The overall logic is:
if save_error is specified then
  if the error_holding table does not exist then create one
  if the error_holding table exists then set error_firsttime to false
if save_error is not specified then work as the master branch.
if errors happen then insert the error info into the error_holding table.
if errors do not happen and error_firsttime is true then drop the table.
if errors do not happen and error_firsttime is false then raise a
notice: All the past error holding saved at %s.%s
error holding table:
the schema will be the same as the COPY destination table.
the table name will be the COPY destination name concatenated with "_error".
error_holding table definition:
CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
The following field is not implemented:
FIELDS text[], separated, de-escaped string fields (the data that was
or would be fed to input functions), because imagine the following case:
create type test as (a int, b text);
create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
copy copy_comp from stdin with (default '\D');
1 \D '2022-07-04'
\.
table copy_comp;
I feel it's hard to get from the textual '\D' to the text[] `(11,test)` via SPI.
--------------------------------------
demo:
create table copy_default_error_save (
id integer,
text_value text not null default 'test',
ts_value timestamp without time zone not null default '2022-07-05'
);
copy copy_default_error_save from stdin with (save_error, default '\D');
k value '2022-07-04'
z \D '2022-07-03ASKL'
s \D \D
\.
NOTICE: 3 rows were skipped because of error. skipped row saved to
table public.copy_default_error_save_error
select * from copy_default_error_save_error;
 lineno | line                   | field    | source           | err_message                                                 | err_detail | errorcode
--------+------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
      1 | k value '2022-07-04'   | id       | k                | invalid input syntax for type integer: "k"                  |            | 22P02
      2 | z \D '2022-07-03ASKL'  | id       | z                | invalid input syntax for type integer: "z"                  |            | 22P02
      2 | z \D '2022-07-03ASKL'  | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" |            | 22007
      3 | s \D \D                | id       | s                | invalid input syntax for type integer: "s"                  |            | 22P02
(4 rows)

The doc is not so good.
COPY FROM (save_error) will not be as fast as COPY FROM (save_error false).
With save_error, we can only use InputFunctionCallSafe, which I
believe is not as fast as InputFunctionCall.
If any conversion error happens, we need to call the SPI interface,
which adds more overhead. Also, we can only insert error rows one
by one (maybe we can batch them as insert into error_save values (error1), (error2);
I will try that later).
The main code is about constructing the SPI query, plus tests and test output.
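To illustrate the batching idea, here is a rough sketch of a single multi-row INSERT
(hypothetical; the values are taken from the demo above, while the current patch issues
one INSERT per failed field):

insert into public.copy_default_error_save_error
    (lineno, line, field, source, err_message, err_detail, errorcode)
values
    (1, 'k value ''2022-07-04''', 'id', 'k',
     'invalid input syntax for type integer: "k"', NULL, '22P02'),
    (2, 'z \D ''2022-07-03ASKL''', 'id', 'z',
     'invalid input syntax for type integer: "z"', NULL, '22P02');

One such statement per batch would save most of the per-row SPI round trips.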
I reviewed it and have a few questions.
1. I have seen that you delete the table to which you want to add errors
from a failed "copy from" operation before creating it. I think this
is wrong because that table can hold useful data for the user.
At a minimum, we should warn the user about this, but I think we can
just add some number at the end of the name, such as name_table_1,
name_table_2.
2. I noticed that you are forming a table name using the type of errors
that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was
used while 'copy from' was running.
In addition, there may be several such files, which is also worth considering.
3. I found spelling:
/* no err_nsp.error_rel table then crete one. for holding error. */
4. Maybe rewrite this comment
these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the
err_sp.error_rel table.
5. Is this part of the comment needed? I think it duplicates the
information below when we form the query.
* . column list(order by attnum, begin from ctid) =
* {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
* . data types (from attnum = -1) ={tid,
int8,text,text,text,text,text,text}
I'm not sure if we need to order the rows by number. It might be easier
to work with these lines in the order they appear.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
Hi!
Thank you for your contribution to this thread.
I reviewed it and have a few questions.
1. I have seen that you delete the table to which you want to add errors from a failed "copy from" operation before creating it. I think this is wrong because that table can hold useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table_1, name_table_2.
Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail and an error will be reported.
I first create a table; if no error occurs, the error table will be dropped.
Can you demo the expected behavior?
2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files, which is also worth considering.
Another column has been added.
Now it looks like:
SELECT * FROM save_error_csv_error;
 filename | lineno | line                     | field | source | err_message                                  | err_detail | errorcode
----------+--------+--------------------------+-------+--------+----------------------------------------------+------------+-----------
 STDIN    |      1 | 2002 232 40 50 60 70 80  | NULL  | NULL   | extra data after last expected column        | NULL       | 22P04
 STDIN    |      1 | 2000 230 23              | d     | NULL   | missing data for column "d"                  | NULL       | 22P04
 STDIN    |      1 | z,,""                    | a     | z      | invalid input syntax for type integer: "z"   | NULL       | 22P02
 STDIN    |      2 | \0,,                     | a     | \0     | invalid input syntax for type integer: "\0"  | NULL       | 22P02
3. I found spelling:
/* no err_nsp.error_rel table then crete one. for holding error. */
fixed.
4. Maybe rewrite this comment
these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the err_sp.error_rel table.
fixed.
5. Is this part of the comment needed? I think it duplicates the information below when we form the query.
* . column list(order by attnum, begin from ctid) =
* {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
* . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}
I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear.
Simplified the comment. "order by attnum" is there to make sure that if
a table already exists, and the column names are like X and
the data types are like Y, then we consider that table good for holding
potential error info.
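For example, a pre-created error table shaped like this should pass that check
(a sketch matching the CREATE TABLE issued by the v9 patch; the name must be the
COPY destination name with "_error" appended, in the same schema):

CREATE TABLE public.check_ign_err_error (
    filename    text,
    lineno      bigint,
    line        text,
    field       text,
    source      text,
    err_message text,
    err_detail  text,
    errorcode   text
);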
For COPY FROM, the main entry point is NextCopyFrom.
Now, for non-binary mode, if you specify save_error it will not
fail in NextCopyFrom.
All three of these errors will be tolerated: extra data after the last
expected column, missing data for a column, and data type conversion failures.
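A minimal usage sketch (the table and data below are illustrative, not part of the
patch; the two bad lines are skipped and recorded in demo_tbl_error):

create table demo_tbl (a int, b text);
-- the data lines below are tab-delimited
copy demo_tbl from stdin with (save_error);
1	one
x	two	extra
3
\.
-- NOTICE: 2 rows were skipped because of error. skipped row saved to table public.demo_tbl_error
select lineno, field, err_message, errorcode from demo_tbl_error;
drop table demo_tbl, demo_tbl_error;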
Attachments:
v9-0001-Make-COPY-FROM-more-error-tolerant.patch
From 990e1e0f5130431cf32069963bb980bb0692ce0b Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Wed, 6 Dec 2023 18:26:32 +0800
Subject: [PATCH v9 1/1] Make COPY FROM more error tolerant
Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.
Instead of throwing errors while copying, save_error will save errors to a table automatically.
We check the table definition via column names and column data types.
If the table already exists and meets the criteria, errors will be saved to that table.
If the table does not exist, one is created.
Only works for COPY FROM in non-BINARY mode.
While copying, if no error ever happened, the error save table will be dropped at the end of COPY FROM.
If the error saving table already exists, meaning COPY FROM errors have happened at least once,
then all future errors will be saved to that table.
We save the error to the error saving table using SPI: construct a query, then execute it.
---
contrib/file_fdw/file_fdw.c | 4 +-
doc/src/sgml/ref/copy.sgml | 93 ++++++++++++
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 147 ++++++++++++++++++-
src/backend/commands/copyfromparse.c | 171 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 7 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 135 ++++++++++++++++++
src/test/regress/sql/copy2.sql | 108 ++++++++++++++
12 files changed, 673 insertions(+), 19 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2189be8a..2d3eb34f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -751,7 +751,7 @@ fileIterateForeignScan(ForeignScanState *node)
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
+ slot->tts_values, slot->tts_isnull, NULL);
if (found)
ExecStoreVirtualTuple(slot);
@@ -1183,7 +1183,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
MemoryContextReset(tupcontext);
MemoryContextSwitchTo(tupcontext);
- found = NextCopyFrom(cstate, NULL, values, nulls);
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
MemoryContextSwitchTo(oldcontext);
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..a6370c42 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,17 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+      Specifies that error information for any data conversion failure while copying will automatically be reported to a regular table.
+ This option is not allowed when using <literal>binary</literal> format. Note that this
+ is only supported in current <command>COPY FROM</command> syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -572,6 +584,12 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+    If the <literal>SAVE_ERROR</literal> option is specified and a conversion error occurs while copying, then
+    <productname>PostgreSQL</productname> will create a table to save all the conversion errors. Conversion errors
+    include data type conversion failures, extra data, or missing data in the source file.
+ </para>
+
</refsect1>
<refsect1>
@@ -962,6 +980,81 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title>Error Save Table </title>
+ <para>
+    If <literal>SAVE_ERROR</literal> is specified, all data type conversion failures while copying will automatically be saved in a regular table.
+ <xref linkend="copy-errorsave-table"/> shows the error save table name, data type, and description.
+ </para>
+
+ <table id="copy-errorsave-table">
+
+ <title>COPY ERROR SAVE TABLE </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the input file</entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+       <entry>Raw content of the line where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>field</literal> </entry>
+ <entry><type>text</type></entry>
+       <entry>Name of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>source</literal> </entry>
+ <entry><type>text</type></entry>
+       <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message text </entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code for the copying error</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..ee6f2664 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -38,6 +38,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -652,10 +653,12 @@ CopyFrom(CopyFromState cstate)
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ StringInfo err_save_buf;
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -952,6 +955,7 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ err_save_buf = makeStringInfo();
for (;;)
{
TupleTableSlot *myslot;
@@ -989,8 +993,54 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
/* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull, err_save_buf))
+ {
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->error_nsp && cstate->error_rel);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%ld rows were skipped because of error."
+ " skipped row saved to table %s.%s",
+ cstate->error_rows_cnt,
+ cstate->error_nsp, cstate->error_rel));
+ }
+ else
+ {
+ StringInfoData querybuf;
+ if (cstate->error_firsttime)
+ {
+ ereport(NOTICE,
+ errmsg("No error happened."
+ "Error holding table %s.%s will be droped",
+ cstate->error_nsp, cstate->error_rel));
+ initStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "DROP TABLE IF EXISTS %s.%s CASCADE ",
+ cstate->error_nsp, cstate->error_rel);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+ }
+ else
+ ereport(NOTICE,
+ errmsg("No error happened. "
+ "All the past error holding saved at %s.%s ",
+ cstate->error_nsp, cstate->error_rel));
+ }
+ }
break;
+ }
+
+ /* Soft error occurred, skip this tuple */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
ExecStoreVirtualTuple(myslot);
@@ -1444,6 +1494,99 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ char *err_nsp;
+ char error_rel[NAMEDATALEN];
+ StringInfoData querybuf;
+ bool isnull;
+ bool error_table_ok;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ snprintf(error_rel, sizeof(error_rel), "%s",
+ RelationGetRelationName(cstate->rel));
+ strlcat(error_rel,"_error", NAMEDATALEN);
+ err_nsp = get_namespace_name(RelationGetNamespace(cstate->rel));
+
+ initStringInfo(&querybuf);
+ /* The query built here is used to validate:
+ * Does the err_nsp.error_rel table exist?
+ * If err_nsp.error_rel exists, does it meet our criteria?
+ * Our criteria for the error table are based on column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,filename,lineno,line,field,source,err_message,err_detail,errorcode}') AND "
+ "(array_agg(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+
+ appendStringInfo(&querybuf,
+ "relname = $$%s$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ error_rel, err_nsp);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ error_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ /* No err_nsp.error_rel table then create it for holding error. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.%s (FILENAME TEXT, LINENO BIGINT, LINE TEXT, "
+ "FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT, "
+ "ERR_DETAIL TEXT, ERRORCODE TEXT)",
+ err_nsp,error_rel);
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ cstate->error_firsttime = true;
+ elog(DEBUG1, "%s.%s created ", err_nsp, error_rel);
+ }
+ else if (error_table_ok)
+ /* error save table already exists. Set error_firsttime to false */
+ cstate->error_firsttime = false;
+ else if(!error_table_ok)
+ ereport(ERROR,
+ (errmsg("Error save table %s.%s already exists. "
+ "Cannot use it for COPY FROM error saving",
+ err_nsp, error_rel)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* this information is necessary: if no error occurs, the err_nsp.error_rel table will be dropped */
+ cstate->error_rel = pstrdup(error_rel);
+ cstate->error_nsp = err_nsp;
+ }
+ else
+ {
+ /* set to NULL */
+ cstate->error_rel = NULL;
+ cstate->error_nsp = NULL;
+ cstate->escontext = NULL;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* default: assume conversion is ok */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..d7ddf64c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -66,10 +66,12 @@
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -852,7 +854,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
*/
bool
NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls)
+ Datum *values, bool *nulls, StringInfo err_save_buf)
{
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
@@ -885,11 +887,48 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ /* reset to false for next new line if SAVE_ERROR specified */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ char *errmsg_extra = "extra data after last expected column";
+
+ resetStringInfo(err_save_buf);
+ /* add the line buffer etc. to the error save table for a line that has extra data */
+ appendStringInfo(err_save_buf,
+ "insert into %s.%s(filename, lineno,line, "
+ "err_message, errorcode) "
+ "select $$%s$$, $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ cstate->cur_lineno, cstate->line_buf.data,
+ errmsg_extra,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +940,46 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ char errmsg[128];
+ snprintf(errmsg, sizeof(errmsg),
+ "missing data for column \"%s\"",
+ NameStr(att->attname));
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "insert into %s.%s(filename,lineno,line, field, "
+ "err_message, errorcode) "
+ "select $$%s$$, $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$ ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ cstate->cur_lineno, cstate->line_buf.data,
+ NameStr(att->attname), errmsg,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1031,87 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ * InputFunctionCall is faster than InputFunctionCallSafe,
+ * so both code paths are kept.
+ */
+ if(!cstate->opts.save_error)
+ {
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ }
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char errcode[12];
+ char *err_detail;
+ snprintf(errcode, sizeof(errcode),
+ "%s",
+ unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ resetStringInfo(err_save_buf);
+ /* the lineno column is bigint, the rest are text. */
+ appendStringInfo(err_save_buf,
+ "insert into %s.%s(filename, lineno,line,field, "
+ "source, err_message, errorcode,err_detail) "
+ "select $$%s$$, $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$, $$%s$$, ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ cstate->cur_lineno, cstate->line_buf.data,
+ cstate->cur_attname, string,
+ cstate->escontext->error_data->message,
+ errcode);
+
+ if (!err_detail)
+ appendStringInfo(err_save_buf, "NULL::text");
+ else
+ appendStringInfo(err_save_buf,"$$%s$$", err_detail);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* line error occurred, set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89..747bd88a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -755,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVE_ERROR SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3448,6 +3448,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17328,6 +17332,7 @@ unreserved_keyword:
| ROUTINES
| ROWS
| RULE
+ | SAVE_ERROR
| SAVEPOINT
| SCALAR
| SCHEMA
@@ -17936,6 +17941,7 @@ bare_label_keyword:
| ROW
| ROWS
| RULE
+ | SAVE_ERROR
| SAVEPOINT
| SCALAR
| SCHEMA
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..de47791a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
@@ -82,7 +83,7 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
extern void EndCopyFrom(CopyFromState cstate);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls);
+ Datum *values, bool *nulls, StringInfo err_save_buf);
extern bool NextCopyFromRawFields(CopyFromState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..b1c02b2f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,12 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ int64 error_rows_cnt; /* total number of rows that have errors */
+ const char *error_rel; /* the error row save table name */
+ const char *error_nsp; /* the error row table's namespace */
+ bool line_error_occured; /* did a conversion error occur on this line? */
+ bool error_firsttime; /* true if this COPY created the error save table */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa..d0988a4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..bb86bb9f 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,116 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+--- copy success, error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+NOTICE: No error happened.Error holding table public.save_error_csv_error will be droped
+--error TABLE should already droppped.
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+ expected_zero
+---------------
+ 0
+(1 row)
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+create table save_error_csv_error();
+--should fail. since table save_error_csv_error) already exists.
+--error save table naming logic = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: Error save table public.save_error_csv_error already exists. Cannot use it for COPY FROM error saving
+DROP TABLE save_error_csv_error;
+-- save error with extra data
+COPY save_error_csv from stdin(save_error);
+NOTICE: 1 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+-- save error with missing data for column
+COPY save_error_csv from stdin(save_error);
+NOTICE: 1 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | d | b_null | b_empty
+---+---+------+------+--------+---------
+ 2 | | NULL | NULL | f | t
+(1 row)
+
+SELECT * FROM save_error_csv_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
+ STDIN | 1 | 2002 232 40 50 60 70 80 | NULL | NULL | extra data after last expected column | NULL | 22P04
+ STDIN | 1 | 2000 230 23 | d | NULL | missing data for column "d" | NULL | 22P04
+ STDIN | 1 | z,,"" | a | z | invalid input syntax for type integer: "z" | NULL | 22P02
+ STDIN | 2 | \0,, | a | \0 | invalid input syntax for type integer: "\0" | NULL | 22P02
+(4 rows)
+
+DROP TABLE save_error_csv, save_error_csv_error;
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 8 rows were skipped because of error. skipped row saved to table public.check_ign_err_error
+--special case. will work,but the error TABLE should not DROP.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+NOTICE: No error happened. All the past error holding saved at public.check_ign_err_error
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+--------------------------------------------+-------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ STDIN | 2 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | " | |
+ STDIN | 3 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ STDIN | 4 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ STDIN | 5 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ STDIN | 6 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ STDIN | 6 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ STDIN | 8 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ STDIN | 8 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ STDIN | 9 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ STDIN | 9 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ STDIN | 9 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(14 rows)
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY check_ign_err FROM STDIN WITH (save_error, save_error o...
+ ^
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 4 rows were skipped because of error. skipped row saved to table public.textrange_input_error
+SELECT * FROM textrange_input_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------+-------+----------+-------------------------------------------------------------------+------------------------------------------+-----------
+ STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 2 | (",a),(",",a),()",a) | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 2 | (",a),(",",a),()",a) | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 2 | (",a),(",",a),()",a) | c | a) | malformed range literal: "a)" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 4 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 4 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 4 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +932,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of error. skipped row saved to table public.copy_default_error_save_error
+select count(*) as expect_zero from copy_default_error_save;
+ expect_zero
+-------------
+ 0
+(1 row)
+
+select * from copy_default_error_save_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..8c8d8adb 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,98 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+
+--- copy success, error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+\.
+
+--error TABLE should already droppped.
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+create table save_error_csv_error();
+--should fail. since table save_error_csv_error) already exists.
+--error save table naming logic = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+DROP TABLE save_error_csv_error;
+
+-- save error with extra data
+COPY save_error_csv from stdin(save_error);
+2002 232 40 50 60 70 80
+\.
+
+-- save error with missing data for column
+COPY save_error_csv from stdin(save_error);
+2000 230 23
+\.
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+
+SELECT * FROM save_error_csv_error;
+
+DROP TABLE save_error_csv, save_error_csv_error;
+
+
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+--special case. will work,but the error TABLE should not DROP.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+,,,
+\.
+
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a)
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT * FROM textrange_input_error;
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
+
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +701,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+select count(*) as expect_zero from copy_default_error_save;
+select * from copy_default_error_save_error;
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
Thank you for your work. Unfortunately, your code contained errors
during the make installation:
'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
I have the Ubuntu 22.04 operating system.
On 06.12.2023 13:47, jian he wrote:
On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina<lena.ribackina@yandex.ru> wrote:
Hi!
Thank you for your contribution to this thread.
I reviewed it and have a few questions.
1. I have seen that you delete an existing table before creating the one to which you want to add errors from a failed "copy from" operation. I think this is wrong because that table can hold data that is useful to the user.
At a minimum, we should warn the user about this, but I think we could just append a number to the name, such as name_table1, name_table_2.
Sorry. I don't understand this part.
Currently, if the error table name already exists, the copy will fail and an error will be reported.
I first create the table; if no error occurs, the error table is dropped at the end.
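A minimal SQL sketch of the behaviour described here, mirroring the patch's regression tests (the table t is hypothetical; the error table is always named after the COPY target plus "_error"):

CREATE TABLE t (a int);
COPY t FROM STDIN WITH (save_error);
\.
-- public.t_error is created when the COPY starts; since no row failed,
-- it is dropped again at the end.
CREATE TABLE t_error ();
COPY t FROM STDIN WITH (save_error);
-- fails immediately: t_error already exists with a different definition,
-- so it cannot be used for error saving.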
To be honest, first of all, I misunderstood this part of the code. Now I
see that it works the way you mentioned.
However, I didn't see whether you dealt with the case where a table with the
same name as the error table already exists.
I mean, does this only matter when it is trying to create the error table for
the first time, or will we never be able to face such a problem?
Can you demo the expected behavior?
Unfortunately, I was unable to launch it due to a build issue.
2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files; that is also worth considering.
Another column added.
now it looks like:
SELECT * FROM save_error_csv_error;
 filename | lineno | line                     | field | source | err_message                                  | err_detail | errorcode
----------+--------+--------------------------+-------+--------+----------------------------------------------+------------+-----------
 STDIN    |      1 | 2002 232 40 50 60 70 80  | NULL  | NULL   | extra data after last expected column        | NULL       | 22P04
 STDIN    |      1 | 2000 230 23              | d     | NULL   | missing data for column "d"                  | NULL       | 22P04
 STDIN    |      1 | z,,""                    | a     | z      | invalid input syntax for type integer: "z"   | NULL       | 22P02
 STDIN    |      2 | \0,,                     | a     | \0     | invalid input syntax for type integer: "\0"  | NULL       | 22P02
Yes, I see the "filename" column, and this will solve the problem, but
"STDIN" is unclear to me.
3. I found a spelling mistake:
/* no err_nsp.error_rel table then crete one. for holding error. */
fixed.
4. Maybe rewrite this comment
these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary; no error will lead to the deletion of the err_nsp.error_rel table.
fixed.
Thank you.
5. Is this part of the comment needed? I think it duplicates the information below when we form the query.
* . column list(order by attnum, begin from ctid) =
* {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
* . data types (from attnum = -1) = {tid, int8,text,text,text,text,text,text}
I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear.
Simplified the comment. "order by attnum" is to make sure that if a table
already exists, and its column names are like X and its data types like Y,
then we consider that table good for holding potential error info.
COPY FROM's main entry point is NextCopyFrom.
Now, for non-binary mode, if you specified save_error then it will not
fail at NextCopyFrom.
All three of these errors will be tolerated: extra data after last
expected column, missing data for column, and data type conversion.
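As a rough illustration of that tolerance (modeled on the regression tests in the patch; the table and data here are made up):

CREATE TABLE demo (a int, b text);
COPY demo FROM STDIN WITH (save_error, format csv);
1,one,extra
2
x,three
4,four
\.
-- Only the row (4, 'four') is loaded. The other three rows hit, respectively,
-- "extra data after last expected column", "missing data for column", and an
-- integer conversion error; they are skipped and recorded in public.demo_error
-- together with the line number, raw line, failing field, message and SQLSTATE.
SELECT err_message, errorcode FROM demo_error;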
It looks clearer and better, thanks!
Comments phrased as questions are unusual to me; I read them as something
to think about, for example, as here (contrib/bloom/blinsert.c:312):
/*
* Didn't find place to insert in notFullPage array. Allocate new page.
* (XXX is it good to do this while holding ex-lock on the metapage??)
*/
Maybe we can rewrite it like this:
/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.
--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
Thank you for your work. Unfortunately, your code contained errors during the make installation:
'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
I have the Ubuntu 22.04 operating system.
On 06.12.2023 13:47, jian he wrote:
On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
Hi!
Thank you for your contribution to this thread.
I reviewed it and have a few questions.
1. I have seen that you delete an existing table before creating the one to which you want to add errors from a failed "copy from" operation. I think this is wrong because that table can hold data that is useful to the user.
At a minimum, we should warn the user about this, but I think we could just append a number to the name, such as name_table1, name_table_2.
Sorry. I don't understand this part.
Currently, if the error table name already exists, the copy will fail and an error will be reported.
I first create the table; if no error occurs, the error table is dropped at the end.
To be honest, first of all, I misunderstood this part of the code. Now I see that it works the way you mentioned.
However, I didn't see whether you dealt with the case where a table with the same name as the error table already exists.
I mean, does this only matter when it is trying to create the error table for the first time, or will we never be able to face such a problem?
Can you demo the expected behavior?
Unfortunately, I was unable to launch it due to a build issue.
Hopefully attached will work.
2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files; that is also worth considering.
Another column added.
now it looks like:
SELECT * FROM save_error_csv_error;
 filename | lineno | line                     | field | source | err_message                                  | err_detail | errorcode
----------+--------+--------------------------+-------+--------+----------------------------------------------+------------+-----------
 STDIN    |      1 | 2002 232 40 50 60 70 80  | NULL  | NULL   | extra data after last expected column        | NULL       | 22P04
 STDIN    |      1 | 2000 230 23              | d     | NULL   | missing data for column "d"                  | NULL       | 22P04
 STDIN    |      1 | z,,""                    | a     | z      | invalid input syntax for type integer: "z"   | NULL       | 22P02
 STDIN    |      2 | \0,,                     | a     | \0     | invalid input syntax for type integer: "\0"  | NULL       | 22P02
Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me.
please see comment in struct CopyFromStateData:
char *filename; /* filename, or NULL for STDIN */
*/
Maybe we can rewrite it like this:
/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.
refactored.
Attachments:
v10-0001-Make-COPY-FROM-more-error-tolerant.patchapplication/x-patch; name=v10-0001-Make-COPY-FROM-more-error-tolerant.patchDownload
From 2510dc2e2b13c60a5a7e184bf8e55325601d97e0 Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Sun, 10 Dec 2023 09:51:42 +0800
Subject: [PATCH v10 1/1] Make COPY FROM more error tolerant
Currently COPY FROM raises three types of error while processing the source file:
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error
Instead of throwing these errors while copying, SAVE_ERROR saves them to a table automatically.
We check the table definition via column names and column data types:
if the table already exists and meets the criteria, errors are saved to that table;
if the table does not exist, one is created.
This only works for COPY FROM in non-BINARY mode.
If no error happened while copying, the error save table is dropped at the end of COPY FROM.
If the error save table already existed, meaning errors have happened in at least one earlier COPY FROM,
then all future errors are saved to that table as well.
We save each error to the error save table using SPI: we construct a query, then execute it.
---
contrib/file_fdw/file_fdw.c | 4 +-
doc/src/sgml/ref/copy.sgml | 93 +++++++++++++
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 146 +++++++++++++++++++-
src/backend/commands/copyfromparse.c | 169 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 7 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 135 ++++++++++++++++++
src/test/regress/sql/copy2.sql | 108 +++++++++++++++
12 files changed, 670 insertions(+), 19 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2189be8a..2d3eb34f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -751,7 +751,7 @@ fileIterateForeignScan(ForeignScanState *node)
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
+ slot->tts_values, slot->tts_isnull, NULL);
if (found)
ExecStoreVirtualTuple(slot);
@@ -1183,7 +1183,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
MemoryContextReset(tupcontext);
MemoryContextSwitchTo(tupcontext);
- found = NextCopyFrom(cstate, NULL, values, nulls);
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
MemoryContextSwitchTo(oldcontext);
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..a6370c42 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,17 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+ Specifies that data conversion failures while copying are automatically recorded in a regular table instead of aborting the command.
+ This option is not allowed when using the <literal>binary</literal> format. Note that this
+ option is only supported by <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -572,6 +584,12 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+ If the <literal>SAVE_ERROR</literal> option is specified and a conversion error occurs while copying, then
+ <productname>PostgreSQL</productname> will create a table to save all the conversion errors. Conversion errors
+ include data type conversion failures and extra or missing data in the source file.
+ </para>
+
</refsect1>
<refsect1>
@@ -962,6 +980,81 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title>Error Save Table </title>
+ <para>
+ If <literal>SAVE_ERROR</literal> is specified, all data conversion failures while copying are automatically saved in a regular table.
+ <xref linkend="copy-errorsave-table"/> shows the error save table's column names, data types, and descriptions.
+ </para>
+
+ <table id="copy-errorsave-table">
+
+ <title>COPY ERROR SAVE TABLE </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the input file, or <literal>STDIN</literal></entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the line where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>field</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Name of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>source</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message text </entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code for the copying error</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..90a22431 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -38,6 +38,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -652,10 +653,12 @@ CopyFrom(CopyFromState cstate)
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ StringInfo err_save_buf;
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -952,6 +955,7 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ err_save_buf = makeStringInfo();
for (;;)
{
TupleTableSlot *myslot;
@@ -989,8 +993,54 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
/* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull, err_save_buf))
+ {
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->error_nsp && cstate->error_rel);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%ld rows were skipped because of error."
+ " skipped row saved to table %s.%s",
+ cstate->error_rows_cnt,
+ cstate->error_nsp, cstate->error_rel));
+ }
+ else
+ {
+ StringInfoData querybuf;
+ if (cstate->error_firsttime)
+ {
+ ereport(NOTICE,
+ errmsg("No error happened."
+ "Error holding table %s.%s will be droped",
+ cstate->error_nsp, cstate->error_rel));
+ initStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "DROP TABLE IF EXISTS %s.%s CASCADE ",
+ cstate->error_nsp, cstate->error_rel);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+ }
+ else
+ ereport(NOTICE,
+ errmsg("No error happened. "
+ "All the past error holding saved at %s.%s ",
+ cstate->error_nsp, cstate->error_rel));
+ }
+ }
break;
+ }
+
+ /* Soft error occurred, skip this tuple. */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
ExecStoreVirtualTuple(myslot);
@@ -1444,6 +1494,98 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ char *err_nsp;
+ char error_rel[NAMEDATALEN];
+ StringInfoData querybuf;
+ bool isnull;
+ bool error_table_ok;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ snprintf(error_rel, sizeof(error_rel), "%s",
+ RelationGetRelationName(cstate->rel));
+ strlcat(error_rel,"_error", NAMEDATALEN);
+ err_nsp = get_namespace_name(RelationGetNamespace(cstate->rel));
+
+ initStringInfo(&querybuf);
+ /*
+ *
+ * Verify whether the err_nsp.error_rel table already exists, and if so,
+ * examine its column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,filename,lineno,line,field,source,err_message,err_detail,errorcode}') "
+ "AND (ARRAY_AGG(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+
+ appendStringInfo(&querybuf,
+ "relname = $$%s$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ error_rel, err_nsp);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ error_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ /* The err_nsp.error_rel table does not exist, so create it to hold errors. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.%s (FILENAME TEXT, LINENO BIGINT, LINE TEXT, "
+ "FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT, "
+ "ERR_DETAIL TEXT, ERRORCODE TEXT)",
+ err_nsp,error_rel);
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ cstate->error_firsttime = true;
+ }
+ else if (error_table_ok)
+ /* error save table already exists. Set error_firsttime to false */
+ cstate->error_firsttime = false;
+ else if(!error_table_ok)
+ ereport(ERROR,
+ (errmsg("Error save table %s.%s already exists. "
+ "Cannot use it for COPY FROM error saving",
+ err_nsp, error_rel)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* keep these so the err_nsp.error_rel table can be dropped if no error occurs */
+ cstate->error_rel = pstrdup(error_rel);
+ cstate->error_nsp = err_nsp;
+ }
+ else
+ {
+ /* set to NULL */
+ cstate->error_rel = NULL;
+ cstate->error_nsp = NULL;
+ cstate->escontext = NULL;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* by default, assume the conversion succeeds. */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..e7b7a816 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -66,10 +66,12 @@
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -852,7 +854,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
*/
bool
NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls)
+ Datum *values, bool *nulls, StringInfo err_save_buf)
{
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
@@ -885,11 +887,48 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ /* reset line_error_occured to false before processing the next line. */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ char *errmsg_extra = "extra data after last expected column";
+
+ resetStringInfo(err_save_buf);
+ /* save the line with extra data (line buffer, etc.) to the error save table */
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.%s(filename, lineno,line, "
+ "err_message, errorcode) "
+ "SELECT $$%s$$, $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ cstate->cur_lineno, cstate->line_buf.data,
+ errmsg_extra,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +940,46 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ char errmsg[128];
+ snprintf(errmsg, sizeof(errmsg),
+ "missing data for column \"%s\"",
+ NameStr(att->attname));
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.%s(filename,lineno,line, field, "
+ "err_message, errorcode) "
+ "SELECT $$%s$$, $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$ ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ cstate->cur_lineno, cstate->line_buf.data,
+ NameStr(att->attname), errmsg,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1031,85 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ * InputFunctionCall is faster than InputFunctionCallSafe,
+ * so call it directly when errors are not being saved;
+ * otherwise use the soft-error variant below.
+ */
+ if(!cstate->opts.save_error)
+ {
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ }
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char errcode[12];
+ char *err_detail;
+ snprintf(errcode, sizeof(errcode), "%s",
+ unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.%s(filename, lineno,line,field, "
+ "source, err_message, errorcode,err_detail) "
+ "SELECT $$%s$$, $$%ld$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$, $$%s$$, ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ cstate->cur_lineno, cstate->line_buf.data,
+ cstate->cur_attname, string,
+ cstate->escontext->error_data->message,
+ errcode);
+
+ if (!err_detail)
+ appendStringInfo(err_save_buf, "NULL::text");
+ else
+ appendStringInfo(err_save_buf,"$$%s$$", err_detail);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_execute failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* a conversion error occurred on this line; set the flag once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+ /* reset ErrorSaveContext */
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89..61b5c5b1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -755,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3448,6 +3448,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17329,6 +17333,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
@@ -17937,6 +17942,7 @@ bare_label_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..de47791a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
@@ -82,7 +83,7 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
extern void EndCopyFrom(CopyFromState cstate);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls);
+ Datum *values, bool *nulls, StringInfo err_save_buf);
extern bool NextCopyFromRawFields(CopyFromState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..b1c02b2f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,12 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ int64 error_rows_cnt; /* total number of rows that have errors */
+ const char *error_rel; /* the error row save table name */
+ const char *error_nsp; /* the error row table's namespace */
+ bool line_error_occured; /* did a conversion error occur on this line? */
+ bool error_firsttime; /* true if this COPY created the error save table */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa..d0988a4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..1da12b72 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,116 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+--- copy success, error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+NOTICE: No error happened.Error holding table public.save_error_csv_error will be droped
+--error TABLE should already droppped.
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+ expected_zero
+---------------
+ 0
+(1 row)
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+create table save_error_csv_error();
+--should fail. since table save_error_csv_error already exists.
+--error save table naming logic = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: Error save table public.save_error_csv_error already exists. Cannot use it for COPY FROM error saving
+DROP TABLE save_error_csv_error;
+-- save error with extra data
+COPY save_error_csv from stdin(save_error);
+NOTICE: 1 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+-- save error with missing data for column
+COPY save_error_csv from stdin(save_error);
+NOTICE: 1 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of error. skipped row saved to table public.save_error_csv_error
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | d | b_null | b_empty
+---+---+------+------+--------+---------
+ 2 | | NULL | NULL | f | t
+(1 row)
+
+SELECT * FROM save_error_csv_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
+ STDIN | 1 | 2002 232 40 50 60 70 80 | NULL | NULL | extra data after last expected column | NULL | 22P04
+ STDIN | 1 | 2000 230 23 | d | NULL | missing data for column "d" | NULL | 22P04
+ STDIN | 1 | z,,"" | a | z | invalid input syntax for type integer: "z" | NULL | 22P02
+ STDIN | 2 | \0,, | a | \0 | invalid input syntax for type integer: "\0" | NULL | 22P02
+(4 rows)
+
+DROP TABLE save_error_csv, save_error_csv_error;
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 8 rows were skipped because of error. skipped row saved to table public.check_ign_err_error
+--special case. will work,but the error TABLE should not DROP.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+NOTICE: No error happened. All the past error holding saved at public.check_ign_err_error
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+--------------------------------------------+-------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ STDIN | 2 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | " | |
+ STDIN | 3 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ STDIN | 4 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ STDIN | 5 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ STDIN | 6 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ STDIN | 6 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ STDIN | 8 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ STDIN | 8 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ STDIN | 9 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ STDIN | 9 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ STDIN | 9 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(14 rows)
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY check_ign_err FROM STDIN WITH (save_error, save_error o...
+ ^
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 4 rows were skipped because of error. skipped row saved to table public.textrange_input_error
+SELECT * FROM textrange_input_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------+-------+----------+-------------------------------------------------------------------+------------------------------------------+-----------
+ STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 2 | (",a),(",",a),()",a) | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 2 | (",a),(",",a),()",a) | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 2 | (",a),(",",a),()",a) | c | a) | malformed range literal: "a)" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 4 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 4 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 4 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +932,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of error. skipped row saved to table public.copy_default_error_save_error
+select count(*) as expect_zero from copy_default_error_save;
+ expect_zero
+-------------
+ 0
+(1 row)
+
+select * from copy_default_error_save_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..3f43ce75 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,98 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+
+--- copy success, error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+\.
+
+--error TABLE should already droppped.
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+create table save_error_csv_error();
+--should fail. since table save_error_csv_error already exists.
+--error save table naming logic = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+DROP TABLE save_error_csv_error;
+
+-- save error with extra data
+COPY save_error_csv from stdin(save_error);
+2002 232 40 50 60 70 80
+\.
+
+-- save error with missing data for column
+COPY save_error_csv from stdin(save_error);
+2000 230 23
+\.
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+
+SELECT * FROM save_error_csv_error;
+
+DROP TABLE save_error_csv, save_error_csv_error;
+
+
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+--special case. will work,but the error TABLE should not DROP.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+,,,
+\.
+
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a)
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT * FROM textrange_input_error;
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
+
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +701,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+select count(*) as expect_zero from copy_default_error_save;
+select * from copy_default_error_save_error;
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
Hi! Thank you for your work. Your patch looks better!
On 10.12.2023 13:32, jian he wrote:
On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina<lena.ribackina@yandex.ru> wrote:
Thank you for your work. Unfortunately, your code contained errors during the make installation:
'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
I have the Ubuntu 22.04 operating system.
On 06.12.2023 13:47, jian he wrote:
On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina<lena.ribackina@yandex.ru> wrote:
Hi!
Thank you for your contribution to this thread.
I reviewed it and have a few questions.
1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.
Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table; if no error occurred, then the error table will be dropped.
To be honest, at first I misunderstood this part of the code. Now I see that it works the way you mentioned.
However, I didn't see whether you dealt with the case where we already have a table with the same name as the error table.
I mean the case where it tries to create the error table for the first time, or will we never face such a problem?
Can you demo the expected behavior?
Unfortunately, I was unable to launch it due to a build issue.
Hopefully attached will work.
Yes, thank you! It works fine, and I see that the regression tests have
passed. 🙂
However, when I ran a 'copy from ... with save_error' operation with simple
csv files (copy_test.csv, copy_test1.csv) for the tables test and test1
(created as described below):
postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
FOREIGN KEY(x)
REFERENCES test(x));
I did not find a table with the saved errors after the operation, although I
received a notice about it:
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to
table public.test_error
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (x)=(2) already exists.
CONTEXT: COPY test, line 3
postgres=# select * from public.test_error;
ERROR: relation "public.test_error" does not exist
LINE 1: select * from public.test_error;
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ','
CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to
table public.test1_error
ERROR: insert or update on table "test1" violates foreign key
constraint "fk_x"
DETAIL: Key (x)=(2) is not present in table "test".
postgres=# select * from public.test1_error;
ERROR: relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;
Two lines were written correctly in the csv files, so they should have been
added to the tables test and test1, but they were not.
If I leave only the correct rows, everything works fine and the rows are
added to the tables.
in copy_test.csv:
2,0
1,1
in copy_test1.csv:
2,0
2,1
1,1
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ','
CSV save_error
NOTICE: No error happened.Error holding table public.test1_error will
be droped
COPY 3
Maybe I'm launching it the wrong way. If so, let me know about it.
I also noticed interesting behavior when the table was previously created
by the user. When I created an error_table before the 'copy from' operation,
I received a message during the 'copy from' operation saying that it is
impossible to create a table with the same name (it is shown below).
I think you should add information about this in the documentation,
since this seems to be normal behavior to me.
postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
save_error
ERROR: Error save table public.test_error already exists. Cannot use it
for COPY FROM error saving
2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files; that is also worth considering.
Another column added.
now it looks like:

SELECT * FROM save_error_csv_error;
 filename | lineno | line                     | field | source | err_message                                  | err_detail | errorcode
----------+--------+--------------------------+-------+--------+----------------------------------------------+------------+-----------
 STDIN    |      1 | 2002 232 40 50 60 70 80  | NULL  | NULL   | extra data after last expected column        | NULL       | 22P04
 STDIN    |      1 | 2000 230 23              | d     | NULL   | missing data for column "d"                  | NULL       | 22P04
 STDIN    |      1 | z,,""                    | a     | z      | invalid input syntax for type integer: "z"   | NULL       | 22P02
 STDIN    |      2 | \0,,                     | a     | \0     | invalid input syntax for type integer: "\0"  | NULL       | 22P02

Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me.
please see comment in struct CopyFromStateData:
char *filename; /* filename, or NULL for STDIN */
Yes, I can see that.
I haven't figured out how to fix it yet either.
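For what it's worth, here is a small sketch of why "STDIN" shows up there (paths and table names taken from the example above; SAVE_ERROR syntax as in this patch). A server-side COPY opens the file itself, so the path can be recorded, while psql's \copy reads the file on the client and sends it as COPY ... FROM STDIN, so the server never sees a file name:

-- server-side COPY: the backend opens the file, so the path can be recorded
COPY test FROM '/home/alena/copy_test.csv' WITH (FORMAT csv, save_error);

-- client-side \copy: psql streams the file through STDIN,
-- so the error table can only record "STDIN" for this load
\copy test from '/home/alena/copy_test.csv' delimiter ',' csv save_error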
Maybe we can rewrite the comment like this:
/* Check whether the err_nsp.error_rel table already exists,
 * and if it does, check its column names and data types. */
refactored.
Fine)
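For reference, the check itself can be exercised by hand with a query along these lines (a sketch adapted from the query in the patch; 'public' and 'save_error_csv_error' are placeholder names). It returns true when an existing error table matches the expected definition, false when it does not, and NULL when no such table exists:

SELECT (array_agg(pa.attname ORDER BY pa.attnum)
        = '{ctid,filename,lineno,line,field,source,err_message,err_detail,errorcode}')
   AND (array_agg(pt.typname ORDER BY pa.attnum)
        = '{tid,text,int8,text,text,text,text,text,text}') AS definition_matches
  FROM pg_catalog.pg_attribute pa
  JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid
  JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid
  JOIN pg_catalog.pg_namespace pn ON pn.oid = pc.relnamespace
 WHERE pc.relname = 'save_error_csv_error'
   AND pn.nspname = 'public'
   AND pa.attnum >= -1
   AND NOT pa.attisdropped;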
--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:
Hi! Thank you for your work. Your patch looks better!
Yes, thank you! It works fine, and I see that the regression tests have been passed. 🙂
However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below):postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
FOREIGN KEY(x)
REFERENCES test(x));I did not find a table with saved errors after operation, although I received a log about it:
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test_error
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (x)=(2) already exists.
CONTEXT: COPY test, line 3postgres=# select * from public.test_error;
ERROR: relation "public.test_error" does not exist
LINE 1: select * from public.test_error;postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test1_error
ERROR: insert or update on table "test1" violates foreign key constraint "fk_x"
DETAIL: Key (x)=(2) is not present in table "test".postgres=# select * from public.test1_error;
ERROR: relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1.
If I leave only the correct rows, everything works fine and the rows are added to the tables.
in copy_test.csv:
2,0
1,1
in copy_test1.csv:
2,0
2,1
1,1
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE: No error happened.Error holding table public.test1_error will be droped
COPY 3Maybe I'm launching it the wrong way. If so, let me know about it.
It looks like the above is about constraint violations while copying.
Constraint violations during copying are not in the scope of this patch.
Since COPY FROM is very much like the INSERT command,
do you really want all the valid constraints to be checked against all the copied rows?
But the notice raised by the patch was not right.
So I placed the drop-error-saving-table or raise-notice logic above
`ExecResetTupleTable(estate->es_tupleTable, false)` in the function
CopyFrom.
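To make the distinction concrete, here is a minimal sketch (assuming the SAVE_ERROR syntax from this patch): the conversion error in the first COPY is diverted to the error table, while the duplicate-key violation in the second COPY is raised as usual and rolls back that whole command, including its valid row:

CREATE TABLE test (x int PRIMARY KEY, y int NOT NULL);

-- 'abc' cannot be converted to int: the row is skipped and recorded in public.test_error
COPY test FROM STDIN WITH (format csv, save_error);
abc,1
1,1
\.

-- the duplicate key (x = 1) is a constraint violation, not a conversion error:
-- COPY aborts and neither row is inserted
COPY test FROM STDIN WITH (format csv, save_error);
2,0
1,1
\.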
I also notice interesting behavior if the table was previously created by the user. When I was creating an error_table before the 'copy from' operation,
I received a message saying that it is impossible to create a table with the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation, since this seems to be normal behavior to me.
doc changed. you may check it.
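As a quick usage sketch of what the updated documentation describes (illustrative names only): the error table is named after the target table with an _error suffix, it is created automatically when needed, and it can be queried after the load:

CREATE TABLE items (id int, label text);

COPY items FROM STDIN WITH (format csv, save_error);
1,widget
oops,gadget
\.
-- a NOTICE reports that 1 row was skipped and saved to public.items_error

SELECT lineno, field, source, err_message, errorcode FROM items_error;

-- a later error-free COPY into items keeps items_error around,
-- since it already holds previously saved errors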
Attachments:
v11-0001-Make-COPY-FROM-more-error-tolerant.patch (text/x-patch; charset=US-ASCII)
From 3024bf3b727b728c58dfef41c62d7a93c083b887 Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Tue, 12 Dec 2023 20:58:45 +0800
Subject: [PATCH v11 1/1] Make COPY FROM more error tolerant
Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.
Instead of throwing errors while copying, the save_error specifier will
save errors to an error saving table automatically.
We check the error saving table definition by column name and column data type.
If the table already exists and meets the criteria, then errors will be saved to that table.
If the table does not exist, then one is created.
Only works for COPY FROM, non-BINARY mode.
While copying, if no error happened, the error saving table will be dropped at the end of COPY FROM.
If the error saving table already exists, meaning COPY FROM errors have happened at least once,
then all future errors will be saved to that table.
We save the error-related meta info to the error saving table using SPI,
that is, we construct a query string and then execute the query.
---
contrib/file_fdw/file_fdw.c | 4 +-
doc/src/sgml/ref/copy.sgml | 100 +++++++++++++-
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 146 +++++++++++++++++++-
src/backend/commands/copyfromparse.c | 169 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 7 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 135 ++++++++++++++++++
src/test/regress/sql/copy2.sql | 108 +++++++++++++++
12 files changed, 676 insertions(+), 20 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2189be8a..2d3eb34f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -751,7 +751,7 @@ fileIterateForeignScan(ForeignScanState *node)
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
+ slot->tts_values, slot->tts_isnull, NULL);
if (found)
ExecStoreVirtualTuple(slot);
@@ -1183,7 +1183,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
MemoryContextReset(tupcontext);
MemoryContextSwitchTo(tupcontext);
- found = NextCopyFrom(cstate, NULL, values, nulls);
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
MemoryContextSwitchTo(oldcontext);
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..fb303b4f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,18 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+        Specifies that any data conversion errors while copying will automatically be saved in an Error Saving table, and the <command>COPY FROM</command> operation will not be interrupted by conversion errors.
+        This option is not allowed when using <literal>binary</literal> format. Note that this
+        is only supported by the <command>COPY FROM</command> syntax.
+        If this option is omitted, any data type conversion error will be raised immediately.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -564,6 +577,7 @@ COPY <replaceable class="parameter">count</replaceable>
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
<command>VACUUM</command> to recover the wasted space.
+    To continue copying while skipping conversion errors in a <command>COPY FROM</command>, you might wish to specify <literal>SAVE_ERROR</literal>.
</para>
<para>
@@ -572,6 +586,16 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+   If the <literal>SAVE_ERROR</literal> option is specified and conversion errors occur while copying, then
+   <productname>PostgreSQL</productname> will first try to create a regular Error Saving table to save all of the conversion-error-related information.
+   The Error Saving table naming rule is the destination table name concatenated with <literal>_error</literal>.
+   If <productname>PostgreSQL</productname> cannot create the Error Saving table, the <command>COPY FROM</command> operation stops and an error is raised.
+   All future errors while copying to the same table will automatically be saved to the same Error Saving table.
+   Conversion errors include data type conversion failures, and extra or missing data in the source file.
+   A detailed description of the Error Saving table is given in <xref linkend="copy-errorsave-table"/>.
+ </para>
+
</refsect1>
<refsect1>
@@ -588,7 +612,7 @@ COPY <replaceable class="parameter">count</replaceable>
output function, or acceptable to the input function, of each
attribute's data type. The specified null string is used in
place of columns that are null.
- <command>COPY FROM</command> will raise an error if any line of the
+   By default, if <literal>SAVE_ERROR</literal> is not specified, <command>COPY FROM</command> will raise an error if any line of the
input file contains more or fewer columns than are expected.
</para>
@@ -962,6 +986,80 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title>Error Save Table </title>
+ <para>
+    If <literal>SAVE_ERROR</literal> is specified, all data type conversion errors while copying will automatically be saved in an Error Saving table.
+ <xref linkend="copy-errorsave-table"/> shows the Error Saving table's column name, data type, and description.
+ </para>
+
+ <table id="copy-errorsave-table">
+ <title>Error Saving table description </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the input file</entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where the error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the error occurred line</entry>
+ </row>
+
+ <row>
+ <entry> <literal>field</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Field name of the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>source</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the error occurred field</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message text </entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code for the copying error</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..236d711b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -38,6 +38,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -652,10 +653,12 @@ CopyFrom(CopyFromState cstate)
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ StringInfo err_save_buf;
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -952,6 +955,7 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ err_save_buf = makeStringInfo();
for (;;)
{
TupleTableSlot *myslot;
@@ -989,9 +993,13 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
/* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull, err_save_buf))
break;
+ /* Soft error occurred, skip this tuple. */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1297,6 +1305,48 @@ CopyFrom(CopyFromState cstate)
ExecResetTupleTable(estate->es_tupleTable, false);
+ /* drop the error saving table or raise a notice */
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->error_nsp && cstate->error_rel);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%llu rows were skipped because of conversion error."
+ " Skipped rows saved to table %s.%s",
+ (unsigned long long) cstate->error_rows_cnt,
+ cstate->error_nsp, cstate->error_rel));
+ }
+ else
+ {
+ StringInfoData querybuf;
+ if (cstate->error_firsttime)
+ {
+ ereport(NOTICE,
+ errmsg("No conversion error happened. "
+ "Error Saving table %s.%s will be dropped",
+ cstate->error_nsp, cstate->error_rel));
+ initStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "DROP TABLE IF EXISTS %s.%s CASCADE ",
+ cstate->error_nsp, cstate->error_rel);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+ }
+ else
+ ereport(NOTICE,
+ errmsg("No error happened. "
+ "All previouly encountered conversion errors saved at %s.%s",
+ cstate->error_nsp, cstate->error_rel));
+ }
+ }
+
/* Allow the FDW to shut down */
if (target_resultRelInfo->ri_FdwRoutine != NULL &&
target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
@@ -1444,6 +1494,98 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ char *err_nsp;
+ char error_rel[NAMEDATALEN];
+ StringInfoData querybuf;
+ bool isnull;
+ bool error_table_ok;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ snprintf(error_rel, sizeof(error_rel), "%s",
+ RelationGetRelationName(cstate->rel));
+ strlcat(error_rel,"_error", NAMEDATALEN);
+ err_nsp = get_namespace_name(RelationGetNamespace(cstate->rel));
+
+ initStringInfo(&querybuf);
+ /*
+ *
+ * Verify whether the err_nsp.error_rel table already exists, and if so,
+ * examine its column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,filename,lineno,line,field,source,err_message,err_detail,errorcode}') "
+ "AND (ARRAY_AGG(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+
+ appendStringInfo(&querybuf,
+ "relname = $$%s$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ error_rel, err_nsp);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ error_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ /* No err_nsp.error_rel table then create it for holding error. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.%s (FILENAME TEXT, LINENO BIGINT, LINE TEXT, "
+ "FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT, "
+ "ERR_DETAIL TEXT, ERRORCODE TEXT)",
+ err_nsp,error_rel);
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ cstate->error_firsttime = true;
+ }
+ else if (error_table_ok)
+ /* error save table already exists. Set error_firsttime to false */
+ cstate->error_firsttime = false;
+ else if(!error_table_ok)
+ ereport(ERROR,
+ (errmsg("Error save table %s.%s already exists. "
+ "Cannot use it for COPY FROM error saving",
+ err_nsp, error_rel)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* this information is necessary; if no error happened, drop the err_nsp.error_rel table */
+ cstate->error_rel = pstrdup(error_rel);
+ cstate->error_nsp = err_nsp;
+ }
+ else
+ {
+ /* set to NULL */
+ cstate->error_rel = NULL;
+ cstate->error_nsp = NULL;
+ cstate->escontext = NULL;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* default, assume conversion be ok. */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..aa168d3f 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -66,10 +66,12 @@
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -852,7 +854,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
*/
bool
NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls)
+ Datum *values, bool *nulls, StringInfo err_save_buf)
{
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
@@ -885,11 +887,48 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ /* reset line_error_occured to false for next new line. */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ char *errmsg_extra = "extra data after last expected column";
+
+ resetStringInfo(err_save_buf);
+ /* add line buf etc. for a line that has extra data to the error saving table */
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.%s(filename, lineno,line, "
+ "err_message, errorcode) "
+ "SELECT $$%s$$, $$%llu$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ (unsigned long long) cstate->cur_lineno,
+ cstate->line_buf.data, errmsg_extra,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +940,46 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ char errmsg[128];
+ snprintf(errmsg, sizeof(errmsg),
+ "missing data for column \"%s\"",
+ NameStr(att->attname));
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.%s(filename,lineno,line, field, "
+ "err_message, errorcode) "
+ "SELECT $$%s$$, $$%llu$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$ ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ (unsigned long long) cstate->cur_lineno,
+ cstate->line_buf.data, NameStr(att->attname), errmsg,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1031,85 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ *
+ * InputFunctionCall is faster than InputFunctionCallSafe.
+ *
+ */
+ if(!cstate->opts.save_error)
+ {
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ }
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char errcode[12];
+ char *err_detail;
+ snprintf(errcode, sizeof(errcode), "%s",
+ unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.%s(filename, lineno,line,field, "
+ "source, err_message, errorcode,err_detail) "
+ "SELECT $$%s$$, $$%llu$$::bigint, $$%s$$, $$%s$$, "
+ "$$%s$$, $$%s$$, $$%s$$, ",
+ cstate->error_nsp, cstate->error_rel,
+ cstate->filename ? cstate->filename : "STDIN",
+ (unsigned long long) cstate->cur_lineno,
+ cstate->line_buf.data, cstate->cur_attname, string,
+ cstate->escontext->error_data->message,
+ errcode);
+
+ if (!err_detail)
+ appendStringInfo(err_save_buf, "NULL::text");
+ else
+ appendStringInfo(err_save_buf,"$$%s$$", err_detail);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_execute failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* line error occurred, set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+ /* reset ErrorSaveContext */
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f16bbd3c..3a616ab5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -755,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3448,6 +3448,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17346,6 +17350,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
@@ -17954,6 +17959,7 @@ bare_label_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..de47791a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
@@ -82,7 +83,7 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
extern void EndCopyFrom(CopyFromState cstate);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls);
+ Datum *values, bool *nulls, StringInfo err_save_buf);
extern bool NextCopyFromRawFields(CopyFromState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..dd41fcaa 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,12 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ uint64 error_rows_cnt; /* total number of rows that have errors */
+ const char *error_rel; /* the error row save table name */
+ const char *error_nsp; /* the error row table's namespace */
+ bool line_error_occured; /* does this line conversion error happened */
+ bool error_firsttime; /* first time create error save table */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa..d0988a4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..aa1398d7 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,116 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+--- copy success, error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+NOTICE: No conversion error happened. Error Saving table public.save_error_csv_error will be dropped
+--error TABLE should already droppped.
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+ expected_zero
+---------------
+ 0
+(1 row)
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+create table save_error_csv_error();
+--should fail. since table save_error_csv_error already exists.
+--error save table naming logic = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: Error save table public.save_error_csv_error already exists. Cannot use it for COPY FROM error saving
+DROP TABLE save_error_csv_error;
+-- save error with extra data
+COPY save_error_csv from stdin(save_error);
+NOTICE: 1 rows were skipped because of conversion error. Skipped rows saved to table public.save_error_csv_error
+-- save error with missing data for column
+COPY save_error_csv from stdin(save_error);
+NOTICE: 1 rows were skipped because of conversion error. Skipped rows saved to table public.save_error_csv_error
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table public.save_error_csv_error
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | d | b_null | b_empty
+---+---+------+------+--------+---------
+ 2 | | NULL | NULL | f | t
+(1 row)
+
+SELECT * FROM save_error_csv_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
+ STDIN | 1 | 2002 232 40 50 60 70 80 | NULL | NULL | extra data after last expected column | NULL | 22P04
+ STDIN | 1 | 2000 230 23 | d | NULL | missing data for column "d" | NULL | 22P04
+ STDIN | 1 | z,,"" | a | z | invalid input syntax for type integer: "z" | NULL | 22P02
+ STDIN | 2 | \0,, | a | \0 | invalid input syntax for type integer: "\0" | NULL | 22P02
+(4 rows)
+
+DROP TABLE save_error_csv, save_error_csv_error;
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 8 rows were skipped because of conversion error. Skipped rows saved to table public.check_ign_err_error
+--special case. will work,but the error TABLE should not DROP.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+NOTICE: No error happened. All previouly encountered conversion errors saved at public.check_ign_err_error
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+--------------------------------------------+-------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ STDIN | 2 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | " | |
+ STDIN | 3 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ STDIN | 4 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ STDIN | 5 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ STDIN | 6 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ STDIN | 6 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ STDIN | 7 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ STDIN | 8 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ STDIN | 8 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ STDIN | 9 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ STDIN | 9 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ STDIN | 9 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(14 rows)
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY check_ign_err FROM STDIN WITH (save_error, save_error o...
+ ^
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 4 rows were skipped because of conversion error. Skipped rows saved to table public.textrange_input_error
+SELECT * FROM textrange_input_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------+-------+----------+-------------------------------------------------------------------+------------------------------------------+-----------
+ STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 2 | (",a),(",",a),()",a) | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 2 | (",a),(",",a),()",a) | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 2 | (",a),(",",a),()",a) | c | a) | malformed range literal: "a)" | Missing left parenthesis or bracket. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ STDIN | 3 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ STDIN | 4 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 4 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ STDIN | 4 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +932,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of conversion error. Skipped rows saved to table public.copy_default_error_save_error
+select count(*) as expect_zero from copy_default_error_save;
+ expect_zero
+-------------
+ 0
+(1 row)
+
+select * from copy_default_error_save_error;
+ filename | lineno | line | field | source | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..3f43ce75 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,98 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+
+--- copy success, error save table will be dropped automatically.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+\.
+
+--error TABLE should already droppped.
+select count(*) as expected_zero from pg_class where relname = 'save_error_csv_error';
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+create table save_error_csv_error();
+--should fail. since table save_error_csv_error already exists.
+--error save table naming logic = copy destination tablename + "_error"
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+DROP TABLE save_error_csv_error;
+
+-- save error with extra data
+COPY save_error_csv from stdin(save_error);
+2002 232 40 50 60 70 80
+\.
+
+-- save error with missing data for column
+COPY save_error_csv from stdin(save_error);
+2000 230 23
+\.
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+
+SELECT * FROM save_error_csv_error;
+
+DROP TABLE save_error_csv, save_error_csv_error;
+
+
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+--special case. will work,but the error TABLE should not DROP.
+COPY check_ign_err FROM STDIN WITH (save_error, format csv, FORCE_NULL *);
+,,,
+\.
+
+--expect error TABLE exists
+SELECT * FROM check_ign_err_error;
+
+-- redundant options not allowed.
+COPY check_ign_err FROM STDIN WITH (save_error, save_error off);
+
+DROP TABLE check_ign_err CASCADE;
+DROP TABLE IF EXISTS check_ign_err_error CASCADE;
+
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+CREATE TABLE textrange_input(a textrange, b textrange, c textrange);
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a)
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT * FROM textrange_input_error;
+DROP TABLE textrange_input;
+DROP TABLE textrange_input_error;
+
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +701,19 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+select count(*) as expect_zero from copy_default_error_save;
+select * from copy_default_error_save_error;
+drop table copy_default_error_save_error,copy_default_error_save;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
On 12.12.2023 16:04, jian he wrote:
On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:Hi! Thank you for your work. Your patch looks better!
Yes, thank you! It works fine, and I see that the regression tests have been passed. 🙂
However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below):postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
FOREIGN KEY(x)
REFERENCES test(x));I did not find a table with saved errors after operation, although I received a log about it:
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test_error
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (x)=(2) already exists.
CONTEXT: COPY test, line 3postgres=# select * from public.test_error;
ERROR: relation "public.test_error" does not exist
LINE 1: select * from public.test_error;postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test1_error
ERROR: insert or update on table "test1" violates foreign key constraint "fk_x"
DETAIL: Key (x)=(2) is not present in table "test".postgres=# select * from public.test1_error;
ERROR: relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1.
If I leave only the correct rows, everything works fine and the rows are added to the tables.
in copy_test.csv:
2,0
1,1
in copy_test1.csv:
2,0
2,1
1,1
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE: No error happened.Error holding table public.test1_error will be droped
COPY 3Maybe I'm launching it the wrong way. If so, let me know about it.
looks like the above is about constraints violation while copying.
constraints violation while copying not in the scope of this patch.Since COPY FROM is very like the INSERT command,
you do want all the valid constraints to check all the copied rows?
No, I think it will be too much.
but the notice raised by the patch is not right.
So I place the drop error saving table or raise notice logic above
`ExecResetTupleTable(estate->es_tupleTable, false)` in the function
CopyFrom.
Yes, I see it and agree with you.
I also noticed interesting behavior if the table was previously created by the user. When I was creating an error_table before the 'copy from' operation,
I received a message saying that it is impossible to create a table with the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation, since this seems to be normal behavior to me.
doc changed. you may check it.
Yes, I saw it. Thank you.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,
On Tue, Dec 12, 2023 at 10:04 PM jian he <jian.universality@gmail.com> wrote:
On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:
Hi! Thank you for your work. Your patch looks better!
Yes, thank you! It works fine, and I see that the regression tests have been passed. 🙂
However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below):
postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
FOREIGN KEY(x)
REFERENCES test(x));
I did not find a table with saved errors after operation, although I received a log about it:
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test_error
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (x)=(2) already exists.
CONTEXT: COPY test, line 3
postgres=# select * from public.test_error;
ERROR: relation "public.test_error" does not exist
LINE 1: select * from public.test_error;
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test1_error
ERROR: insert or update on table "test1" violates foreign key constraint "fk_x"
DETAIL: Key (x)=(2) is not present in table "test".
postgres=# select * from public.test1_error;
ERROR: relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;
Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1.
If I leave only the correct rows, everything works fine and the rows are added to the tables.
in copy_test.csv:
2,0
1,1
in copy_test1.csv:
2,0
2,1
1,1
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error
NOTICE: No error happened.Error holding table public.test1_error will be droped
COPY 3
Maybe I'm launching it the wrong way. If so, let me know about it.
Looks like the above is about constraint violations while copying.
Constraint violations while copying are not in the scope of this patch.
Since COPY FROM is very like the INSERT command,
do you want all the valid constraints to be checked for all the copied rows?
but the notice raised by the patch is not right.
So I place the drop error saving table or raise notice logic above
`ExecResetTupleTable(estate->es_tupleTable, false)` in the function
CopyFrom.
I also noticed interesting behavior if the table was previously created by the user. When I was creating an error_table before the 'copy from' operation,
I received a message saying that it is impossible to create a table with the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation, since this seems to be normal behavior to me.
doc changed. you may check it.
I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.
While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.
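To make that concrete, here is a minimal sketch of the scenario (the error
table name public.test_error is only an assumption, borrowed from the reports
earlier in this thread):

-- on the publisher, an existing logical replication setup
CREATE PUBLICATION all_pub FOR ALL TABLES;
-- COPY ... WITH (save_error) then creates a regular table such as
-- public.test_error and inserts the skipped rows into it; because the
-- publication is FOR ALL TABLES, those inserts are published as well,
-- and the subscriber's apply worker will fail until a matching
-- public.test_error table is created on the subscriber too.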
Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to send the
number of rows successfully loaded in a NOTICE message if some rows
could not be loaded). Then with (2), COPY FROM can log error
information to somewhere such as tables and server logs and the user
can select it. So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
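As a very rough sketch of what that could look like (the option names below
are placeholders only, nothing is settled):

-- step 1: just tolerate malformed rows and report how many were skipped
COPY target FROM '/path/data.csv' WITH (format csv, save_error);
-- NOTICE: 42 rows were skipped because of conversion errors

-- step 2: later, add a choice of destination for the skipped rows, e.g.
-- COPY target FROM '/path/data.csv' WITH (format csv, save_error, error_destination 'log');
-- COPY target FROM '/path/data.csv' WITH (format csv, save_error, error_destination 'table');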
Regards,
[1]: /messages/by-id/20231109002600.fuihn34bjqqgmbjm@awork3.anarazel.de
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi,
I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.
While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.
save the error metadata to system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.
Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to send the
number of rows successfully loaded in a NOTICE message if some rows
could not be loaded). Then with (2), COPY FROM can log error
information to somewhere such as tables and server logs and the user
can select it. So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
Regards,
[1] /messages/by-id/20231109002600.fuihn34bjqqgmbjm@awork3.anarazel.de
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
I don't think "specify the maximum number of errors to tolerate
before raising an ERROR." is very useful....
QUOTE from [1]
MAXERROR [AS] error_count
If the load returns the error_count number of errors or greater, the
load fails. If the load returns fewer errors, it continues and returns
an INFO message that states the number of rows that could not be
loaded. Use this parameter to allow loads to continue when certain
rows fail to load into the table because of formatting errors or other
inconsistencies in the data.
Set this value to 0 or 1 if you want the load to fail as soon as the
first error occurs. The AS keyword is optional. The MAXERROR default
value is 0 and the limit is 100000.
The actual number of errors reported might be greater than the
specified MAXERROR because of the parallel nature of Amazon Redshift.
If any node in the Amazon Redshift cluster detects that MAXERROR has
been exceeded, each node reports all of the errors it has encountered.
END OF QUOTE
With the option MAXERROR error_count, IIUC, it fails while validating line
error_count + 1; otherwise it raises a notice telling you how many rows have
errors.
* Case when error_count is small and the copy fails: it only tells
you that at least error_count lines have malformed data. But what if
the actual number of malformed rows is very large? In this case, this failure
error message is not that helpful.
* Case when error_count is very large and the copy does not fail: the
actual number of malformed rows may still be large (though less than
error_count), but there is no error report, so you don't know which lines
have errors.
Either way, if the file has a large portion of malformed rows, then
the MAXERROR option does not make sense.
So maybe we don't need a threshold for tolerating errors.
However, we could have an option that does not actually copy to the table but
only validates, similar to NOLOAD in [1].
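For reference, the two Redshift knobs from the quoted docs look roughly like
this (bucket and role names are made up):

-- tolerate up to 99 bad rows; fail once 100 errors are reached
COPY sales FROM 's3://my-bucket/sales.csv' IAM_ROLE 'arn:aws:iam::123456789012:role/loader' CSV MAXERROR 100;
-- only validate the file, load nothing
COPY sales FROM 's3://my-bucket/sales.csv' IAM_ROLE 'arn:aws:iam::123456789012:role/loader' CSV NOLOAD;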
Why we save the error:
* If only a small portion of rows are malformed, then saving the error
metadata would be cheap.
* If a large portion of rows are malformed, then the copy will be slow, but we
have saved the error metadata, and now you can fix the data based on that
error metadata.
I think saving errors to a regular table or text file seems sane, but
not to a catalog table.
* For a text file with M rows and N fields, a contrived corner case would
be (M-2) * N errors, where the last 2 rows have duplicate keys and violate
the primary key constraint. In this case, we first insert (M-2) * N rows
into the catalog table and then, because of the errors, undo all of it;
for example, with M = 1,000,000 rows and N = 10 fields, that is nearly 10
million error rows inserted and then rolled back. I think it will be expensive.
* Error meta info is not as important as the other pg_catalog tables.
The log format is quite verbose, so save_error to the log does not seem so good, I guess.
I suppose we could specify an ERRORFILE directory; a similar
implementation is [2], with a demo in [3].
It generates 2 files: one file shows the malformed line content as
is, the other file shows the error info.
Let's assume we save the error info to a table.
The previous discussion says that having one copy operation create one error
table is not a good idea; looking back, I agree.
Similar to [4],
I came up with the following logic/ideas:
* The save_error table name would be COPY_ERRORS, in the same schema as the
COPY FROM destination table.
* One COPY_ERRORS table saves all the error metadata generated by COPY FROM.
* If save_error is specified, then before doing the COPY FROM, first check
whether the table COPY_ERRORS
exists;
if not, create one? Or raise an error saying that COPY_ERRORS does
not exist and save_error cannot be used?
* The COPY_ERRORS table owner would be the current database owner?
* Only the table owner is allowed to INSERT/DELETE/UPDATE; others are
not allowed to INSERT/DELETE/UPDATE.
While copying, when an error happens, record the userid, then switch to the
COPY_ERRORS owner to execute the insert command.
* The user who is doing the COPY FROM operation is allowed solely to view
(select) the errored rows they generated.
COPY_ERRORS table would be:
userid oid /* the user who is doing this operation */
error_time timestamptz /* when this error
happened. not 100% sure this column is needed */
filename text /* the copy from source */
table_name text /* the copy from destination */
lineno bigint /* the error line number */
line text /* the whole line raw content */
colname text -- Field with the error.
raw_field_value text --- The value for the field that leads to the error.
err_message text -- same as ErrorData->message
err_detail text --same as ErrorData->detail
errorcode text --transformed errcode, example "22P02"
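As a rough DDL sketch of that layout (column names and types exactly as
listed above; nothing here is final):

CREATE TABLE copy_errors (
    userid          oid,          -- the user who is doing this operation
    error_time      timestamptz,  -- when this error happened
    filename        text,         -- the copy from source
    table_name      text,         -- the copy from destination
    lineno          bigint,       -- the error line number
    line            text,         -- the whole line raw content
    colname         text,         -- field with the error
    raw_field_value text,         -- the value that leads to the error
    err_message     text,         -- same as ErrorData->message
    err_detail      text,         -- same as ErrorData->detail
    errorcode       text          -- transformed errcode, e.g. "22P02"
);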
[1]: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html
[2]: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver16
[3]: https://www.sqlshack.com/working-with-line-numbers-and-errors-using-bulk-insert/
[4]: https://docs.aws.amazon.com/redshift/latest/dg/r_STL_LOAD_ERRORS.html
On 2023-12-15 05:48, Masahiko Sawada wrote:
Thanks for joining this discussion!
I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.
While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.
Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to send the
number of rows successfully loaded in a NOTICE message if some rows
could not be loaded). Then with (2), COPY FROM can log error
information to somewhere such as tables and server logs and the user
can select it.
+1.
I may be biased since I wrote some ~v6 patches which just output the
soft errors and number of skipped rows to the log, but I think just (1)
would be worth implementing, as you pointed out, and I would like it if users
could choose where to log the output.
I think there would be situations where it is preferable to save errors
to the server log, even considering the problems pointed out in [1],
i.e. manually loading data.
[1]: /messages/by-id/739953.1699467519@sss.pgh.pa.us
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
Regards,
[1]
/messages/by-id/20231109002600.fuihn34bjqqgmbjm@awork3.anarazel.de
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Hi,
save the error metadata to system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.
I also think saving error metadata to a system catalog is not a good idea.
And I believe Sawada-san just pointed out missing features and did not
suggest that we use a system catalog.
I don't think "specify the maximum number of errors to tolerate
before raising an ERROR." is very useful....
That may be so.
I imagine it's useful in some use cases since some loading tools have
such options.
Anyway, I agree it's not necessary for the initial patch, as mentioned in [1].
I suppose we can specify an ERRORFILE directory. similar
implementation [2], demo in [3]
it will generate 2 files, one file shows the malform line content as
is, another file shows the error info.
That may be a good option when considering "(2) logging errors to
somewhere".
What do you think about the proposal to develop these features
incrementally?
On 2023-12-15 05:48, Masahiko Sawada wrote:
So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
[1]: /messages/by-id/752672.1699474336@sss.pgh.pa.us
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
On Mon, Dec 18, 2023 at 1:09 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,
save the error metadata to system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.
I also think saving error metadata to a system catalog is not a good idea.
And I believe Sawada-san just pointed out missing features and did not
suggest that we use a system catalog.
I don't think "specify the maximum number of errors to tolerate
before raising an ERROR." is very useful....
That may be so.
I imagine it's useful in some use cases since some loading tools have
such options.
Anyway, I agree it's not necessary for the initial patch, as mentioned in [1].
I suppose we could specify an ERRORFILE directory; a similar
implementation is [2], with a demo in [3].
It generates 2 files: one file shows the malformed line content as
is, the other file shows the error info.
That may be a good option when considering "(2) logging errors to
somewhere".
What do you think about the proposal to develop these features
incrementally?
I am more with Tom's idea [1] (/messages/by-id/900123.1699488001@sss.pgh.pa.us), that is, when errors happen (data type
conversion only), do not fail, AND we save the error to a table. I
guess we can implement this logic together, only with a new COPY
option.
Imagine a case (it's not that contrived, imho): while converting from
text to the table's int, postgres's isspace is different from the source
text file's isspace logic.
Then all the lines are malformed. If we just say "on error, continue" and
do not save the error meta info, the user is still confused about which field
has the wrong data, and the user will probably try to incrementally test
which field contains malformed data.
Since we need to save the error somewhere, and
everyone who has the privilege to INSERT can do COPY,
I think we also need to handle the access privileges.
So, like I mentioned above, with one copy_errors error table hub,
everyone can view/select their own copy failure records.
But with saving to a server text file/directory, it is not easy for an INSERT
privilege user to see these files, I think.
Similarly, it is not easy to see these failed records in the log with limited privileges.
If someone wants to fail at maxerror rows, they can do it, since we
will count how many rows failed.
even though I didn't get it.
On Mon, Dec 18, 2023 at 9:16 AM jian he <jian.universality@gmail.com> wrote:
On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi,
I've read this thread and the latest patch. IIUC with SAVE_ERROR
option, COPY FROM creates an error table for the target table and
writes error information there.
While I agree that the final shape of this feature would be something
like that design, I'm concerned some features are missing in order to
make this feature useful in practice. For instance, error logs are
inserted to error tables without bounds, meaning that users who want
to tolerate errors during COPY FROM will have to truncate or drop the
error tables periodically, or the database will grow with error logs
without limit. Ideally such maintenance work should be done by the
database. There might be some users who want to log such conversion
errors in server logs to avoid such maintenance work. I think we
should provide an option for where to write, at least. Also, since the
error tables are normal user tables internally, error logs are also
replicated to subscribers if there is a publication FOR ALL TABLES,
unlike system catalogs. I think some users would not like such
behavior.
save the error metadata to system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.
I don't think the new regex idea would be a good solution for the
existing users who are using FOR ALL TABLES publication. It's not
desirable that they have to change the publication because of this
feature. With the current patch, a logical replication using FOR ALL
TABLES publication will stop immediately after error information is
inserted into a new error table unless the same error table is created
on subscribers.
Looking at SAVE_ERROR feature closely, I think it consists of two
separate features. That is, it enables COPY FROM to load data while
(1) tolerating errors and (2) logging errors to somewhere (i.e., an
error table). If we implement only (1), it would be like COPY FROM
tolerate errors infinitely and log errors to /dev/null. The user
cannot see the error details but I guess it could still help some
cases as Andres mentioned[1] (it might be a good idea to send the
number of rows successfully loaded in a NOTICE message if some rows
could not be loaded). Then with (2), COPY FROM can log error
information to somewhere such as tables and server logs and the user
can select it. So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
Regards,
[1] /messages/by-id/20231109002600.fuihn34bjqqgmbjm@awork3.anarazel.de
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
I don't think "specify the maximum number of errors to tolerate
before raising an ERROR." is very useful....
QUOTE from [1]
MAXERROR [AS] error_count
If the load returns the error_count number of errors or greater, the
load fails. If the load returns fewer errors, it continues and returns
an INFO message that states the number of rows that could not be
loaded. Use this parameter to allow loads to continue when certain
rows fail to load into the table because of formatting errors or other
inconsistencies in the data.
Set this value to 0 or 1 if you want the load to fail as soon as the
first error occurs. The AS keyword is optional. The MAXERROR default
value is 0 and the limit is 100000.
The actual number of errors reported might be greater than the
specified MAXERROR because of the parallel nature of Amazon Redshift.
If any node in the Amazon Redshift cluster detects that MAXERROR has
been exceeded, each node reports all of the errors it has encountered.
END OF QUOTE
With the option MAXERROR error_count, IIUC, it fails while validating line
error_count + 1; otherwise it raises a notice telling you how many rows have
errors.
* Case when error_count is small and the copy fails: it only tells
you that at least error_count lines have malformed data. But what if
the actual number of malformed rows is very large? In this case, this failure
error message is not that helpful.
* Case when error_count is very large and the copy does not fail: the
actual number of malformed rows may still be large (though less than
error_count), but there is no error report, so you don't know which lines
have errors.
Either way, if the file has a large portion of malformed rows, then
the MAXERROR option does not make sense.
So maybe we don't need a threshold for tolerating errors.
However, we could have an option that does not actually copy to the table but
only validates, similar to NOLOAD in [1]
I'm fine even if the feature is not like MAXERROR. If we want a
feature to tolerate errors during COPY FROM, I just thought it might
be a good idea to have a tuning knob for better flexibility, not just
like an on/off switch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Mon, Dec 18, 2023 at 4:41 PM jian he <jian.universality@gmail.com> wrote:
On Mon, Dec 18, 2023 at 1:09 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,
save the error metadata to system catalogs would be more expensive,
please see below explanation.
I have no knowledge of publications.
but i feel there is a feature request: publication FOR ALL TABLES
exclude regex_pattern.
Anyway, that would be another topic.
I also think saving error metadata to a system catalog is not a good idea.
And I believe Sawada-san just pointed out missing features and did not
suggest that we use a system catalog.
I don't think "specify the maximum number of errors to tolerate
before raising an ERROR." is very useful....
That may be so.
I imagine it's useful in some use cases since some loading tools have
such options.
Anyway, I agree it's not necessary for the initial patch, as mentioned in [1].
I suppose we could specify an ERRORFILE directory; a similar
implementation is [2], with a demo in [3].
It generates 2 files: one file shows the malformed line content as
is, the other file shows the error info.
That may be a good option when considering "(2) logging errors to
somewhere".
What do you think about the proposal to develop these features
incrementally?
I am more with Tom's idea [1], that is, when errors happen (data type
conversion only), do not fail, AND we save the error to a table. I
guess we can implement this logic together, only with a new COPY
option.
If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.
Imagine a case (it's not that contrived, imho): while converting from
text to the table's int, postgres's isspace is different from the source
text file's isspace logic.
Then all the lines are malformed. If we just say "on error, continue" and
do not save the error meta info, the user is still confused about which field
has the wrong data, and the user will probably try to incrementally test
which field contains malformed data.
Since we need to save the error somewhere, and
everyone who has the privilege to INSERT can do COPY,
I think we also need to handle the access privileges.
So, like I mentioned above, with one copy_errors error table hub,
everyone can view/select their own copy failure records.
The error table hub idea is still unclear to me. I assume that there
are error tables at least on each database. And an error table can
have error data that happened during COPY FROM, including malformed
lines. Do the error tables grow without bounds and the users have to
delete rows at some point? If so, who can do that? How can we achieve
that the users can see only errored rows they generated? And the issue
with logical replication also needs to be resolved. Anyway, if we go
this direction, we need to discuss the overall design.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Dec 19, 2023 at 9:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The error table hub idea is still unclear to me. I assume that there
are error tables at least on each database. And an error table can
have error data that happened during COPY FROM, including malformed
lines. Do the error tables grow without bounds and the users have to
delete rows at some point? If so, who can do that? How can we achieve
that the users can see only errored rows they generated? And the issue
with logical replication also needs to be resolved. Anyway, if we go
this direction, we need to discuss the overall design.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Please check my latest attached POC.
The main content is to build the SPI query, execute the SPI query, and add a
regress test and regress output.
There is one copy_errors table per schema.
foo.copy_errors will be owned by the owner of schema foo.
If you can insert into a table in that specific schema, let's say foo,
then you will get the privilege to INSERT/DELETE/SELECT
on foo.copy_errors.
If you are not a superuser, you are only allowed to do
INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
current_user::regrole::oid.
This is done via row level security.
Since foo.copy_errors mainly receives INSERT operations, if copy_errors grows
too much, that means your source file has many errors, and it will take a
very long time to finish the whole COPY. Maybe we can check, from another
client, how many errors have been encountered so far.
I don't know how to deal with logical replication. Looking for ideas.
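For clarity, this is roughly how the POC is meant to be exercised (the file
path and table are made up; the option and table names are the ones described
above and used in the patch below):

CREATE TABLE foo.t (a int, b date);

COPY foo.t FROM '/tmp/t.csv' WITH (format csv, save_error);
-- NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table foo.copy_errors

-- row level security means a non-superuser only sees the failures they generated
SELECT lineno, colname, raw_field_value, err_message
FROM foo.copy_errors
WHERE copy_destination = 'foo.t'::regclass;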
Attachments:
v12-0001-Make-COPY-FROM-more-error-tolerant.patch (text/x-patch)
From 9affaf6d94eb4afe26fc7181e38e53eed14e0216 Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Wed, 20 Dec 2023 11:26:25 +0800
Subject: [PATCH v12 1/1] Make COPY FROM more error tolerant
Currently COPY FROM has 3 types of error while processing the source file.
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.
Instead of throwing errors while copying, the save_error specifier will
save errors to the table copy_errors for all the COPY FROM operations in the same schema.
We check the existing copy_errors table definition by column name and column data type.
If the table already exists and meets the criteria, then errors will be saved to it;
if the table does not exist, then create one.
copy_errors is per schema and is owned by the schema's owner.
For non-superusers, if you can insert in that schema, then you can insert into copy_errors,
but you are only allowed to select/delete your own rows, which is judged by comparing current_user
with copy_errors's userid column. Privilege restriction is implemented via ROW LEVEL SECURITY.
Only works for COPY FROM, non-BINARY mode.
While copying, if no error happened, the error saving table will be dropped at the end of COPY FROM.
If the error saving table exists, meaning COPY FROM errors have happened at least once,
then all future errors will be saved to that table.
We save the error related meta info to the error saving table using SPI,
that is, we construct a query string and then execute the query.
---
contrib/file_fdw/file_fdw.c | 4 +-
doc/src/sgml/ref/copy.sgml | 122 ++++++++++++++-
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 168 ++++++++++++++++++++-
src/backend/commands/copyfromparse.c | 179 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 3 +-
src/include/commands/copyfrom_internal.h | 5 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 160 ++++++++++++++++++++
src/test/regress/sql/copy2.sql | 142 ++++++++++++++++++
12 files changed, 787 insertions(+), 20 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2189be8a..2d3eb34f 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -751,7 +751,7 @@ fileIterateForeignScan(ForeignScanState *node)
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
found = NextCopyFrom(festate->cstate, econtext,
- slot->tts_values, slot->tts_isnull);
+ slot->tts_values, slot->tts_isnull, NULL);
if (found)
ExecStoreVirtualTuple(slot);
@@ -1183,7 +1183,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
MemoryContextReset(tupcontext);
MemoryContextSwitchTo(tupcontext);
- found = NextCopyFrom(cstate, NULL, values, nulls);
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
MemoryContextSwitchTo(oldcontext);
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..3dbf70ee 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,18 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+ Specifies that any data conversion errors while copying will automatically be saved in the table <literal>COPY_ERRORS</literal> and the <command>COPY FROM</command> operation will not be interrupted by conversion errors.
+ This option is not allowed when using <literal>binary</literal> format. Note that this
+ is only supported in current <command>COPY FROM</command> syntax.
+ If this option is omitted, any data type conversion errors will be raised immediately.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -564,6 +577,7 @@ COPY <replaceable class="parameter">count</replaceable>
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
<command>VACUUM</command> to recover the wasted space.
+ To continue copying while skipping conversion errors in a <command>COPY FROM</command>, you might wish to specify <literal>SAVE_ERROR</literal>.
</para>
<para>
@@ -572,6 +586,19 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+
+ If the <literal>SAVE_ERROR</literal> option is specified and conversion errors occur while copying,
+ <productname>PostgreSQL</productname> will first check for the existence of the table <literal>COPY_ERRORS</literal>, then save the conversion error related information to it.
+ If it does exist, but its table definition cannot be used to save the error information, an error is raised and the <command>COPY FROM</command> operation stops.
+ If it does not exist, <productname>PostgreSQL</productname> will try to create it before doing the actual copy operation.
+ The owner of the table <literal>COPY_ERRORS</literal> is the current schema owner.
+ All future error related information generated while copying data to the same schema will automatically be saved to the same <literal>COPY_ERRORS</literal> table.
+ Copy conversion errors are privileged information; non-superusers are only allowed to <literal>SELECT</literal>, <literal>DELETE</literal> or <literal>INSERT</literal> their own rows in the <literal>COPY_ERRORS</literal> table.
+ Conversion errors include data type conversion failure, extra data or missing data in the source file.
+ <literal>COPY_ERRORS</literal> table detailed description listed in <xref linkend="copy-errors-table"/>.
+
+ </para>
</refsect1>
<refsect1>
@@ -588,7 +615,7 @@ COPY <replaceable class="parameter">count</replaceable>
output function, or acceptable to the input function, of each
attribute's data type. The specified null string is used in
place of columns that are null.
- <command>COPY FROM</command> will raise an error if any line of the
+ By default, if <literal>SAVE_ERROR</literal> is not specified, <command>COPY FROM</command> will raise an error if any line of the
input file contains more or fewer columns than are expected.
</para>
@@ -962,6 +989,99 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title> TABLE COPY_ERRORS </title>
+ <para>
+ If <literal>SAVE_ERROR</literal> is specified, all the data type conversion errors while copying will automatically be saved in <literal>COPY_ERRORS</literal>
+ <xref linkend="copy-errors-table"/> shows <literal>COPY_ERRORS</literal> table's column name, data type, and description.
+ </para>
+
+ <table id="copy-errors-table">
+ <title>Error Saving table description </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>userid</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The user who generated the conversion error.
+ Refer <link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_authid</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_authid</literal>, this value becomes stale.
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>copy_destination</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The <command>COPY FROM</command> operation destination table oid.
+ Refer <link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_class</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_class</literal>, this value becomes stale.
+
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the input file</entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where the error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the line where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>colname</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>raw_field_value</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message text </entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code for the copying error</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..a84080b4 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -29,7 +29,9 @@
#include "access/tableam.h"
#include "access/xact.h"
#include "access/xlog.h"
+#include "catalog/pg_authid.h"
#include "catalog/namespace.h"
+#include "catalog/pg_namespace.h"
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
@@ -38,6 +40,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -52,6 +55,7 @@
#include "utils/portal.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
+#include "utils/syscache.h"
/*
* No more than this many tuples per CopyMultiInsertBuffer
@@ -652,10 +656,12 @@ CopyFrom(CopyFromState cstate)
bool has_before_insert_row_trig;
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
+ StringInfo err_save_buf;
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -952,6 +958,7 @@ CopyFrom(CopyFromState cstate)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ err_save_buf = makeStringInfo();
for (;;)
{
TupleTableSlot *myslot;
@@ -989,9 +996,13 @@ CopyFrom(CopyFromState cstate)
ExecClearTuple(myslot);
/* Directly store the values/nulls array in the slot */
- if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+ if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull, err_save_buf))
break;
+ /* Soft error occurred, skip this tuple. */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1297,6 +1308,20 @@ CopyFrom(CopyFromState cstate)
ExecResetTupleTable(estate->es_tupleTable, false);
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->copy_errors_nspname);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%llu rows were skipped because of conversion error."
+ " Skipped rows saved to table %s.copy_errors",
+ (unsigned long long) cstate->error_rows_cnt,
+ cstate->copy_errors_nspname));
+ }
+ }
+
/* Allow the FDW to shut down */
if (target_resultRelInfo->ri_FdwRoutine != NULL &&
target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
@@ -1444,6 +1469,145 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ StringInfoData querybuf;
+ bool isnull;
+ bool copy_erros_table_ok;
+ Oid nsp_oid;
+ Oid save_userid;
+ Oid ownerId;
+ int save_sec_context;
+ const char *copy_errors_nspname;
+ HeapTuple utup;
+ HeapTuple tuple;
+ const char *rname;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ copy_errors_nspname = get_namespace_name(RelationGetNamespace(cstate->rel));
+ nsp_oid = get_namespace_oid(copy_errors_nspname, false);
+
+ initStringInfo(&querybuf);
+ /*
+ *
+ * Verify whether the nsp_oid.COPY_ERRORS table already exists, and if so,
+ * examine its column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,userid,copy_destination,filename,lineno, "
+ "line,colname,raw_field_value,err_message,err_detail,errorcode}') "
+ "AND (ARRAY_AGG(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,oid,oid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+ appendStringInfo(&querybuf,
+ "relname = $$copy_errors$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ copy_errors_nspname);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ copy_erros_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+ /*
+ * Switch to the schema owner's userid, so that the COPY_ERRORS table is owned by
+ * that user. Also record the current userid.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+
+ utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(save_userid));
+ if (!HeapTupleIsValid(utup))
+ elog(ERROR, "cache lookup failed for role %u", save_userid);
+
+ rname = pstrdup(NameStr(((Form_pg_authid) GETSTRUCT(utup))->rolname));
+ ReleaseSysCache(utup);
+
+ tuple = SearchSysCache1(NAMESPACEOID, ObjectIdGetDatum(nsp_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_SCHEMA),
+ errmsg("schema with OID %u does not exist", nsp_oid)));
+ ownerId = ((Form_pg_namespace) GETSTRUCT(tuple))->nspowner;
+ ReleaseSysCache(tuple);
+
+ /* not sure the flag is correct */
+ SetUserIdAndSecContext(ownerId,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /* No copy_errors_nspname.COPY_ERRORS table exists, so create one to hold all the potential errors. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.COPY_ERRORS( "
+ "USERID OID, COPY_DESTINATION OID, FILENAME TEXT,LINENO BIGINT "
+ ",LINE TEXT, COLNAME text, RAW_FIELD_VALUE TEXT "
+ ",ERR_MESSAGE TEXT, ERR_DETAIL TEXT, ERRORCODE TEXT)", copy_errors_nspname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE POLICY copyerror ON %s.COPY_ERRORS "
+ "FOR ALL TO PUBLIC USING (USERID = current_user::regrole::oid)",
+ copy_errors_nspname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "ALTER TABLE %s.COPY_ERRORS ENABLE ROW LEVEL SECURITY", copy_errors_nspname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ }
+ else if(!copy_erros_table_ok)
+ ereport(ERROR,
+ (errmsg("table %s.COPY_ERRORS already exists. "
+ "cannot use it for COPY FROM error saving",
+ copy_errors_nspname)));
+
+ /* grant INSERT/SELECT on copy_errors to the copy operation user now */
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "GRANT SELECT, DELETE, INSERT ON TABLE %s.COPY_ERRORS TO %s", copy_errors_nspname, rname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->copy_errors_nspname = pstrdup(copy_errors_nspname);
+ }
+ else
+ {
+ cstate->copy_errors_nspname = NULL;
+ cstate->escontext = NULL;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* default, assume the conversion will be ok. */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..f1a6f9dc 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -66,10 +66,12 @@
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -852,7 +854,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
*/
bool
NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls)
+ Datum *values, bool *nulls, StringInfo err_save_buf)
{
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
@@ -880,16 +882,60 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ char *errmsg_extra;
+ Oid save_userid;
+ int save_sec_context;
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ /* reset line_error_occured to false for next new line. */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
+ /* we need to get the current userid for the SPI queries */
+ if (cstate->opts.save_error)
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ errmsg_extra = pstrdup("extra data after last expected column");
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.copy_errors(userid, copy_destination, filename,lineno,line, "
+ "err_message, errorcode) "
+ "SELECT %u, %u,$$%s$$, %llu,$$%s$$, $$%s$$, $$%s$$",
+ cstate->copy_errors_nspname,
+ save_userid,
+ cstate->rel->rd_rel->oid,
+ cstate->filename ? cstate->filename : "STDIN",
+ (unsigned long long) cstate->cur_lineno,
+ cstate->line_buf.data,
+ errmsg_extra,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +947,50 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ char errmsg[128];
+ snprintf(errmsg, sizeof(errmsg),
+ "missing data for column \"%s\"",
+ NameStr(att->attname));
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.copy_errors( "
+ "userid,copy_destination,filename, "
+ "lineno,line,COLNAME, err_message, errorcode) "
+ "SELECT %u, %u, $$%s$$, %llu, $$%s$$, $$%s$$, $$%s$$, $$%s$$ ",
+ cstate->copy_errors_nspname,
+ save_userid,
+ cstate->rel->rd_rel->oid,
+ cstate->filename ? cstate->filename : "STDIN",
+ (unsigned long long) cstate->cur_lineno,
+ cstate->line_buf.data,
+ NameStr(att->attname),
+ errmsg,
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_exec failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1042,84 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ *
+ * InputFunctionCall is faster than InputFunctionCallSafe.
+ *
+ */
+ if(!cstate->opts.save_error)
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char *err_detail;
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ resetStringInfo(err_save_buf);
+ appendStringInfo(err_save_buf,
+ "INSERT INTO %s.copy_errors(userid,copy_destination, "
+ "filename, lineno,line,COLNAME, "
+ "raw_field_value, err_message,errorcode, err_detail) "
+ "SELECT %u, %u, $$%s$$, %llu, $$%s$$, $$%s$$, $$%s$$, $$%s$$, $$%s$$, ",
+ cstate->copy_errors_nspname,
+ save_userid,
+ cstate->rel->rd_rel->oid,
+ cstate->filename ? cstate->filename : "STDIN",
+ (unsigned long long) cstate->cur_lineno,
+ cstate->line_buf.data,
+ cstate->cur_attname,
+ string,
+ cstate->escontext->error_data->message,
+ unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+
+ if (!err_detail)
+ appendStringInfo(err_save_buf, "NULL::text");
+ else
+ appendStringInfo(err_save_buf,"$$%s$$", err_detail);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(err_save_buf->data, false, 0) != SPI_OK_INSERT)
+ elog(ERROR, "SPI_execute failed: %s", err_save_buf->data);
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* line error occurred, set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+ /* reset ErrorSaveContext */
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 63f172e1..f42e72aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -755,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3448,6 +3448,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17346,6 +17350,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
@@ -17954,6 +17959,7 @@ bare_label_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..de47791a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
@@ -82,7 +83,7 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
extern void EndCopyFrom(CopyFromState cstate);
extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls);
+ Datum *values, bool *nulls, StringInfo err_save_buf);
extern bool NextCopyFromRawFields(CopyFromState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..65e34e89 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,10 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ uint64 error_rows_cnt; /* total number of rows that have errors */
+ const char *copy_errors_nspname; /* namespace of the copy_errors table */
+ bool line_error_occured; /* did a conversion error happen on this line? */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa..d0988a4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..f5a84487 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,142 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY save_error_csv FROM STDIN WITH (save_error, save_error ...
+ ^
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: table public.COPY_ERRORS already exists. cannot use it for COPY FROM error saving
+drop table COPY_ERRORS;
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | d | b_null | b_empty
+---+---+------+------+--------+---------
+ 2 | | NULL | NULL | f | t
+(1 row)
+
+DROP TABLE save_error_csv;
+-- save errors for extra data and missing data in some columns,
+-- plus normal data type conversion error cases.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 10 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+ relname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+---------------+----------+--------+--------------------------------------------+---------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ check_ign_err | STDIN | 1 | 1 {1} 1 1 extra | NULL | NULL | extra data after last expected column | NULL | 22P04
+ check_ign_err | STDIN | 2 | 2 | m | NULL | missing data for column "m" | NULL | 22P04
+ check_ign_err | STDIN | 3 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | | " | |
+ check_ign_err | STDIN | 4 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 5 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ check_ign_err | STDIN | 6 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(16 rows)
+
+DROP TABLE check_ign_err;
+DROP TABLE COPY_ERRORS;
+-- (type textrange was already created in test_setup.sql)
+-- test using the textrange type
+begin;
+CREATE USER test_copy_errors1;
+CREATE USER test_copy_errors2;
+CREATE USER test_copy_errors3;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION test_copy_errors3;
+SET LOCAL search_path TO copy_errors_test;
+GRANT USAGE on schema copy_errors_test to test_copy_errors1,test_copy_errors2,test_copy_errors3;
+GRANT CREATE on schema copy_errors_test to test_copy_errors3;
+set role test_copy_errors3;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to test_copy_errors1;
+GRANT insert on textrange_input to test_copy_errors2;
+set role test_copy_errors1;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+-- each user is only allowed to see their own rows,
+-- based on userid matching current_user.
+select count(*) as should_be_zero
+from copy_errors_test.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+join pg_roles pr on pr.oid = ce.userid
+where ce.userid != current_user::regrole::oid;
+ should_be_zero
+----------------
+ 0
+(1 row)
+
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+ relname | rolname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+-----------------+-------------------+----------+--------+-----------------------+---------+-----------------+-------------------------------------------------------------------+------------------------------------------+-----------
+ textrange_input | test_copy_errors1 | STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | test_copy_errors1 | STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | test_copy_errors1 | STDIN | 2 | (",a),(",",a),()",a); | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | test_copy_errors1 | STDIN | 2 | (",a),(",",a),()",a); | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | test_copy_errors1 | STDIN | 2 | (",a),(",",a),()",a); | c | a); | malformed range literal: "a);" | Missing left parenthesis or bracket. | 22P02
+(5 rows)
+
+set role test_copy_errors2;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SAVEPOINT s1;
+-- the current user (non-superuser) is not allowed to update
+update copy_errors_test.copy_errors set userid = 0;
+ERROR: permission denied for table copy_errors
+ROLLBACK to s1;
+-- the current user (non-superuser) is allowed to delete all the records they created.
+delete from copy_errors_test.copy_errors;
+set role test_copy_errors1;
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+ relname | rolname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+-----------------+-------------------+----------+--------+-----------------------+---------+-----------------+-------------------------------------------------------------------+------------------------------------------+-----------
+ textrange_input | test_copy_errors1 | STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | test_copy_errors1 | STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | test_copy_errors1 | STDIN | 2 | (",a),(",",a),()",a); | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | test_copy_errors1 | STDIN | 2 | (",a),(",",a),()",a); | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | test_copy_errors1 | STDIN | 2 | (",a),(",",a),()",a); | c | a); | malformed range literal: "a);" | Missing left parenthesis or bracket. | 22P02
+(5 rows)
+
+set role test_copy_errors3;
+--owner allowed to drop the table.
+drop table copy_errors;
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +958,27 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save';
+ filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..a4ef06d9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,126 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT,
+ d TEXT
+);
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+drop table COPY_ERRORS;
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+DROP TABLE save_error_csv;
+
+-- save errors for extra data and missing data in some columns,
+-- plus normal data type conversion error cases.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1 extra
+2
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+
+DROP TABLE check_ign_err;
+DROP TABLE COPY_ERRORS;
+
+-- (type textrange was already created in test_setup.sql)
+-- test using the textrange type
+begin;
+
+CREATE USER test_copy_errors1;
+CREATE USER test_copy_errors2;
+CREATE USER test_copy_errors3;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION test_copy_errors3;
+SET LOCAL search_path TO copy_errors_test;
+
+GRANT USAGE on schema copy_errors_test to test_copy_errors1,test_copy_errors2,test_copy_errors3;
+GRANT CREATE on schema copy_errors_test to test_copy_errors3;
+set role test_copy_errors3;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to test_copy_errors1;
+GRANT insert on textrange_input to test_copy_errors2;
+
+set role test_copy_errors1;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a);
+\.
+
+-- each user is only allowed to see their own rows,
+-- based on userid matching current_user.
+select count(*) as should_be_zero
+from copy_errors_test.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+join pg_roles pr on pr.oid = ce.userid
+where ce.userid != current_user::regrole::oid;
+
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+
+set role test_copy_errors2;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SAVEPOINT s1;
+-- the current user (non-superuser) is not allowed to update
+update copy_errors_test.copy_errors set userid = 0;
+ROLLBACK to s1;
+
+-- the current user (non-superuser) is allowed to delete all the records they created.
+delete from copy_errors_test.copy_errors;
+
+set role test_copy_errors1;
+
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+
+set role test_copy_errors3;
+--owner allowed to drop the table.
+drop table copy_errors;
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +729,25 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save';
+
+drop table copy_default_error_save;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
On Wed, Dec 20, 2023 at 1:07 PM jian he <jian.universality@gmail.com> wrote:
On Tue, Dec 19, 2023 at 9:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
The error table hub idea is still unclear to me. I assume that there
are error tables at least on each database. And an error table can
have error data that happened during COPY FROM, including malformed
lines. Do the error tables grow without bounds and the users have to
delete rows at some point? If so, who can do that? How can we achieve
that the users can see only errored rows they generated? And the issue
with logical replication also needs to be resolved. Anyway, if we go
this direction, we need to discuss the overall design.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Please check my latest attached POC.
The main content is building the SPI query, executing it, and the
regression test with its expected output.
Why do we need to use SPI? I think we can form heap tuples and insert
them to the error table. Creating the error table also doesn't need to
use SPI.
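For reference, the direct-insert approach suggested here (and used in the v13
patch attached below) would look roughly like the sketch that follows;
insert_copy_error_row is a hypothetical helper name, and the caller is assumed
to have looked up the copy_errors table's OID and prepared the values[] and
isnull[] arrays.

/*
 * Minimal sketch only: insert one already-formed error row directly,
 * without going through SPI.  Assumes access/table.h, access/heapam.h
 * and access/htup_details.h are included.
 */
static void
insert_copy_error_row(Oid copy_errors_oid, Datum *values, bool *isnull)
{
	Relation	errrel = table_open(copy_errors_oid, RowExclusiveLock);
	HeapTuple	tup = heap_form_tuple(RelationGetDescr(errrel),
									  values, isnull);

	simple_heap_insert(errrel, tup);
	heap_freetuple(tup);
	table_close(errrel, RowExclusiveLock);
}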
copy_errors: one per schema.
foo.copy_errors will be owned by the owner of schema foo.
It seems that the error table is created when SAVE_ERROR is used
for the first time. It probably blocks concurrent COPY FROM commands
with the SAVE_ERROR option on different tables if the error table is not
created yet.
if you can insert into a table in that specific schema, let's say foo,
then you will get the privilege to INSERT/DELETE/SELECT
on foo.copy_errors.
If you are not a superuser, you are only allowed to do
INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
current_user::regrole::oid.
This is done via row level security.
I don't think it works. If the user is dropped, the user's oid could
be reused for a different user.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 20, 2023 at 8:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Why do we need to use SPI? I think we can form heap tuples and insert
them to the error table. Creating the error table also doesn't need to
use SPI.
Thanks for pointing it out. I figured out how to form heap tuples and
insert them into the error table,
but I don't know how to create the error table without using SPI.
Please point it out.
copy_errors: one per schema.
foo.copy_errors will be owned by the owner of schema foo.
It seems that the error table is created when SAVE_ERROR is used
for the first time. It probably blocks concurrent COPY FROM commands
with the SAVE_ERROR option on different tables if the error table is not
created yet.
I don't know how to solve this problem yet. Maybe we can document this,
but it will block the COPY FROM immediately.
if you can insert into a table in that specific schema, let's say foo,
then you will get the privilege to INSERT/DELETE/SELECT
on foo.copy_errors.
If you are not a superuser, you are only allowed to do
INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
current_user::regrole::oid.
This is done via row level security.
I don't think it works. If the user is dropped, the user's oid could
be reused for a different user.
You are right.
So I changed it: now the schema owner will be the error table owner.
For every error-table tuple insert,
I switch to the schema owner, do the insert, then switch back to the
user running the COPY FROM.
Now everyone (except a superuser) will need an explicit grant to access the
error table.
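A rough sketch of that per-insert identity switch, as it appears in the
attached v13 patch (copy_errors_owner is the schema owner's OID; the relation
and tuple are assumed to be already prepared):

	Oid		save_userid;
	int		save_sec_context;

	GetUserIdAndSecContext(&save_userid, &save_sec_context);
	/* do the insert as the copy_errors owner */
	SetUserIdAndSecContext(copy_errors_owner,
						   save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
						   SECURITY_NOFORCE_RLS);
	simple_heap_insert(copy_errorsrel, copy_errors_tup);
	/* restore the original userid and security context */
	SetUserIdAndSecContext(save_userid, save_sec_context);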
Attachments:
v13-0001-Make-COPY-FROM-more-error-tolerant.patch (text/x-patch)
From 8c8c266f1dc809ffa0ec9f4262bdd912ed6b758a Mon Sep 17 00:00:00 2001
From: pgaddict <jian.universality@gmail.com>
Date: Wed, 27 Dec 2023 20:15:24 +0800
Subject: [PATCH v13 1/1] Make COPY FROM more error tolerant
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently COPY FROM raises three types of error while processing the source file:
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error
Instead of throwing these errors while copying, the save_error option
saves them to the table copy_errors, shared by all COPY FROM operations in the same schema.
We check an existing copy_errors table's definition by column name and column data type:
if the table already exists and meets the criteria, the error metadata is saved to it;
if the table does not exist, we create one.
The copy_errors table is per schema; it is owned by the owner of the
COPY FROM destination schema.
The table owner has full privileges on copy_errors;
other non-superusers need to be granted privileges to access it.
Only works for COPY FROM in non-BINARY mode.
---
doc/src/sgml/ref/copy.sgml | 121 ++++++++++++-
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 133 +++++++++++++-
src/backend/commands/copyfromparse.c | 217 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 6 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 137 ++++++++++++++
src/test/regress/sql/copy2.sql | 123 +++++++++++++
11 files changed, 746 insertions(+), 16 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..1d0ff0b6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,18 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+ Specifies that any data conversion errors while copying will automatically be saved in the table <literal>COPY_ERRORS</literal>, and the <command>COPY FROM</command> operation will not be interrupted by conversion errors.
+ This option is not allowed when using <literal>binary</literal> format. Note that this
+ option is only supported for <command>COPY FROM</command>.
+ If this option is omitted, any data type conversion error is raised immediately.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -564,6 +577,7 @@ COPY <replaceable class="parameter">count</replaceable>
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
<command>VACUUM</command> to recover the wasted space.
+ To continue copying while skipping conversion errors in a <command>COPY FROM</command>, you might wish to specify <literal>SAVE_ERROR</literal>.
</para>
<para>
@@ -572,6 +586,18 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+ If the <literal>SAVE_ERROR</literal> option is specified and conversion errors occur while copying,
+ <productname>PostgreSQL</productname> first checks whether the table <literal>COPY_ERRORS</literal> exists, then saves the conversion-error information to it.
+ If it exists but its definition cannot be used to save the error information, an error is raised and the <command>COPY FROM</command> operation stops.
+ If it does not exist, <productname>PostgreSQL</productname> will try to create it before doing the actual copy operation.
+ The <literal>COPY_ERRORS</literal> table is owned by the owner of the destination table's schema.
+ All error information generated by future copies into the same schema will automatically be saved to the same <literal>COPY_ERRORS</literal> table.
+ Currently only the owner can read and write the <literal>COPY_ERRORS</literal> table.
+ Conversion errors include data type conversion failures, extra data, and missing data in the source file.
+ The <literal>COPY_ERRORS</literal> table is described in detail in <xref linkend="copy-errors-table"/>.
+
+ </para>
</refsect1>
<refsect1>
@@ -588,7 +614,7 @@ COPY <replaceable class="parameter">count</replaceable>
output function, or acceptable to the input function, of each
attribute's data type. The specified null string is used in
place of columns that are null.
- <command>COPY FROM</command> will raise an error if any line of the
+ By default, if <literal>SAVE_ERROR</literal> is not specified, <command>COPY FROM</command> will raise an error if any line of the
input file contains more or fewer columns than are expected.
</para>
@@ -962,6 +988,99 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title> TABLE COPY_ERRORS </title>
+ <para>
+ If <literal>SAVE_ERROR</literal> is specified, all data type conversion errors encountered while copying are automatically saved in <literal>COPY_ERRORS</literal>.
+ <xref linkend="copy-errors-table"/> shows the <literal>COPY_ERRORS</literal> table's column names, data types, and descriptions.
+ </para>
+
+ <table id="copy-errors-table">
+ <title>Error Saving table description </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>userid</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The user who generated the conversion error.
+ References <link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_authid</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_authid</literal>, this value becomes stale.
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>copy_destination</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The OID of the <command>COPY FROM</command> destination table.
+ References <link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_class</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_class</literal>, this value becomes stale.
+
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the input file</entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where the error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the line where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>colname</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>raw_field_value</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message text </entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code for the copying error</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..a972ad87 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -29,7 +29,9 @@
#include "access/tableam.h"
#include "access/xact.h"
#include "access/xlog.h"
+#include "catalog/pg_authid.h"
#include "catalog/namespace.h"
+#include "catalog/pg_namespace.h"
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
@@ -38,6 +40,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -52,6 +55,7 @@
#include "utils/portal.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
+#include "utils/syscache.h"
/*
* No more than this many tuples per CopyMultiInsertBuffer
@@ -655,7 +659,8 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +997,10 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ /* Soft error occurred, skip this tuple. */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1297,6 +1306,20 @@ CopyFrom(CopyFromState cstate)
ExecResetTupleTable(estate->es_tupleTable, false);
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->copy_errors_nspname);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%llu rows were skipped because of conversion error."
+ " Skipped rows saved to table %s.copy_errors",
+ (unsigned long long) cstate->error_rows_cnt,
+ cstate->copy_errors_nspname));
+ }
+ }
+
/* Allow the FDW to shut down */
if (target_resultRelInfo->ri_FdwRoutine != NULL &&
target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
@@ -1444,6 +1467,114 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ StringInfoData querybuf;
+ bool isnull;
+ bool copy_errors_table_ok;
+ Oid nsp_oid;
+ Oid save_userid;
+ Oid ownerId;
+ int save_sec_context;
+ const char *copy_errors_nspname;
+ HeapTuple tuple;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ copy_errors_nspname = get_namespace_name(RelationGetNamespace(cstate->rel));
+ nsp_oid = get_namespace_oid(copy_errors_nspname, false);
+
+ initStringInfo(&querybuf);
+ /*
+ *
+ * Verify whether the nsp_oid.COPY_ERRORS table already exists, and if so,
+ * examine its column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,userid,copy_destination,filename,lineno, "
+ "line,colname,raw_field_value,err_message,err_detail,errorcode}') "
+ "AND (ARRAY_AGG(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,oid,oid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+ appendStringInfo(&querybuf,
+ "relname = $$copy_errors$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ copy_errors_nspname);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ copy_errors_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ tuple = SearchSysCache1(NAMESPACEOID, ObjectIdGetDatum(nsp_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_SCHEMA),
+ errmsg("schema with OID %u does not exist", nsp_oid)));
+ ownerId = ((Form_pg_namespace) GETSTRUCT(tuple))->nspowner;
+ ReleaseSysCache(tuple);
+
+ cstate->copy_errors_owner = ownerId;
+
+ /*
+ * Switch to the schema owner's userid, so that the COPY_ERRORS table is
+ * owned by that user.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+
+ SetUserIdAndSecContext(ownerId,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /* No copy_errors_nspname.COPY_ERRORS table yet, so create one to hold all potential errors. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.COPY_ERRORS( "
+ "USERID OID, COPY_DESTINATION OID, FILENAME TEXT,LINENO BIGINT "
+ ",LINE TEXT, COLNAME text, RAW_FIELD_VALUE TEXT "
+ ",ERR_MESSAGE TEXT, ERR_DETAIL TEXT, ERRORCODE TEXT)", copy_errors_nspname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ }
+ else if (!copy_errors_table_ok)
+ ereport(ERROR,
+ (errmsg("table %s.COPY_ERRORS already exists. "
+ "cannot use it for COPY FROM error saving",
+ copy_errors_nspname)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->copy_errors_nspname = pstrdup(copy_errors_nspname);
+ }
+ else
+ {
+ cstate->copy_errors_nspname = NULL;
+ cstate->escontext = NULL;
+ cstate->copy_errors_owner = (Oid) 0;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* by default, assume conversion is ok */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..f0849725 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -58,18 +58,21 @@
*/
#include "postgres.h"
+#include "access/heapam.h"
#include <ctype.h>
#include <unistd.h>
#include <sys/stat.h>
-
+#include <catalog/namespace.h>
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -880,16 +883,85 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ char *errmsg_extra;
+ Oid save_userid = InvalidOid;
+ int save_sec_context = -1;
+ HeapTuple copy_errors_tup;
+ Relation copy_errorsrel;
+ TupleDesc copy_errors_tupDesc;
+ Datum t_values[10];
+ bool t_isnull[10];
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ if (cstate->opts.save_error)
+ {
+ /*
+ * Open the copy_errors relation. We also need the current userid for the later heap inserts.
+ *
+ */
+ copy_errorsrel = table_open(RelnameGetRelid("copy_errors"), RowExclusiveLock);
+ copy_errors_tupDesc = copy_errorsrel->rd_att;
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ }
+
+ /* reset line_error_occured to false for the next line. */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ errmsg_extra = pstrdup("extra data after last expected column");
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(
+ cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = (Datum) 0;
+ t_isnull[5] = true;
+ t_values[6] = (Datum) 0;
+ t_isnull[6] = true;
+ t_values[7] = CStringGetTextDatum(errmsg_extra);
+ t_isnull[7] = false;
+ t_values[8] = (Datum) 0;
+ t_isnull[8] = true;
+ t_values[9] = CStringGetTextDatum(
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+
+ /* do the simple_heap_insert as the copy_errors owner */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ table_close(copy_errorsrel, RowExclusiveLock);
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +973,55 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = (Datum) 0;
+ t_isnull[5] = true;
+ t_values[6] = (Datum) 0;
+ t_isnull[6] = true;
+ t_values[7] = CStringGetTextDatum(
+ psprintf("missing data for column \"%s\"", NameStr(att->attname)));
+ t_isnull[7] = false;
+ t_values[8] = (Datum) 0;
+ t_isnull[8] = true;
+ t_values[9] = CStringGetTextDatum(
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+ /* do the simple_heap_insert as the copy_errors owner */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ table_close(copy_errorsrel, RowExclusiveLock);
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1073,91 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ *
+ * InputFunctionCall is faster than InputFunctionCallSafe.
+ *
+ */
+ if(!cstate->opts.save_error)
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char *err_detail;
+ char *err_code;
+ err_code = pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = CStringGetTextDatum(cstate->cur_attname);
+ t_isnull[5] = false;
+ t_values[6] = CStringGetTextDatum(string);
+ t_isnull[6] = false;
+ t_values[7] = CStringGetTextDatum(cstate->escontext->error_data->message);
+ t_isnull[7] = false;
+ t_values[8] = err_detail ? CStringGetTextDatum(err_detail) : (Datum) 0;
+ t_isnull[8] = err_detail ? false: true;
+ t_values[9] = CStringGetTextDatum(err_code);
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+ /* do the simple_heap_insert as the copy_errors owner */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ /* line error occurred, set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+ /* reset ErrorSaveContext */
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record the error row count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
+ if (cstate->opts.save_error)
+ table_close(copy_errorsrel, RowExclusiveLock);
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 63f172e1..f42e72aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -755,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3448,6 +3448,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17346,6 +17350,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
@@ -17954,6 +17959,7 @@ bare_label_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..aa560dbb 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..2c3b7b42 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,11 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ Oid copy_errors_owner; /* the owner of copy_errors table */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ uint64 error_rows_cnt; /* total number of rows that have errors */
+ const char *copy_errors_nspname; /* namespace of the copy_errors table */
+ bool line_error_occured; /* did a conversion error happen on this line? */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 5984dcfa..d0988a4c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -377,6 +377,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..a2c6bf5aa 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,118 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT
+);
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY save_error_csv FROM STDIN WITH (save_error, save_error ...
+ ^
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: table public.COPY_ERRORS already exists. cannot use it for COPY FROM error saving
+drop table COPY_ERRORS;
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | b_null | b_empty
+---+---+------+--------+---------
+ 2 | | NULL | f | t
+(1 row)
+
+DROP TABLE save_error_csv;
+-- save error with extra data and missing data some column.
+---normal data type conversion error case.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 10 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+ relname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+---------------+----------+--------+--------------------------------------------+---------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ check_ign_err | STDIN | 1 | 1 {1} 1 1 extra | NULL | NULL | extra data after last expected column | NULL | 22P04
+ check_ign_err | STDIN | 2 | 2 | NULL | NULL | missing data for column "m" | NULL | 22P04
+ check_ign_err | STDIN | 3 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | | " | |
+ check_ign_err | STDIN | 4 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 5 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ check_ign_err | STDIN | 6 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(16 rows)
+
+DROP TABLE check_ign_err;
+truncate COPY_ERRORS;
+--(type textrange was already made in test_setup.sql)
+--using textrange doing test
+begin;
+CREATE USER regress_user12;
+CREATE USER regress_user13;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION regress_user12;
+SET LOCAL search_path TO copy_errors_test;
+GRANT USAGE on schema copy_errors_test to regress_user12,regress_user13;
+GRANT CREATE on schema copy_errors_test to regress_user12;
+set role regress_user12;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to regress_user13;
+set role regress_user13;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SAVEPOINT s1;
+--should fail. no priviledge
+select * from copy_errors_test.copy_errors;
+ERROR: permission denied for table copy_errors
+ROLLBACK to s1;
+set role regress_user12;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+ relname | rolname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+-----------------+----------------+----------+--------+----------------------------+---------+-----------------+-------------------------------------------------------------------+------------------------------------------+-----------
+ textrange_input | regress_user13 | STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | c | a); | malformed range literal: "a);" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+--owner allowed to drop the table.
+drop table copy_errors;
+--should fail. no privilege
+select * from public.copy_errors;
+ERROR: permission denied for table copy_errors
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +934,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save'
+order by lineno, colname;
+ filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save, copy_errors;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..a37986df 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,106 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT
+);
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+drop table COPY_ERRORS;
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+DROP TABLE save_error_csv;
+
+-- save error with extra data and missing data some column.
+---normal data type conversion error case.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1 extra
+2
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+
+DROP TABLE check_ign_err;
+truncate COPY_ERRORS;
+
+--(type textrange was already created in test_setup.sql)
+--test using textrange
+begin;
+CREATE USER regress_user12;
+CREATE USER regress_user13;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION regress_user12;
+SET LOCAL search_path TO copy_errors_test;
+
+GRANT USAGE on schema copy_errors_test to regress_user12,regress_user13;
+GRANT CREATE on schema copy_errors_test to regress_user12;
+set role regress_user12;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to regress_user13;
+
+set role regress_user13;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a);
+\.
+
+SAVEPOINT s1;
+--should fail. no privilege
+select * from copy_errors_test.copy_errors;
+
+ROLLBACK to s1;
+
+set role regress_user12;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+
+--owner allowed to drop the table.
+drop table copy_errors;
+
+--should fail. no privilege
+select * from public.copy_errors;
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +709,26 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save'
+order by lineno, colname;
+
+drop table copy_default_error_save, copy_errors;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
On Thu, 28 Dec 2023 at 09:27, jian he <jian.universality@gmail.com> wrote:
On Wed, Dec 20, 2023 at 8:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Why do we need to use SPI? I think we can form heap tuples and insert
them into the error table. Creating the error table also doesn't need
to use SPI.
Thanks for pointing it out. I figured out how to form heap tuples and
insert them into the error table,
but I don't know how to create the error table without using SPI.
Please point it out.
copy_errors is one per schema;
foo.copy_errors will be owned by the owner of schema foo.
It seems that the error table is created when SAVE_ERROR is used
for the first time. It probably blocks concurrent COPY FROM commands
with the SAVE_ERROR option to different tables if the error table is
not created yet.
I don't know how to solve this problem... Maybe we can document this,
but it will block the COPY FROM immediately.
If you can insert into a table in a specific schema, let's say foo,
then you will get privileges to INSERT/DELETE/SELECT
on foo.copy_errors.
If you are not a superuser, you are only allowed to do
INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
current_user::regrole::oid.
This is done via row level security.
I don't think it works. If the user is dropped, the user's oid could
be reused for a different user.
You are right.
So I changed it: now the schema owner will be the error table owner.
For every tuple inserted into the error table,
I switch to the schema owner, do the insert, then switch back to the
COPY FROM operation's user.
Now everyone (except superusers) will need an explicit grant to access
the error table.
There are some compilation issues reported at [1] for the patch:
[04:04:26.288] copyfromparse.c: In function ‘NextCopyFrom’:
[04:04:26.288] copyfromparse.c:1126:25: error: ‘copy_errors_tupDesc’
may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[04:04:26.288] 1126 | copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
[04:04:26.288] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[04:04:26.288] 1127 | t_values,
[04:04:26.288] | ~~~~~~~~~
[04:04:26.288] 1128 | t_isnull);
[04:04:26.288] | ~~~~~~~~~
[04:04:26.288] copyfromparse.c:1160:4: error: ‘copy_errorsrel’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
[04:04:26.288] 1160 | table_close(copy_errorsrel, RowExclusiveLock);
[04:04:26.288] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[1]: https://cirrus-ci.com/task/4785221183209472
Regards,
Vignesh
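To make the per-schema design quoted above concrete: the error table that SAVE_ERROR creates automatically looks roughly like the sketch below. Here "myschema" is just a placeholder for the schema of the COPY FROM destination table; the column names and types follow the patch's CREATE TABLE statement, so this is only illustrative, not something a user has to run.

-- created automatically by COPY ... WITH (save_error); shown only as a sketch
CREATE TABLE myschema.copy_errors (
    userid           oid,     -- role that ran the COPY FROM
    copy_destination oid,     -- pg_class OID of the destination table
    filename         text,    -- source file name, or STDIN
    lineno           bigint,  -- line number in the input, counting from 1
    line             text,    -- raw content of the failing line
    colname          text,    -- field in which the conversion failed, if any
    raw_field_value  text,    -- raw content of the failing field
    err_message      text,    -- error message
    err_detail       text,    -- detailed error message, if any
    errorcode        text     -- SQLSTATE code
);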
On Fri, Jan 5, 2024 at 12:05 AM vignesh C <vignesh21@gmail.com> wrote:
On Thu, 28 Dec 2023 at 09:27, jian he <jian.universality@gmail.com> wrote:
On Wed, Dec 20, 2023 at 8:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Why do we need to use SPI? I think we can form heap tuples and insert
them into the error table. Creating the error table also doesn't need
to use SPI.
Thanks for pointing it out. I figured out how to form heap tuples and
insert them into the error table,
but I don't know how to create the error table without using SPI.
Please point it out.
copy_errors is one per schema;
foo.copy_errors will be owned by the owner of schema foo.
It seems that the error table is created when SAVE_ERROR is used
for the first time. It probably blocks concurrent COPY FROM commands
with the SAVE_ERROR option to different tables if the error table is
not created yet.
I don't know how to solve this problem... Maybe we can document this,
but it will block the COPY FROM immediately.
If you can insert into a table in a specific schema, let's say foo,
then you will get privileges to INSERT/DELETE/SELECT
on foo.copy_errors.
If you are not a superuser, you are only allowed to do
INSERT/DELETE/SELECT on foo.copy_errors rows where USERID =
current_user::regrole::oid.
This is done via row level security.
I don't think it works. If the user is dropped, the user's oid could
be reused for a different user.
You are right.
So I changed it: now the schema owner will be the error table owner.
For every tuple inserted into the error table,
I switch to the schema owner, do the insert, then switch back to the
COPY FROM operation's user.
Now everyone (except superusers) will need an explicit grant to access
the error table.
There are some compilation issues reported at [1] for the patch:
[04:04:26.288] copyfromparse.c: In function ‘NextCopyFrom’:
[04:04:26.288] copyfromparse.c:1126:25: error: ‘copy_errors_tupDesc’
may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[04:04:26.288] 1126 | copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
[04:04:26.288] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[04:04:26.288] 1127 | t_values,
[04:04:26.288] | ~~~~~~~~~
[04:04:26.288] 1128 | t_isnull);
[04:04:26.288] | ~~~~~~~~~
[04:04:26.288] copyfromparse.c:1160:4: error: ‘copy_errorsrel’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
[04:04:26.288] 1160 | table_close(copy_errorsrel, RowExclusiveLock);
[04:04:26.288] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I fixed this issue, and also improved the doc.
Other implementations have not changed.
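For anyone trying the patch, a minimal usage sketch follows. The table and file names are made up for illustration; the copy_errors table is created automatically in the destination table's schema the first time save_error is used.

CREATE TABLE measurements (id int, reading numeric);
COPY measurements FROM '/tmp/measurements.dat' WITH (save_error);
-- rows that failed conversion were skipped; inspect them afterwards:
SELECT lineno, colname, raw_field_value, err_message
FROM public.copy_errors
WHERE copy_destination = 'measurements'::regclass;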
Attachments:
v14-0001-Make-COPY-FROM-more-error-tolerant.patch (application/x-patch)
From 99ebe03caa9d50b2cd3cdcd05becccd4b61684e1 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 5 Jan 2024 16:29:38 +0800
Subject: [PATCH v14 1/1] Make COPY FROM more error tolerant
At present, when processing the source file, COPY FROM may encounter three types of errors:
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error.
Instead of throwing errors while copying, the save_error (boolean) option will
save errors to the table copy_errors for all COPY FROM operations that happen in the same schema.
We check the existence of the table copy_errors,
and we also check its definition by comparing column names and data types.
If copy_errors already exists and meets the criteria, then error metadata will be saved to it.
If copy_errors does not exist, it is created.
If copy_errors exists but cannot be used for saving errors, an error is raised.
The table copy_errors is per schema; it is owned by the owner of the schema
of the COPY FROM operation's destination table.
The table owner has full privileges on copy_errors;
other non-superusers need to be granted privileges to access it.
---
doc/src/sgml/ref/copy.sgml | 120 ++++++++++++-
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 133 +++++++++++++-
src/backend/commands/copyfromparse.c | 217 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 6 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 137 ++++++++++++++
src/test/regress/sql/copy2.sql | 123 +++++++++++++
11 files changed, 745 insertions(+), 16 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..f6cdf0cf 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,18 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+ Specifies that any data conversion errors encountered while copying will automatically be saved in the table <literal>COPY_ERRORS</literal>, and the <command>COPY FROM</command> operation will not be interrupted by conversion errors.
+ This option is not allowed when using <literal>binary</literal> format, and it
+ is only supported for <command>COPY FROM</command>.
+ If this option is omitted, any data type conversion error will be raised immediately.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -564,6 +577,7 @@ COPY <replaceable class="parameter">count</replaceable>
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
<command>VACUUM</command> to recover the wasted space.
+ To continue copying while skipping conversion errors in a <command>COPY FROM</command>, you might wish to specify <literal>SAVE_ERROR</literal>.
</para>
<para>
@@ -572,6 +586,18 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+ If the <literal>SAVE_ERROR</literal> option is specified and conversion errors occur while copying,
+ <productname>PostgreSQL</productname> will first check for the existence of the table <literal>COPY_ERRORS</literal>, then save the conversion error information to it.
+ If it does exist but its definition is not suitable for saving errors, an error is raised and the <command>COPY FROM</command> operation stops.
+ If it does not exist, <productname>PostgreSQL</productname> will try to create it before doing the actual copy operation.
+ The owner of the table <literal>COPY_ERRORS</literal> is the owner of the schema targeted by the current <command>COPY FROM</command> operation.
+ All future error information generated while copying data into the same schema will automatically be saved to the same <literal>COPY_ERRORS</literal> table.
+ Currently only the owner can read from and write to <literal>COPY_ERRORS</literal>.
+ Conversion errors include data type conversion failures and extra or missing data in the source file.
+ The <literal>COPY_ERRORS</literal> table is described in detail in <xref linkend="copy-errors-table"/>.
+
+ </para>
</refsect1>
<refsect1>
@@ -588,7 +614,7 @@ COPY <replaceable class="parameter">count</replaceable>
output function, or acceptable to the input function, of each
attribute's data type. The specified null string is used in
place of columns that are null.
- <command>COPY FROM</command> will raise an error if any line of the
+ By default, if <literal>SAVE_ERROR</literal> is not specified, <command>COPY FROM</command> will raise an error if any line of the
input file contains more or fewer columns than are expected.
</para>
@@ -962,6 +988,98 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title> Table COPY_ERRORS </title>
+ <para>
+ If <literal>SAVE_ERROR</literal> is specified, all data type conversion errors encountered while copying will automatically be saved in <literal>COPY_ERRORS</literal>.
+ <xref linkend="copy-errors-table"/> shows the <literal>COPY_ERRORS</literal> table's column names, data types, and descriptions.
+ </para>
+
+ <table id="copy-errors-table">
+ <title>Error Saving table description </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>userid</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The user who generated the conversion error.
+ References <link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_authid</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_authid</literal>, this value becomes stale.
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>copy_destination</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The OID of the <command>COPY FROM</command> operation's destination table.
+ References <link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_class</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_class</literal>, this value becomes stale.
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the <command>COPY FROM</command> input</entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where the error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the line where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>colname</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>raw_field_value</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code </entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..a972ad87 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -29,7 +29,9 @@
#include "access/tableam.h"
#include "access/xact.h"
#include "access/xlog.h"
+#include "catalog/pg_authid.h"
#include "catalog/namespace.h"
+#include "catalog/pg_namespace.h"
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
@@ -38,6 +40,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -52,6 +55,7 @@
#include "utils/portal.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
+#include "utils/syscache.h"
/*
* No more than this many tuples per CopyMultiInsertBuffer
@@ -655,7 +659,8 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +997,10 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ /* Soft error occurred, skip this tuple. */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1297,6 +1306,20 @@ CopyFrom(CopyFromState cstate)
ExecResetTupleTable(estate->es_tupleTable, false);
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->copy_errors_nspname);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%llu rows were skipped because of conversion error."
+ " Skipped rows saved to table %s.copy_errors",
+ (unsigned long long) cstate->error_rows_cnt,
+ cstate->copy_errors_nspname));
+ }
+ }
+
/* Allow the FDW to shut down */
if (target_resultRelInfo->ri_FdwRoutine != NULL &&
target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
@@ -1444,6 +1467,114 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ StringInfoData querybuf;
+ bool isnull;
+ bool copy_erros_table_ok;
+ Oid nsp_oid;
+ Oid save_userid;
+ Oid ownerId;
+ int save_sec_context;
+ const char *copy_errors_nspname;
+ HeapTuple tuple;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ copy_errors_nspname = get_namespace_name(RelationGetNamespace(cstate->rel));
+ nsp_oid = get_namespace_oid(copy_errors_nspname, false);
+
+ initStringInfo(&querybuf);
+ /*
+ *
+ * Verify whether the nsp_oid.COPY_ERRORS table already exists, and if so,
+ * examine its column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,userid,copy_destination,filename,lineno, "
+ "line,colname,raw_field_value,err_message,err_detail,errorcode}') "
+ "AND (ARRAY_AGG(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,oid,oid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+ appendStringInfo(&querybuf,
+ "relname = $$copy_errors$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ copy_errors_nspname);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ copy_erros_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ tuple = SearchSysCache1(NAMESPACEOID, ObjectIdGetDatum(nsp_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_SCHEMA),
+ errmsg("schema with OID %u does not exist", nsp_oid)));
+ ownerId = ((Form_pg_namespace) GETSTRUCT(tuple))->nspowner;
+ ReleaseSysCache(tuple);
+
+ cstate->copy_errors_owner = ownerId;
+
+ /*
+ * Switch to the schema owner's userid, so that the COPY_ERRORS table is
+ * owned by that user.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+
+ SetUserIdAndSecContext(ownerId,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /* No copy_errors_nspname.COPY_ERRORS table yet, so create it to hold all the potential errors. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.COPY_ERRORS( "
+ "USERID OID, COPY_DESTINATION OID, FILENAME TEXT,LINENO BIGINT "
+ ",LINE TEXT, COLNAME text, RAW_FIELD_VALUE TEXT "
+ ",ERR_MESSAGE TEXT, ERR_DETAIL TEXT, ERRORCODE TEXT)", copy_errors_nspname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ }
+ else if(!copy_erros_table_ok)
+ ereport(ERROR,
+ (errmsg("table %s.COPY_ERRORS already exists. "
+ "cannot use it for COPY FROM error saving",
+ copy_errors_nspname)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->copy_errors_nspname = pstrdup(copy_errors_nspname);
+ }
+ else
+ {
+ cstate->copy_errors_nspname = NULL;
+ cstate->escontext = NULL;
+ cstate->copy_errors_owner = (Oid) 0;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* default: assume conversion is ok */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..37f36ea0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -58,18 +58,21 @@
*/
#include "postgres.h"
+#include "access/heapam.h"
#include <ctype.h>
#include <unistd.h>
#include <sys/stat.h>
-
+#include <catalog/namespace.h>
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -880,16 +883,85 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ char *errmsg_extra;
+ Oid save_userid = InvalidOid;
+ int save_sec_context = -1;
+ HeapTuple copy_errors_tup;
+ Relation copy_errorsrel;
+ TupleDesc copy_errors_tupDesc;
+ Datum t_values[10] = {0};
+ bool t_isnull[10] = {0};
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ if (cstate->opts.save_error)
+ {
+ /*
+ * Open the copy_errors relation. we also need current userid for the later heap inserts.
+ *
+ */
+ copy_errorsrel = table_open(RelnameGetRelid("copy_errors"), RowExclusiveLock);
+ copy_errors_tupDesc = copy_errorsrel->rd_att;
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ }
+
+ /* reset line_error_occured to false for next new line. */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ errmsg_extra = pstrdup("extra data after last expected column");
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(
+ cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = (Datum) 0;
+ t_isnull[5] = true;
+ t_values[6] = (Datum) 0;
+ t_isnull[6] = true;
+ t_values[7] = CStringGetTextDatum(errmsg_extra);
+ t_isnull[7] = false;
+ t_values[8] = (Datum) 0;
+ t_isnull[8] = true;
+ t_values[9] = CStringGetTextDatum(
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+
+ /* using copy_errors owner do the simple_heap_insert */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ table_close(copy_errorsrel, RowExclusiveLock);
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +973,55 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = (Datum) 0;
+ t_isnull[5] = true;
+ t_values[6] = (Datum) 0;
+ t_isnull[6] = true;
+ t_values[7] = CStringGetTextDatum(
+ psprintf("missing data for column \"%s\"", NameStr(att->attname)));
+ t_isnull[7] = false;
+ t_values[8] = (Datum) 0;
+ t_isnull[8] = true;
+ t_values[9] = CStringGetTextDatum(
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+ /* using copy_errors owner do the simple_heap_insert */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ table_close(copy_errorsrel, RowExclusiveLock);
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1073,91 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ * InputFunctionCall is faster than InputFunctionCallSafe, so use it
+ * when we do not need to trap soft errors.
+ */
+ if(!cstate->opts.save_error)
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char *err_detail;
+ char *err_code;
+ err_code = pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = CStringGetTextDatum(cstate->cur_attname);
+ t_isnull[5] = false;
+ t_values[6] = CStringGetTextDatum(string);
+ t_isnull[6] = false;
+ t_values[7] = CStringGetTextDatum(cstate->escontext->error_data->message);
+ t_isnull[7] = false;
+ t_values[8] = err_detail ? CStringGetTextDatum(err_detail) : (Datum) 0;
+ t_isnull[8] = err_detail ? false: true;
+ t_values[9] = CStringGetTextDatum(err_code);
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+ /* using copy_errors owner do the simple_heap_insert */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ /* line error occurred, set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+ /* reset ErrorSaveContext */
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
+ if (cstate->opts.save_error)
+ table_close(copy_errorsrel, RowExclusiveLock);
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4b175ef6..fc69420e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -778,7 +778,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3473,6 +3473,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17768,6 +17772,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
@@ -18395,6 +18400,7 @@ bare_label_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..aa560dbb 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..2c3b7b42 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,11 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ Oid copy_errors_owner; /* the owner of copy_errors table */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ uint64 error_rows_cnt; /* total number of rows that have errors */
+ const char *copy_errors_nspname; /* namespace of the copy_errors table */
+ bool line_error_occured; /* did a conversion error happen on this line? */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f88a6c9a..b6f7ed48 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -390,6 +390,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..a2c6bf5aa 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,118 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT
+);
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY save_error_csv FROM STDIN WITH (save_error, save_error ...
+ ^
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: table public.COPY_ERRORS already exists. cannot use it for COPY FROM error saving
+drop table COPY_ERRORS;
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | b_null | b_empty
+---+---+------+--------+---------
+ 2 | | NULL | f | t
+(1 row)
+
+DROP TABLE save_error_csv;
+-- save error with extra data and missing data some column.
+---normal data type conversion error case.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 10 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+ relname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+---------------+----------+--------+--------------------------------------------+---------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ check_ign_err | STDIN | 1 | 1 {1} 1 1 extra | NULL | NULL | extra data after last expected column | NULL | 22P04
+ check_ign_err | STDIN | 2 | 2 | NULL | NULL | missing data for column "m" | NULL | 22P04
+ check_ign_err | STDIN | 3 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | | " | |
+ check_ign_err | STDIN | 4 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 5 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ check_ign_err | STDIN | 6 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(16 rows)
+
+DROP TABLE check_ign_err;
+truncate COPY_ERRORS;
+--(type textrange was already created in test_setup.sql)
+--test using textrange
+begin;
+CREATE USER regress_user12;
+CREATE USER regress_user13;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION regress_user12;
+SET LOCAL search_path TO copy_errors_test;
+GRANT USAGE on schema copy_errors_test to regress_user12,regress_user13;
+GRANT CREATE on schema copy_errors_test to regress_user12;
+set role regress_user12;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to regress_user13;
+set role regress_user13;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SAVEPOINT s1;
+--should fail. no privilege
+select * from copy_errors_test.copy_errors;
+ERROR: permission denied for table copy_errors
+ROLLBACK to s1;
+set role regress_user12;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+ relname | rolname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+-----------------+----------------+----------+--------+----------------------------+---------+-----------------+-------------------------------------------------------------------+------------------------------------------+-----------
+ textrange_input | regress_user13 | STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | c | a); | malformed range literal: "a);" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+--owner allowed to drop the table.
+drop table copy_errors;
+--should fail. no privilege
+select * from public.copy_errors;
+ERROR: permission denied for table copy_errors
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +934,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save'
+order by lineno, colname;
+ filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save, copy_errors;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..a37986df 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,106 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT
+);
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+drop table COPY_ERRORS;
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+DROP TABLE save_error_csv;
+
+-- save error with extra data and missing data some column.
+---normal data type conversion error case.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1 extra
+2
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+
+DROP TABLE check_ign_err;
+truncate COPY_ERRORS;
+
+--(type textrange was already created in test_setup.sql)
+--test using textrange
+begin;
+CREATE USER regress_user12;
+CREATE USER regress_user13;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION regress_user12;
+SET LOCAL search_path TO copy_errors_test;
+
+GRANT USAGE on schema copy_errors_test to regress_user12,regress_user13;
+GRANT CREATE on schema copy_errors_test to regress_user12;
+set role regress_user12;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to regress_user13;
+
+set role regress_user13;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a);
+\.
+
+SAVEPOINT s1;
+--should fail. no privilege
+select * from copy_errors_test.copy_errors;
+
+ROLLBACK to s1;
+
+set role regress_user12;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+
+--owner allowed to drop the table.
+drop table copy_errors;
+
+--should fail. no privilege
+select * from public.copy_errors;
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +709,26 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save'
+order by lineno, colname;
+
+drop table copy_default_error_save, copy_errors;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
On Fri, Jan 5, 2024 at 4:37 PM jian he <jian.universality@gmail.com> wrote:
be reused for a different user.
You are right.
So I changed it: now the schema owner will be the error table owner.
For every tuple inserted into the error table,
I switch to the schema owner, do the insert, then switch back to the
COPY FROM operation's user.
Now everyone (except superusers) will need an explicit grant to access
the error table.
There are some compilation issues reported at [1] for the patch:
[04:04:26.288] copyfromparse.c: In function ‘NextCopyFrom’:
[04:04:26.288] copyfromparse.c:1126:25: error: ‘copy_errors_tupDesc’
may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[04:04:26.288] 1126 | copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
[04:04:26.288] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[04:04:26.288] 1127 | t_values,
[04:04:26.288] | ~~~~~~~~~
[04:04:26.288] 1128 | t_isnull);
[04:04:26.288] | ~~~~~~~~~
[04:04:26.288] copyfromparse.c:1160:4: error: ‘copy_errorsrel’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
[04:04:26.288] 1160 | table_close(copy_errorsrel, RowExclusiveLock);
[04:04:26.288] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I fixed this issue and also improved the doc.
Other parts of the implementation have not changed.
Sorry to bother you again.
This time I ran the CI tests again;
now there should be no warnings.
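To illustrate the access model described above, here is a minimal sketch
(role, schema, and table names are hypothetical, and the loading role is
assumed to already have the privileges needed to run the COPY itself):

    -- schema owned by alice; bob only loads data
    CREATE SCHEMA sales AUTHORIZATION alice;
    CREATE TABLE sales.orders (id int, amount numeric);
    GRANT USAGE ON SCHEMA sales TO bob;
    GRANT INSERT ON sales.orders TO bob;

    SET ROLE bob;
    COPY sales.orders FROM STDIN WITH (format csv, save_error);
    -- sales.copy_errors is created on the fly and is owned by alice, the schema owner

    SET ROLE alice;
    GRANT SELECT ON sales.copy_errors TO bob;  -- without this, bob cannot read the skipped rows

    SET ROLE bob;
    SELECT lineno, colname, err_message FROM sales.copy_errors;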
Attachments:
v15-0001-Make-COPY-FROM-more-error-tolerant.patchapplication/x-patch; name=v15-0001-Make-COPY-FROM-more-error-tolerant.patchDownload
From f033ef4025dbe2012007434dacd4821718443571 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 6 Jan 2024 02:34:22 +0800
Subject: [PATCH v14 1/1] Make COPY FROM more error tolerant
At present, when processing the source file, COPY FROM may encounter three types of errors:
* extra data after last expected column
* missing data for column "%s"
* data type conversion errors
Instead of raising an error while copying, the save_error (boolean) option
saves the errors to the table copy_errors for all COPY FROM operations that happen in the same schema.
We check the existence of the table copy_errors,
and we also check the definition of copy_errors by comparing column names and data types.
If copy_errors already exists and meets the criteria, the error metadata is saved to it.
If copy_errors does not exist, it is created.
If copy_errors exists but cannot be used for saving errors, an error is raised.
The copy_errors table is per schema; it is owned by the owner of the
COPY FROM operation's destination schema.
The table owner has full privileges on copy_errors;
other non-superusers need to be granted privileges to access it.
---
doc/src/sgml/ref/copy.sgml | 120 ++++++++++++-
src/backend/commands/copy.c | 12 ++
src/backend/commands/copyfrom.c | 133 +++++++++++++-
src/backend/commands/copyfromparse.c | 217 +++++++++++++++++++++--
src/backend/parser/gram.y | 8 +-
src/bin/psql/tab-complete.c | 3 +-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 6 +
src/include/parser/kwlist.h | 1 +
src/test/regress/expected/copy2.out | 137 ++++++++++++++
src/test/regress/sql/copy2.sql | 123 +++++++++++++
11 files changed, 745 insertions(+), 16 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c..f6cdf0cf 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
+ SAVE_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
</synopsis>
</refsynopsisdiv>
@@ -411,6 +412,18 @@ WHERE <replaceable class="parameter">condition</replaceable>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR</literal></term>
+ <listitem>
+ <para>
+ Specifies that any data conversion errors while copying will automatically be saved in the table <literal>COPY_ERRORS</literal> and that the <command>COPY FROM</command> operation will not be interrupted by conversion errors.
+ This option is not allowed when using <literal>binary</literal> format. This option
+ is only supported for <command>COPY FROM</command> syntax.
+ If this option is omitted, any data type conversion errors will be raised immediately.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
@@ -564,6 +577,7 @@ COPY <replaceable class="parameter">count</replaceable>
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
<command>VACUUM</command> to recover the wasted space.
+ To continue copying while skipping conversion errors in a <command>COPY FROM</command>, you might wish to specify <literal>SAVE_ERROR</literal>.
</para>
<para>
@@ -572,6 +586,18 @@ COPY <replaceable class="parameter">count</replaceable>
null strings to null values and unquoted null strings to empty strings.
</para>
+ <para>
+ If the <literal>SAVE_ERROR</literal> option is specified and conversion errors occur while copying,
+ <productname>PostgreSQL</productname> will first check for the existence of the table <literal>COPY_ERRORS</literal>, then save the conversion error information to it.
+ If it does exist but its definition cannot be used to save the errors, an error is raised and the <command>COPY FROM</command> operation stops.
+ If it does not exist, <productname>PostgreSQL</productname> will try to create it before doing the actual copy operation.
+ The <literal>COPY_ERRORS</literal> table is owned by the owner of the destination schema of the current <command>COPY FROM</command> operation.
+ All error information generated while copying data into the same schema in the future will automatically be saved to the same <literal>COPY_ERRORS</literal> table.
+ Currently only the owner can read and write data in <literal>COPY_ERRORS</literal>.
+ Conversion errors include data type conversion failures and extra or missing data in the source file.
+ A detailed description of the <literal>COPY_ERRORS</literal> table is given in <xref linkend="copy-errors-table"/>.
+
+ </para>
</refsect1>
<refsect1>
@@ -588,7 +614,7 @@ COPY <replaceable class="parameter">count</replaceable>
output function, or acceptable to the input function, of each
attribute's data type. The specified null string is used in
place of columns that are null.
- <command>COPY FROM</command> will raise an error if any line of the
+ By default, when <literal>SAVE_ERROR</literal> is not specified, <command>COPY FROM</command> will raise an error if any line of the
input file contains more or fewer columns than are expected.
</para>
@@ -962,6 +988,98 @@ versions of <productname>PostgreSQL</productname>.
check against somehow getting out of sync with the data.
</para>
</refsect3>
+
+ <refsect3>
+ <title> Table COPY_ERRORS </title>
+ <para>
+ If <literal>SAVE_ERROR</literal> is specified, all data type conversion errors encountered while copying will automatically be saved in <literal>COPY_ERRORS</literal>.
+ <xref linkend="copy-errors-table"/> shows <literal>COPY_ERRORS</literal> table's column name, data type, and description.
+ </para>
+
+ <table id="copy-errors-table">
+ <title>Error Saving table description </title>
+
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column name</entry>
+ <entry>Data type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry> <literal>userid</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The user that generated the conversion error.
+ Refers to <link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_authid</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_authid</literal>, this value becomes stale.
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>copy_destination</literal> </entry>
+ <entry><type>oid</type></entry>
+ <entry>The OID of the <command>COPY FROM</command> operation's destination table.
+ Refers to <link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>.
+ There is no hard dependency on <literal>pg_class</literal>; if the corresponding <structfield>oid</structfield> is deleted from <literal>pg_class</literal>, this value becomes stale.
+ </entry>
+ </row>
+
+ <row>
+ <entry> <literal>filename</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The path name of the <command>COPY FROM</command> input</entry>
+ </row>
+
+ <row>
+ <entry> <literal>lineno</literal> </entry>
+ <entry><type>bigint</type></entry>
+ <entry>Line number where the error occurred, counting from 1</entry>
+ </row>
+
+ <row>
+ <entry> <literal>line</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the line where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>colname</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>raw_field_value</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Raw content of the field where the error occurred</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_message </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error message</entry>
+ </row>
+
+ <row>
+ <entry> <literal>err_detail</literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>Detailed error message </entry>
+ </row>
+
+ <row>
+ <entry> <literal>errorcode </literal> </entry>
+ <entry><type>text</type></entry>
+ <entry>The error code </entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </refsect3>
+
</refsect2>
</refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b5..bc4af10a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
freeze_specified = true;
opts_out->freeze = defGetBoolean(defel);
}
+ else if (strcmp(defel->defname, "save_error") == 0)
+ {
+ if (save_error_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_specified = true;
+ opts_out->save_error = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "delimiter") == 0)
{
if (opts_out->delim)
@@ -598,6 +606,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f4861652..a972ad87 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -29,7 +29,9 @@
#include "access/tableam.h"
#include "access/xact.h"
#include "access/xlog.h"
+#include "catalog/pg_authid.h"
#include "catalog/namespace.h"
+#include "catalog/pg_namespace.h"
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
@@ -38,6 +40,7 @@
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
#include "executor/tuptable.h"
+#include "executor/spi.h"
#include "foreign/fdwapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -52,6 +55,7 @@
#include "utils/portal.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
+#include "utils/syscache.h"
/*
* No more than this many tuples per CopyMultiInsertBuffer
@@ -655,7 +659,8 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
-
+ if (cstate->opts.save_error)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +997,10 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ /* Soft error occurred, skip this tuple. */
+ if (cstate->opts.save_error && cstate->line_error_occured)
+ continue;
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1297,6 +1306,20 @@ CopyFrom(CopyFromState cstate)
ExecResetTupleTable(estate->es_tupleTable, false);
+ if (cstate->opts.save_error)
+ {
+ Assert(cstate->copy_errors_nspname);
+
+ if (cstate->error_rows_cnt > 0)
+ {
+ ereport(NOTICE,
+ errmsg("%llu rows were skipped because of conversion error."
+ " Skipped rows saved to table %s.copy_errors",
+ (unsigned long long) cstate->error_rows_cnt,
+ cstate->copy_errors_nspname));
+ }
+ }
+
/* Allow the FDW to shut down */
if (target_resultRelInfo->ri_FdwRoutine != NULL &&
target_resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
@@ -1444,6 +1467,114 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR */
+ if (cstate->opts.save_error)
+ {
+ StringInfoData querybuf;
+ bool isnull;
+ bool copy_erros_table_ok;
+ Oid nsp_oid;
+ Oid save_userid;
+ Oid ownerId;
+ int save_sec_context;
+ const char *copy_errors_nspname;
+ HeapTuple tuple;
+
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = true;
+ cstate->escontext->error_occurred = false;
+
+ copy_errors_nspname = get_namespace_name(RelationGetNamespace(cstate->rel));
+ nsp_oid = get_namespace_oid(copy_errors_nspname, false);
+
+ initStringInfo(&querybuf);
+ /*
+ *
+ * Verify whether the nsp_oid.COPY_ERRORS table already exists, and if so,
+ * examine its column names and data types.
+ */
+ appendStringInfo(&querybuf,
+ "SELECT (array_agg(pa.attname ORDER BY pa.attnum) "
+ "= '{ctid,userid,copy_destination,filename,lineno, "
+ "line,colname,raw_field_value,err_message,err_detail,errorcode}') "
+ "AND (ARRAY_AGG(pt.typname ORDER BY pa.attnum) "
+ "= '{tid,oid,oid,text,int8,text,text,text,text,text,text}') "
+ "FROM pg_catalog.pg_attribute pa "
+ "JOIN pg_catalog.pg_class pc ON pc.oid = pa.attrelid "
+ "JOIN pg_catalog.pg_type pt ON pt.oid = pa.atttypid "
+ "JOIN pg_catalog.pg_namespace pn "
+ "ON pn.oid = pc.relnamespace WHERE ");
+ appendStringInfo(&querybuf,
+ "relname = $$copy_errors$$ AND pn.nspname = $$%s$$ "
+ " AND pa.attnum >= -1 AND NOT attisdropped ",
+ copy_errors_nspname);
+
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_SELECT)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ copy_erros_table_ok = DatumGetBool(SPI_getbinval(SPI_tuptable->vals[0],
+ SPI_tuptable->tupdesc,
+ 1, &isnull));
+
+ tuple = SearchSysCache1(NAMESPACEOID, ObjectIdGetDatum(nsp_oid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_SCHEMA),
+ errmsg("schema with OID %u does not exist", nsp_oid)));
+ ownerId = ((Form_pg_namespace) GETSTRUCT(tuple))->nspowner;
+ ReleaseSysCache(tuple);
+
+ cstate->copy_errors_owner = ownerId;
+
+ /*
+ * Switch to the schema owner's userid, so that the COPY_ERRORS table is owned by
+ * that user.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+
+ SetUserIdAndSecContext(ownerId,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /* If there is no copy_errors_nspname.COPY_ERRORS table, create it to hold all the potential errors. */
+ if (isnull)
+ {
+ resetStringInfo(&querybuf);
+ appendStringInfo(&querybuf,
+ "CREATE TABLE %s.COPY_ERRORS( "
+ "USERID OID, COPY_DESTINATION OID, FILENAME TEXT,LINENO BIGINT "
+ ",LINE TEXT, COLNAME text, RAW_FIELD_VALUE TEXT "
+ ",ERR_MESSAGE TEXT, ERR_DETAIL TEXT, ERRORCODE TEXT)", copy_errors_nspname);
+
+ if (SPI_execute(querybuf.data, false, 0) != SPI_OK_UTILITY)
+ elog(ERROR, "SPI_exec failed: %s", querybuf.data);
+ }
+ else if(!copy_erros_table_ok)
+ ereport(ERROR,
+ (errmsg("table %s.COPY_ERRORS already exists. "
+ "cannot use it for COPY FROM error saving",
+ copy_errors_nspname)));
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->copy_errors_nspname = pstrdup(copy_errors_nspname);
+ }
+ else
+ {
+ cstate->copy_errors_nspname = NULL;
+ cstate->escontext = NULL;
+ cstate->copy_errors_owner = (Oid) 0;
+ }
+
+ cstate->error_rows_cnt = 0; /* set the default to 0 */
+ cstate->line_error_occured = false; /* default: assume the conversion is ok. */
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5537345..ac204709 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -58,18 +58,21 @@
*/
#include "postgres.h"
+#include "access/heapam.h"
#include <ctype.h>
#include <unistd.h>
#include <sys/stat.h>
-
+#include <catalog/namespace.h>
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/progress.h"
#include "executor/executor.h"
+#include "executor/spi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -880,16 +883,85 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
int fldct;
int fieldno;
char *string;
+ char *errmsg_extra;
+ Oid save_userid = InvalidOid;
+ int save_sec_context = -1;
+ HeapTuple copy_errors_tup = NULL;
+ Relation copy_errorsrel = NULL;
+ TupleDesc copy_errors_tupDesc = NULL;
+ Datum t_values[10] = {0};
+ bool t_isnull[10] = {0};
/* read raw fields in the next line */
if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
return false;
+ if (cstate->opts.save_error)
+ {
+ /*
+ * Open the copy_errors relation. We also need the current userid for the later heap inserts.
+ *
+ */
+ copy_errorsrel = table_open(RelnameGetRelid("copy_errors"), RowExclusiveLock);
+ copy_errors_tupDesc = copy_errorsrel->rd_att;
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ }
+
+ /* reset line_error_occured to false for next new line. */
+ if (cstate->line_error_occured)
+ cstate->line_error_occured = false;
+
/* check for overflowing fields */
if (attr_count > 0 && fldct > attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ {
+ if(cstate->opts.save_error)
+ {
+ errmsg_extra = pstrdup("extra data after last expected column");
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(
+ cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = (Datum) 0;
+ t_isnull[5] = true;
+ t_values[6] = (Datum) 0;
+ t_isnull[6] = true;
+ t_values[7] = CStringGetTextDatum(errmsg_extra);
+ t_isnull[7] = false;
+ t_values[8] = (Datum) 0;
+ t_isnull[8] = true;
+ t_values[9] = CStringGetTextDatum(
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+
+ /* using copy_errors owner do the simple_heap_insert */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ table_close(copy_errorsrel, RowExclusiveLock);
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
+ }
fieldno = 0;
@@ -901,10 +973,55 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
Form_pg_attribute att = TupleDescAttr(tupDesc, m);
if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
+ {
+ if(cstate->opts.save_error)
+ {
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = (Datum) 0;
+ t_isnull[5] = true;
+ t_values[6] = (Datum) 0;
+ t_isnull[6] = true;
+ t_values[7] = CStringGetTextDatum(
+ psprintf("missing data for column \"%s\"", NameStr(att->attname)));
+ t_isnull[7] = false;
+ t_values[8] = (Datum) 0;
+ t_isnull[8] = true;
+ t_values[9] = CStringGetTextDatum(
+ unpack_sql_state(ERRCODE_BAD_COPY_FILE_FORMAT));
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+ /* using copy_errors owner do the simple_heap_insert */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ cstate->line_error_occured = true;
+ cstate->error_rows_cnt++;
+ table_close(copy_errorsrel, RowExclusiveLock);
+ return true;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ }
+
string = field_strings[fieldno++];
if (cstate->convert_select_flags &&
@@ -956,15 +1073,91 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ {
+ /*
+ *
+ * InputFunctionCall is more faster than InputFunctionCallSafe.
+ *
+ */
+ if(!cstate->opts.save_error)
+ values[m] = InputFunctionCall(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod);
+ else
+ {
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ char *err_detail;
+ char *err_code;
+ err_code = pstrdup(unpack_sql_state(cstate->escontext->error_data->sqlerrcode));
+ if (!cstate->escontext->error_data->detail)
+ err_detail = NULL;
+ else
+ err_detail = cstate->escontext->error_data->detail;
+
+ t_values[0] = ObjectIdGetDatum(save_userid);
+ t_isnull[0] = false;
+ t_values[1] = ObjectIdGetDatum(cstate->rel->rd_rel->oid);
+ t_isnull[1] = false;
+ t_values[2] = CStringGetTextDatum(cstate->filename ? cstate->filename : "STDIN");
+ t_isnull[2] = false;
+ t_values[3] = Int64GetDatum((long long) cstate->cur_lineno);
+ t_isnull[3] = false;
+ t_values[4] = CStringGetTextDatum(cstate->line_buf.data);
+ t_isnull[4] = false;
+ t_values[5] = CStringGetTextDatum(cstate->cur_attname);
+ t_isnull[5] = false;
+ t_values[6] = CStringGetTextDatum(string);
+ t_isnull[6] = false;
+ t_values[7] = CStringGetTextDatum(cstate->escontext->error_data->message);
+ t_isnull[7] = false;
+ t_values[8] = err_detail ? CStringGetTextDatum(err_detail) : (Datum) 0;
+ t_isnull[8] = err_detail ? false: true;
+ t_values[9] = CStringGetTextDatum(err_code);
+ t_isnull[9] = false;
+
+ copy_errors_tup = heap_form_tuple(copy_errors_tupDesc,
+ t_values,
+ t_isnull);
+ /* using copy_errors owner do the simple_heap_insert */
+ SetUserIdAndSecContext(cstate->copy_errors_owner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ simple_heap_insert(copy_errorsrel, copy_errors_tup);
+
+ /* Restore userid and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ /* a line error occurred; set it once per line */
+ if (!cstate->line_error_occured)
+ cstate->line_error_occured = true;
+ /* reset ErrorSaveContext */
+ cstate->escontext->error_occurred = false;
+ cstate->escontext->details_wanted = true;
+ memset(cstate->escontext->error_data,0, sizeof(ErrorData));
+ }
+ }
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
}
+ /* record error rows count. */
+ if (cstate->line_error_occured)
+ {
+ cstate->error_rows_cnt++;
+ Assert(cstate->opts.save_error);
+ }
+ if (cstate->opts.save_error)
+ table_close(copy_errorsrel, RowExclusiveLock);
Assert(fieldno == attr_count);
}
else
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4b175ef6..fc69420e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -778,7 +778,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
SEQUENCE SEQUENCES
SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
@@ -3473,6 +3473,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | SAVE_ERROR
+ {
+ $$ = makeDefElem("save_error", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
@@ -17768,6 +17772,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
@@ -18395,6 +18400,7 @@ bare_label_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SAVE_ERROR
| SCALAR
| SCHEMA
| SCHEMAS
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 04980118..e6a358e0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2890,7 +2890,8 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b9..aa560dbb 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -43,6 +43,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool save_error; /* save error to a table? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5ec41589..2c3b7b42 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,11 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ Oid copy_errors_owner; /* the owner of copy_errors table */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ uint64 error_rows_cnt; /* total number of rows that have errors */
+ const char *copy_errors_nspname; /* the copy_errors's namespace */
+ bool line_error_occured; /* did a conversion error happen on this line? */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f88a6c9a..b6f7ed48 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -390,6 +390,7 @@ PG_KEYWORD("routines", ROUTINES, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("row", ROW, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("save_error", SAVE_ERROR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("scalar", SCALAR, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c..a2c6bf5aa 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -564,6 +564,118 @@ ERROR: conflicting or redundant options
LINE 1: ... b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL...
^
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT
+);
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+ERROR: cannot specify SAVE_ERROR in BINARY mode
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+ERROR: conflicting or redundant options
+LINE 1: COPY save_error_csv FROM STDIN WITH (save_error, save_error ...
+ ^
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+ERROR: table public.COPY_ERRORS already exists. cannot use it for COPY FROM error saving
+drop table COPY_ERRORS;
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+ a | b | c | b_null | b_empty
+---+---+------+--------+---------
+ 2 | | NULL | f | t
+(1 row)
+
+DROP TABLE save_error_csv;
+-- save errors for extra data and missing data in some columns,
+---plus normal data type conversion error cases.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+NOTICE: 10 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+ relname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+---------------+----------+--------+--------------------------------------------+---------+-------------------------+-----------------------------------------------------------------+---------------------------+-----------
+ check_ign_err | STDIN | 1 | 1 {1} 1 1 extra | NULL | NULL | extra data after last expected column | NULL | 22P04
+ check_ign_err | STDIN | 2 | 2 | NULL | NULL | missing data for column "m" | NULL | 22P04
+ check_ign_err | STDIN | 3 | \n {1} 1 \- | n | +| invalid input syntax for type integer: " +| NULL | 22P02
+ | | | | | | " | |
+ check_ign_err | STDIN | 4 | a {2} 2 \r | n | a | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 5 | 3 {\3} 3333333333 \n | m | {\x03} | invalid input syntax for type integer: "\x03" | NULL | 22P02
+ check_ign_err | STDIN | 6 | 0x11 {3,} 3333333333 \\. | m | {3,} | malformed array literal: "{3,}" | Unexpected "}" character. | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | n | d | invalid input syntax for type integer: "d" | NULL | 22P02
+ check_ign_err | STDIN | 7 | d {3,1/} 3333333333 \\0 | m | {3,1/} | invalid input syntax for type integer: "1/" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | n | e | invalid input syntax for type integer: "e" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | m | {3,\x01} | invalid input syntax for type integer: "\x01" | NULL | 22P02
+ check_ign_err | STDIN | 8 | e {3,\1} -3323879289873933333333 \n | k | -3323879289873933333333 | value "-3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | n | f | invalid input syntax for type integer: "f" | NULL | 22P02
+ check_ign_err | STDIN | 9 | f {3,1} 3323879289873933333333 \r | k | 3323879289873933333333 | value "3323879289873933333333" is out of range for type bigint | NULL | 22003
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | n | b | invalid input syntax for type integer: "b" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | m | {a, 4} | invalid input syntax for type integer: "a" | NULL | 22P02
+ check_ign_err | STDIN | 10 | b {a, 4} 1.1 h | k | 1.1 | invalid input syntax for type bigint: "1.1" | NULL | 22P02
+(16 rows)
+
+DROP TABLE check_ign_err;
+truncate COPY_ERRORS;
+--(type textrange was already made in test_setup.sql)
+--test using textrange
+begin;
+CREATE USER regress_user12;
+CREATE USER regress_user13;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION regress_user12;
+SET LOCAL search_path TO copy_errors_test;
+GRANT USAGE on schema copy_errors_test to regress_user12,regress_user13;
+GRANT CREATE on schema copy_errors_test to regress_user12;
+set role regress_user12;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to regress_user13;
+set role regress_user13;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SAVEPOINT s1;
+--should fail. no privilege
+select * from copy_errors_test.copy_errors;
+ERROR: permission denied for table copy_errors
+ROLLBACK to s1;
+set role regress_user12;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+NOTICE: 2 rows were skipped because of conversion error. Skipped rows saved to table copy_errors_test.copy_errors
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+ relname | rolname | filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+-----------------+----------------+----------+--------+----------------------------+---------+-----------------+-------------------------------------------------------------------+------------------------------------------+-----------
+ textrange_input | regress_user13 | STDIN | 1 | ,-[a\","z),[a","-inf) | b | -[a\,z) | malformed range literal: "-[a\,z)" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 1 | ,-[a\","z),[a","-inf) | c | [a,-inf) | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | a | (,a),( | malformed range literal: "(,a),(" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | b | ,a),() | malformed range literal: ",a),()" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user13 | STDIN | 2 | (",a),(",",a),()",a); | c | a); | malformed range literal: "a);" | Missing left parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | a | (a,)) | malformed range literal: "(a,))" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | b | (],a) | malformed range literal: "(],a)" | Missing comma after lower bound. | 22P02
+ textrange_input | regress_user12 | STDIN | 1 | (a",")),(]","a),(a","]) | c | (a,]) | malformed range literal: "(a,])" | Junk after right parenthesis or bracket. | 22P02
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | a | [z,a] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | b | [z,2] | range lower bound must be less than or equal to range upper bound | NULL | 22000
+ textrange_input | regress_user12 | STDIN | 2 | [z","a],[z","2],[(","",")] | c | [(,",)] | malformed range literal: "[(,",)]" | Unexpected end of input. | 22P02
+(11 rows)
+
+--owner allowed to drop the table.
+drop table copy_errors;
+--should fail. no privilege
+select * from public.copy_errors;
+ERROR: permission denied for table copy_errors
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
create table check_con_tbl (f1 int);
@@ -822,3 +934,28 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
ERROR: COPY DEFAULT only available using COPY FROM
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+NOTICE: 3 rows were skipped because of conversion error. Skipped rows saved to table public.copy_errors
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save'
+order by lineno, colname;
+ filename | lineno | line | colname | raw_field_value | err_message | err_detail | errorcode
+----------+--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
+ STDIN | 1 | k value '2022-07-04' | id | k | invalid input syntax for type integer: "k" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | id | z | invalid input syntax for type integer: "z" | | 22P02
+ STDIN | 2 | z \D '2022-07-03ASKL' | ts_value | '2022-07-03ASKL' | invalid input syntax for type timestamp: "'2022-07-03ASKL'" | | 22007
+ STDIN | 3 | s \D \D | id | s | invalid input syntax for type integer: "s" | | 22P02
+(4 rows)
+
+drop table copy_default_error_save, copy_errors;
+truncate copy_default;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60..a37986df 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -374,6 +374,106 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NULL *, FORCE_NULL(b));
ROLLBACK;
+--
+-- tests for SAVE_ERROR option with force_not_null, force_null
+\pset null NULL
+CREATE TABLE save_error_csv(
+ a INT NOT NULL,
+ b TEXT NOT NULL,
+ c TEXT
+);
+
+--save_error not allowed in binary mode
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT binary);
+
+-- redundant options not allowed.
+COPY save_error_csv FROM STDIN WITH (save_error, save_error off);
+
+create table COPY_ERRORS();
+--should fail. since table COPY_ERRORS already exists.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error);
+
+drop table COPY_ERRORS;
+
+--with FORCE_NOT_NULL and FORCE_NULL.
+COPY save_error_csv (a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+z,,""
+\0,,
+2,,
+\.
+
+SELECT *, b is null as b_null, b = '' as b_empty FROM save_error_csv;
+DROP TABLE save_error_csv;
+
+-- save errors for extra data and missing data in some columns,
+---plus normal data type conversion error cases.
+CREATE TABLE check_ign_err (n int, m int[], k bigint, l text);
+COPY check_ign_err FROM STDIN WITH (save_error);
+1 {1} 1 1 extra
+2
+\n {1} 1 \-
+a {2} 2 \r
+3 {\3} 3333333333 \n
+0x11 {3,} 3333333333 \\.
+d {3,1/} 3333333333 \\0
+e {3,\1} -3323879289873933333333 \n
+f {3,1} 3323879289873933333333 \r
+b {a, 4} 1.1 h
+5 {5} 5 \\
+\.
+
+select pc.relname, ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+from copy_errors ce join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'check_ign_err';
+
+DROP TABLE check_ign_err;
+truncate COPY_ERRORS;
+
+--(type textrange was already made in test_setup.sql)
+--test using textrange
+begin;
+CREATE USER regress_user12;
+CREATE USER regress_user13;
+CREATE SCHEMA IF NOT EXISTS copy_errors_test AUTHORIZATION regress_user12;
+SET LOCAL search_path TO copy_errors_test;
+
+GRANT USAGE on schema copy_errors_test to regress_user12,regress_user13;
+GRANT CREATE on schema copy_errors_test to regress_user12;
+set role regress_user12;
+CREATE TABLE textrange_input(a public.textrange, b public.textrange, c public.textrange);
+GRANT insert on textrange_input to regress_user13;
+
+set role regress_user13;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+,-[a\","z),[a","-inf)
+(",a),(",",a),()",a);
+\.
+
+SAVEPOINT s1;
+--should fail. no privilege
+select * from copy_errors_test.copy_errors;
+
+ROLLBACK to s1;
+
+set role regress_user12;
+COPY textrange_input(a, b, c) FROM STDIN WITH (save_error,FORMAT csv, FORCE_NULL *);
+(a",")),(]","a),(a","])
+[z","a],[z","2],[(","",")]
+\.
+
+SELECT pc.relname,pr.rolname,ce.filename,ce.lineno,ce.line,ce.colname,
+ ce.raw_field_value,ce.err_message,ce.err_detail,ce.errorcode
+FROM copy_errors_test.copy_errors ce
+JOIN pg_class pc ON pc.oid = ce.copy_destination
+JOIN pg_roles pr ON pr.oid = ce.userid;
+
+--owner allowed to drop the table.
+drop table copy_errors;
+
+--should fail. no privilege
+select * from public.copy_errors;
+ROLLBACK;
\pset null ''
-- test case with whole-row Var in a check constraint
@@ -609,3 +709,26 @@ truncate copy_default;
-- DEFAULT cannot be used in COPY TO
copy (select 1 as test) TO stdout with (default '\D');
+
+-- DEFAULT WITH SAVE_ERROR.
+create table copy_default_error_save (
+ id integer,
+ text_value text not null default 'test',
+ ts_value timestamp without time zone not null default '2022-07-05'
+);
+copy copy_default_error_save from stdin with (save_error, default '\D');
+k value '2022-07-04'
+z \D '2022-07-03ASKL'
+s \D \D
+\.
+
+select ce.filename,ce.lineno,ce.line,
+ ce.colname, ce.raw_field_value,
+ ce.err_message, ce.err_detail,ce.errorcode
+from public.copy_errors ce
+join pg_class pc on pc.oid = ce.copy_destination
+where pc.relname = 'copy_default_error_save'
+order by lineno, colname;
+
+drop table copy_default_error_save, copy_errors;
+truncate copy_default;
\ No newline at end of file
--
2.34.1
On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.
Jian, what do you think about this comment?
Looking back at the discussion so far, it seems that not everyone thinks
saving the error information to a table is the best idea [1], and some people think just
skipping the error data is useful [2].
Since there are design issues to be considered, such as the treatment of
physical/logical replication, putting error information into a
table is likely to take time for consensus building and development.
Wouldn't it be better to follow the advice below and develop the
functionality incrementally?
On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
<sawada(dot)mshk(at)gmail(dot)com> wrote:
So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.
I've attached a patch for this "first step", based on the v7 patch, which
logged errors and is simpler than the latest one; a minimal usage sketch follows the list below.
- This patch adds a new option SAVE_ERROR_TO, but currently only supports
'none', which simply skips error data. It is expected to also support
'log' and 'table' later.
- This patch skips only soft errors and doesn't handle other errors such
as missing column data.
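Here is the sketch: a minimal, hypothetical usage example of this first step
(the table and data are made up; the option name and WARNING text follow the
attached patch, and the columns in the data lines are tab-separated):

    CREATE TABLE measurements (id int, reading numeric);
    COPY measurements FROM STDIN WITH (save_error_to none);
    1	42.5
    oops	13.0
    3	7.25
    \.
    -- the second row is skipped as a soft error, and COPY reports:
    -- WARNING:  1 rows were skipped due to data type incompatibility
    SELECT * FROM measurements;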
BTW, I have a question and a comment about the v15 patch:
+       {
+           /*
+            *
+            * InputFunctionCall is more faster than InputFunctionCallSafe.
+            *
+            */
Have you measured this?
When I tested it with an older patch, there was no big difference [3].
-   SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+   SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
There was a comment that we shouldn't add a new keyword for this [4].
I left these points as they were in the v7 patch.
[1]: /messages/by-id/20231109002600.fuihn34bjqqgmbjm@awork3.anarazel.de
[2]: /messages/by-id/CAD21AoCeEOBN49fu43e6tBTynnswugA3oZ5AZvLeyDCpxpCXPg@mail.gmail.com
[3]: /messages/by-id/19551e8c2717c24689913083f841ddb5@oss.nttdata.com
[4]: /messages/by-id/20230322175000.qbdctk7bnmifh5an@awork3.anarazel.de
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patchtext/x-diff; name=v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patchDownload
From 675b8b8408e23f22940a99b40cb7ec3e1b36cac3 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 9 Jan 2024 23:10:14 +0900
Subject: [PATCH v1] Add new COPY option SAVE_ERROR_TO
Currently, when the source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases such data can be ignored and
copying just the normal data is preferable.
This patch adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying data.
Currently SAVE_ERROR_TO only supports 'none'. This indicates that error information
is not saved and COPY just skips the unexpected data and continues running.
Later work is expected to add more choices, such as 'log' and 'table'.
Author: Damir Belyalov, Atsushi Torikoshi, with reference to jian he's patch.
---
doc/src/sgml/ref/copy.sgml | 20 +++++++++++++-
src/backend/commands/copy.c | 19 ++++++++++++++
src/backend/commands/copyfrom.c | 33 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 16 +++++++++---
src/bin/psql/tab-complete.c | 7 ++++-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 +++
src/test/regress/expected/copy2.out | 28 ++++++++++++++++++++
src/test/regress/sql/copy2.sql | 27 +++++++++++++++++++
9 files changed, 148 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c33..87f2b3e7a2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -373,6 +374,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR_TO</literal></term>
+ <listitem>
+ <para>
+ Specifies where to save error information when there is malformed data in
+ the input. If this option is specified, <command>COPY</command> skips
+ malformed data and continues copying data.
+ Currently only <literal>none</literal> is supported.
+ This option is allowed only in <command>COPY FROM</command>, and only when
+ not using <literal>binary</literal> format.
+ Note that this is only supported in current <command>COPY</command>
+ syntax.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
@@ -556,7 +573,8 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error. This
+ <command>COPY</command> stops operation at the first error when
+ <literal>SAVE_ERROR_TO</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d7..5e5e8a5f34 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -571,6 +571,20 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "save_error_to") == 0)
+ {
+ char *location = defGetString(defel);
+
+ if (opts_out->save_error_to)
+ errorConflictingDefElem(defel, pstate);
+ else if (strcmp(location, "none") == 0)
+ opts_out->save_error_to = location;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY save_error_to \"%s\" not recognized", location),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +612,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error_to)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c..d909123cd1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
@@ -752,6 +753,14 @@ CopyFrom(CopyFromState cstate)
ti_options |= TABLE_INSERT_FROZEN;
}
+ /* Set up soft error handler for SAVE_ERROR_TO */
+ if (cstate->opts.save_error_to)
+ {
+ ErrorSaveContext escontext = {T_ErrorSaveContext};
+ escontext.details_wanted = true;
+ cstate->escontext = escontext;
+ }
+
/*
* We need a ResultRelInfo so we can use the regular executor's
* index-entry-making machinery. (There used to be a huge amount of code
@@ -992,6 +1001,25 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ /*
+ * Soft error occurred, skip this tuple and save error information
+ * according to SAVE_ERROR_TO.
+ */
+ if (cstate->escontext.error_occurred)
+ {
+ ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+ /* Currently only "none" is supported */
+ Assert(strcmp(cstate->opts.save_error_to, "none") == 0);
+
+ ExecClearTuple(myslot);
+
+ new_escontext.details_wanted = true;
+ cstate->escontext = new_escontext;
+
+ continue;
+ }
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1281,6 +1309,11 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
}
+ if (cstate->opts.save_error_to && cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f645..0dd49d85e6 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -956,10 +957,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) &cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e4..efe2b7cc10 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR_TO");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+ COMPLETE_WITH("none");
+
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2f..f890b66f26 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -62,6 +62,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
+ char *save_error_to; /* where to save error information */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a907..e2a8f9dd6e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ uint64 num_errors; /* total number of rows which contained soft errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07..4a1777a4fa 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, save_error_to none);
+ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +96,10 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR: COPY save_error_to "unsupported" not recognized
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+ ^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -710,6 +716,26 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -724,6 +750,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f6086..17c5764b42 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,12 +70,14 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +496,29 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -508,6 +533,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
base-commit: d596736a499858de800cabb241c0107c978f1b95
--
2.39.2
On Tue, Jan 9, 2024 at 11:36 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

Jian, what do you think about this comment?

Looking back at the discussion so far, it seems that not everyone thinks
saving table information is the best idea[1] and some people think just
skipping error data is useful.[2]

Since there are issues to be considered in the design, such as
physical/logical replication treatment, putting error information into a
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the advice below and develop the
functionality incrementally?
Yeah, I'm still thinking it's better to implement this feature
incrementally. Given we're getting close to feature freeze, I think it's
unlikely we'll get the whole feature into PG17, since there are still many
design discussions we need in addition to what Torikoshi-san pointed
out. Features like "ignore errors" or "logging errors" would have a
higher chance of making it. Even if we get only these parts of the whole
"error table" feature into PG17, it will make it much easier to
implement the "error table" feature later.
On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
<sawada(dot)mshk(at)gmail(dot)com> wrote:
So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.

Attached is a patch for this "first step", made with reference to the v7 patch,
which logged errors and is simpler than the latest one.
- This patch adds a new option SAVE_ERROR_TO, but currently only supports
'none', which means it just skips error data. It is expected to support
'log' and 'table' later.
- This patch skips only soft errors and doesn't handle other errors such
as missing column data.
Seems promising. I'll look at the patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
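For illustration, a minimal sketch of the behaviour described in that patch, assuming a build with it applied; the table, the data, and the skipped-row count are invented, and the message wording follows the patch's WARNING text:

CREATE TABLE measurements (id int, reading numeric);

COPY measurements FROM STDIN WITH (SAVE_ERROR_TO none);
1	10.5
two	20.1
3	thirty
\.
WARNING:  2 rows were skipped due to data type incompatibility

SELECT * FROM measurements;
-- only the well-formed row (1, 10.5) remains; the two malformed rows were skipped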
On Tue, Jan 9, 2024 at 10:36 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

Jian, what do you think about this comment?

Looking back at the discussion so far, it seems that not everyone thinks
saving table information is the best idea[1] and some people think just
skipping error data is useful.[2]

Since there are issues to be considered in the design, such as
physical/logical replication treatment, putting error information into a
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the advice below and develop the
functionality incrementally?

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
<sawada(dot)mshk(at)gmail(dot)com> wrote:
So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.

Attached is a patch for this "first step", made with reference to the v7 patch,
which logged errors and is simpler than the latest one.
- This patch adds a new option SAVE_ERROR_TO, but currently only supports
'none', which means it just skips error data. It is expected to support
'log' and 'table' later.
- This patch skips only soft errors and doesn't handle other errors such
as missing column data.
Hi.
I made the following changes based on your patch
(v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch):
* When SAVE_ERROR_TO is specified, move the initialization of
ErrorSaveContext to the function BeginCopyFrom.
I think that's the right place to initialize the CopyFromState field.
* With your patch, when N rows contain malformed data, it
will initialize N ErrorSaveContexts.
In struct CopyFromStateData, I changed the field to ErrorSaveContext *escontext.
So if an error occurred, you can just reset the escontext accordingly.
* doc: mention "If this option is omitted, <command>COPY</command>
stops operation at the first error."
* Since we only support 'none' for now, and 'none' means we don't want
ErrorSaveContext metadata,
we should set cstate->escontext->details_wanted to false.

BTW I have a question and a comment about the v15 patch:

+ {
+ /*
+  * InputFunctionCall is faster than InputFunctionCallSafe.
+  */

Have you measured this?
When I tested it in an older patch, there was no big difference[3].

Thanks for pointing it out, I probably was overthinking.
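As a rough way to double-check such a claim, one could time both code paths from psql. This is only a sketch, with an invented table, file path, and row count, and it assumes a build with the SAVE_ERROR_TO patch applied:

CREATE TABLE bench (a int, b text);
\copy (SELECT g, md5(g::text) FROM generate_series(1, 1000000) g) TO '/tmp/bench.tsv'
\timing on
-- without the option: the InputFunctionCall path
\copy bench FROM '/tmp/bench.tsv'
TRUNCATE bench;
-- with the option: the InputFunctionCallSafe path
\copy bench FROM '/tmp/bench.tsv' WITH (save_error_to none)
\timing off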
- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT

There was a comment that we shouldn't add a new keyword for this[4].

Thanks for pointing it out.
Attachments:
v1-0001-minor-refactor.no-cfbot (application/octet-stream)
From 24583b558aa6be0d8a3cabffc9a9ce33c7af856b Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 11 Jan 2024 10:57:18 +0800
Subject: [PATCH v1 1/1] minor refactor.
slightly changed the doc.
other misc changes.
---
doc/src/sgml/ref/copy.sgml | 6 ++--
src/backend/commands/copyfrom.c | 45 ++++++++++++++----------
src/backend/commands/copyfromparse.c | 3 +-
src/include/commands/copyfrom_internal.h | 2 +-
4 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 87f2b3e7..a280e825 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -378,14 +378,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<term><literal>SAVE_ERROR_TO</literal></term>
<listitem>
<para>
- Specifies where to save error information when there are malformed data in
+ Specifies save error information to <replaceable
+ class="parameter">location</replaceable> when there are malformed data in
the input. If this option is specified, <command>COPY</command> skips
malformed data and continues copying data.
Currently only <literal>none</literal> is supported.
+ If this option is omitted, <command>COPY</command> stops operation at the first error.
This option is allowed only in <command>COPY FROM</command>, and only when
not using <literal>binary</literal> format.
- Note that this is only supported in current <command>COPY</command>
- syntax.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index d909123c..698e3ef6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,6 +657,8 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
+ if (cstate->opts.save_error_to)
+ Assert(cstate->escontext);
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -753,14 +755,6 @@ CopyFrom(CopyFromState cstate)
ti_options |= TABLE_INSERT_FROZEN;
}
- /* Set up soft error handler for SAVE_ERROR_TO */
- if (cstate->opts.save_error_to)
- {
- ErrorSaveContext escontext = {T_ErrorSaveContext};
- escontext.details_wanted = true;
- cstate->escontext = escontext;
- }
-
/*
* We need a ResultRelInfo so we can use the regular executor's
* index-entry-making machinery. (There used to be a huge amount of code
@@ -1005,18 +999,16 @@ CopyFrom(CopyFromState cstate)
* Soft error occured, skip this tuple and save error information
* according to SAVE_ERROR_TO.
*/
- if (cstate->escontext.error_occurred)
+ if (cstate->opts.save_error_to && cstate->escontext->error_occurred)
{
- ErrorSaveContext new_escontext = {T_ErrorSaveContext};
-
- /* Currently only "none" is supported */
+ /*
+ * Currently we only "none" is supported.
+ * make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we never set details_wanted, we don't need to also reset error_data.
+ *
+ */
Assert(strcmp(cstate->opts.save_error_to, "none") == 0);
-
- ExecClearTuple(myslot);
-
- new_escontext.details_wanted = true;
- cstate->escontext = new_escontext;
-
+ cstate->escontext->error_occurred = false;
continue;
}
@@ -1477,6 +1469,23 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR_TO
+ * currenly we can only save_error to 'none',
+ * We can add other options later, but we need set the escontext properly.
+ */
+ if (cstate->opts.save_error_to && strcmp(cstate->opts.save_error_to, "none") == 0)
+ {
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->details_wanted = false;
+ cstate->escontext->error_occurred = false;
+ }
+ else
+ {
+ cstate->escontext = NULL;
+ }
+ cstate->num_errors = 0;
+
/* Convert convert_selectively name list to per-column flags */
if (cstate->opts.convert_selectively)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0dd49d85..6b5d9e39 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -962,10 +962,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
string,
typioparams[m],
att->atttypmod,
- (Node *) &cstate->escontext,
+ (Node *) cstate->escontext,
&values[m]))
{
cstate->num_errors++;
+ Assert(!cstate->escontext->details_wanted);
return true;
}
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e2a8f9dd..ddcc04f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -95,7 +95,7 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
- ErrorSaveContext escontext; /* soft error trapper during in_functions execution */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
uint64 num_errors; /* total number of rows which contained soft errors */
int *defmap; /* array of default att numbers related to
* missing att */
--
2.34.1
On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
Yeah, I'm still thinking it's better to implement this feature
incrementally. Given we're getting close to feature freeze, I think it's
unlikely we'll get the whole feature into PG17, since there are still many
design discussions we need in addition to what Torikoshi-san pointed
out. Features like "ignore errors" or "logging errors" would have a
higher chance of making it. Even if we get only these parts of the whole
"error table" feature into PG17, it will make it much easier to
implement the "error table" feature later.
+1.
I'm also going to make a patch for "logging errors", since this
functionality is isolated from the v7 patch.
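Hypothetically, if a 'log' destination is added as that next step, usage might look something like the line below; this syntax is not part of any posted patch, only a guess at the direction mentioned above:

COPY check_ign_err FROM STDIN WITH (save_error_to log);
-- malformed rows would be skipped as with 'none', and each soft error
-- would additionally be reported in the server log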
Seems promising. I'll look at the patch.
Thanks a lot!
Sorry for attaching v2 if you have already reviewed v1.
On 2024-01-11 12:13, jian he wrote:
On Tue, Jan 9, 2024 at 10:36 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
If we want only such a feature we need to implement it together (the
patch could be split, though). But if some parts of the feature are
useful for users as well, I'd recommend implementing it incrementally.
That way, the patches can get small and it would be easy for reviewers
and committers to review/commit them.

Jian, what do you think about this comment?

Looking back at the discussion so far, it seems that not everyone thinks
saving table information is the best idea[1] and some people think just
skipping error data is useful.[2]

Since there are issues to be considered in the design, such as
physical/logical replication treatment, putting error information into a
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the advice below and develop the
functionality incrementally?

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
<sawada(dot)mshk(at)gmail(dot)com> wrote:
So I'm thinking we may be able to implement this
feature incrementally. The first step would be something like an
option to ignore all errors or an option to specify the maximum number
of errors to tolerate before raising an ERROR. The second step would
be to support logging destinations such as server logs and tables.

Attached is a patch for this "first step", made with reference to the v7 patch,
which logged errors and is simpler than the latest one.
- This patch adds a new option SAVE_ERROR_TO, but currently only supports
'none', which means it just skips error data. It is expected to support
'log' and 'table' later.
- This patch skips only soft errors and doesn't handle other errors such
as missing column data.

Hi.
I made the following changes based on your patch
(v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch):
* When SAVE_ERROR_TO is specified, move the initialization of
ErrorSaveContext to the function BeginCopyFrom.
I think that's the right place to initialize the CopyFromState field.
* With your patch, when N rows contain malformed data, it
will initialize N ErrorSaveContexts.
In struct CopyFromStateData, I changed the field to ErrorSaveContext *escontext.
So if an error occurred, you can just reset the escontext accordingly.
* doc: mention "If this option is omitted, <command>COPY</command>
stops operation at the first error."
* Since we only support 'none' for now, and 'none' means we don't want
ErrorSaveContext metadata,
we should set cstate->escontext->details_wanted to false.

BTW I have a question and a comment about the v15 patch:

+ {
+ /*
+  * InputFunctionCall is faster than InputFunctionCallSafe.
+  */

Have you measured this?
When I tested it in an older patch, there was no big difference[3].

Thanks for pointing it out, I probably was overthinking.

- SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
+ SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT

There was a comment that we shouldn't add a new keyword for this[4].
Thanks for reviewing!
Updated the patch, merging your suggestions except for the points below:
+ cstate->num_errors = 0;
Since cstate is already zeroed by the lines below, this may be
redundant.
| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
+ Assert(!cstate->escontext->details_wanted);
I'm not sure this is necessary, considering we're going to add other
options like 'table' and 'log', which need details_wanted soon.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v2-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch (text/x-diff)
From a3f14a0e7e9a7b5fb961ad6b6b7b163cf6534a26 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Fri, 12 Jan 2024 11:32:00 +0900
Subject: [PATCH v2] Add new COPY option SAVE_ERROR_TO
Currently when source data contains unexpected data regarding data type or
range, entire COPY fails. However, in some cases such data can be ignored and
just copying normal data is preferable.
This patch adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
Currently SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
Later works are expected to add more choices, such as 'log' and 'table'.
Author: Damir Belyalov, Atsushi Torikoshi, referenced with jian he's patch.
---
doc/src/sgml/ref/copy.sgml | 21 +++++++++++-
src/backend/commands/copy.c | 19 +++++++++++
src/backend/commands/copyfrom.c | 43 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 17 +++++++---
src/bin/psql/tab-complete.c | 7 +++-
src/include/commands/copy.h | 1 +
src/include/commands/copyfrom_internal.h | 3 ++
src/test/regress/expected/copy2.out | 28 +++++++++++++++
src/test/regress/sql/copy2.sql | 27 +++++++++++++++
9 files changed, 159 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c33..71941c4ee5 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -373,6 +374,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR_TO</literal></term>
+ <listitem>
+ <para>
+ Specifies save error information to <replaceable class="parameter">
+ location</replaceable> when there are malformed data in the input.
+ If this option is specified, <command>COPY</command> skips malformed data
+ and continues copying data.
+ Currently only <literal>none</literal> is supported.
+ If this option is omitted, <command>COPY</command> stops operation at the
+ first error.
+ This option is allowed only in <command>COPY FROM</command>, and only when
+ not using <literal>binary</literal> format.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
@@ -556,7 +574,8 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error. This
+ <command>COPY</command> stops operation at the first error when
+ <literal>SAVE_ERROR_TO</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d7..5e5e8a5f34 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -571,6 +571,20 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "save_error_to") == 0)
+ {
+ char *location = defGetString(defel);
+
+ if (opts_out->save_error_to)
+ errorConflictingDefElem(defel, pstate);
+ else if (strcmp(location, "none") == 0)
+ opts_out->save_error_to = location;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY save_error_to \"%s\" not recognized", location),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +612,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error_to)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c..48484a2597 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
@@ -656,6 +657,9 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
+ if (cstate->opts.save_error_to)
+ Assert(cstate->escontext);
+
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +996,26 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ if (cstate->opts.save_error_to && cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occured, skip this tuple and save error information
+ * according to SAVE_ERROR_TO.
+ */
+ if (strcmp(cstate->opts.save_error_to, "none") == 0)
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to be
+ * filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+ else
+ elog(ERROR, "unexpected SAVE_ERROR_TO location : %s",
+ cstate->opts.save_error_to);
+
+ continue;
+ }
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1281,6 +1305,11 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
}
+ if (cstate->opts.save_error_to && cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
@@ -1419,6 +1448,20 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR_TO */
+ if (cstate->opts.save_error_to)
+ {
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->error_occurred = false;
+
+ /* Currently we only support "none". We'll add other options later */
+ if (strcmp(cstate->opts.save_error_to, "none") == 0)
+ cstate->escontext->details_wanted = false;
+ }
+ else
+ cstate->escontext = NULL;
+
/* Convert FORCE_NULL name list to per-column flags, check validity */
cstate->opts.force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
if (cstate->opts.force_null_all)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f645..7041815dee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -955,11 +956,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ else if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e4..efe2b7cc10 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR_TO");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+ COMPLETE_WITH("none");
+
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2f..f890b66f26 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -62,6 +62,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
+ char *save_error_to; /* where to save error information */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a907..3744fac017 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,8 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions execution */
+ uint64 num_errors; /* total number of rows which contained soft errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07..4a1777a4fa 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -82,6 +82,8 @@ COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x to stdin (format BINARY, save_error_to none);
+ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +96,10 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR: COPY save_error_to "unsupported" not recognized
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+ ^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -710,6 +716,26 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -724,6 +750,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f6086..17c5764b42 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -70,12 +70,14 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x to stdin (format BINARY, save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +496,29 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -508,6 +533,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
base-commit: 08c3ad27eb5348d0cbffa843a3edb11534f9904a
--
2.39.2
On Fri, Jan 12, 2024 at 10:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Thanks for reviewing!
Updated the patch, merging your suggestions except for the points below:
+ cstate->num_errors = 0;
Since cstate is already zeroed by the lines below, this may be
redundant.
| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
+ Assert(!cstate->escontext->details_wanted);
I'm not sure this is necessary, considering we're going to add other
options like 'table' and 'log', which need details_wanted soon.
--
Regards,

Make it so the save_error_to option cannot be used with COPY TO.
Add a redundant test: save_error_to with COPY TO.
Attachments:
v2-0001-minor-refactor.no-cfbot (application/octet-stream)
From 957be936169ce66a8517974bc28fd02004a3c353 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sat, 13 Jan 2024 18:32:44 +0800
Subject: [PATCH v2 1/1] minor refactor
make save_error_to option cannot be used with COPY TO.
add redundant test, save_error_to with COPY TO test.
---
src/backend/commands/copy.c | 6 ++++++
src/test/regress/expected/copy2.out | 10 +++++++++-
src/test/regress/sql/copy2.sql | 4 +++-
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5e5e8a5f..f903ce32 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -584,6 +584,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY save_error_to \"%s\" not recognized", location),
parser_errposition(pstate, defel->location)));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ parser_errposition(pstate, defel->location)));
}
else
ereport(ERROR,
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 4a1777a4..97fea200 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -77,13 +77,21 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii...
^
+COPY x from stdin (save_error_to none,save_error_to none);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (save_error_to none,save_error_to none);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
-COPY x to stdin (format BINARY, save_error_to none);
+COPY x from stdin (format BINARY, save_error_to none);
ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
+COPY x to stdin (save_error_to none);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (save_error_to none);
+ ^
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 17c5764b..fda46f86 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -66,11 +66,13 @@ COPY x from stdin (force_not_null (a), force_not_null (b));
COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
+COPY x from stdin (save_error_to none,save_error_to none);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
-COPY x to stdin (format BINARY, save_error_to none);
+COPY x from stdin (format BINARY, save_error_to none);
+COPY x to stdin (save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
--
2.34.1
Hi!
I think this is an in-demand and long-awaited feature. The thread is
pretty long, but mostly it has been disputes about how to save the errors.
The present patch includes the basic infrastructure and the ability to ignore
errors, so it's pretty simple.
On Sat, Jan 13, 2024 at 4:20 PM jian he <jian.universality@gmail.com> wrote:
On Fri, Jan 12, 2024 at 10:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Thanks for reviewing!
Updated the patch, merging your suggestions except for the points below:
+ cstate->num_errors = 0;
Since cstate is already zeroed by the lines below, this may be
redundant.
| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
+ Assert(!cstate->escontext->details_wanted);
I'm not sure this is necessary, considering we're going to add other
options like 'table' and 'log', which need details_wanted soon.
--
Regards,

Make it so the save_error_to option cannot be used with COPY TO.
Add a redundant test: save_error_to with COPY TO.
I've incorporated these changes. Also, I've changed
CopyFormatOptions.save_error_to to an enum and made some edits in
comments and the commit message. I'm going to push this if there are
no objections.
------
Regards,
Alexander Korotkov
Attachments:
0001-Add-new-COPY-option-SAVE_ERROR_TO-v3.patch (application/octet-stream)
From c6033d4330e86bd33f90d36eb75b2c0427a54d82 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 14 Jan 2024 02:09:32 +0200
Subject: [PATCH] Add new COPY option SAVE_ERROR_TO
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.
This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
Later works are expected to add more choices, such as 'log' and 'table'.
Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C
---
doc/src/sgml/ref/copy.sgml | 21 ++++++++++-
src/backend/commands/copy.c | 25 +++++++++++++
src/backend/commands/copyfrom.c | 46 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 17 ++++++---
src/bin/psql/tab-complete.c | 7 +++-
src/include/commands/copy.h | 11 ++++++
src/include/commands/copyfrom_internal.h | 5 +++
src/test/regress/expected/copy2.out | 36 +++++++++++++++++++
src/test/regress/sql/copy2.sql | 29 +++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 191 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e2ffbbdf84e..e15d5a621b8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -373,6 +374,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR_TO</literal></term>
+ <listitem>
+ <para>
+ Specifies to save error information to <replaceable class="parameter">
+ location</replaceable> when there is malformed data in the input.
+ If this option is specified, <command>COPY</command> skips malformed data
+ and continues copying data.
+ Currently, only the <literal>none</literal> value is supported.
+ If this option is omitted, <command>COPY</command> stops operation at the
+ first error.
+ This option is allowed only in <command>COPY FROM</command>, and only when
+ not using <literal>binary</literal> format.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
@@ -556,7 +574,8 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error. This
+ <command>COPY</command> stops operation at the first error when
+ <literal>SAVE_ERROR_TO</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d77..8fc54e028a3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -571,6 +571,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "save_error_to") == 0)
+ {
+ char *location = defGetString(defel);
+
+ if (opts_out->save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ errorConflictingDefElem(defel, pstate);
+ else if (strcmp(location, "none") == 0)
+ opts_out->save_error_to = COPY_SAVE_ERROR_TO_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY save_error_to \"%s\" not recognized", location),
+ parser_errposition(pstate, defel->location)));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +618,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c7..be6a151528e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
@@ -656,6 +657,9 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ Assert(cstate->escontext);
+
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +996,25 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occured, skip this tuple and save error information
+ * according to SAVE_ERROR_TO.
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ continue;
+ }
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1281,6 +1304,12 @@ CopyFrom(CopyFromState cstate)
CopyMultiInsertInfoFlush(&multiInsertInfo, NULL, &processed);
}
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
@@ -1419,6 +1448,23 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR_TO */
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ {
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->error_occurred = false;
+
+ /*
+ * Currently we only support COPY_SAVE_ERROR_TO_NONE. We'll add other
+ * options later
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+ cstate->escontext->details_wanted = false;
+ }
+ else
+ cstate->escontext = NULL;
+
/* Convert FORCE_NULL name list to per-column flags, check validity */
cstate->opts.force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
if (cstate->opts.force_null_all)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f6450..7207eb26983 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -955,11 +956,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ else if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e42..efe2b7cc101 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR_TO");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+ COMPLETE_WITH("none");
+
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2fc..7d1a6286a6f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -30,6 +30,16 @@ typedef enum CopyHeaderChoice
COPY_HEADER_MATCH,
} CopyHeaderChoice;
+/*
+ * Represents where to save input processing errors. More values to be added
+ * in the future.
+ */
+typedef enum CopySaveErrorToChoice
+{
+ COPY_SAVE_ERROR_TO_UNSPECIFIED = 0, /* immediately throw errors */
+ COPY_SAVE_ERROR_TO_NONE, /* ignore errors */
+} CopySaveErrorToChoice;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -62,6 +72,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
+ CopySaveErrorToChoice save_error_to; /* where to save error information */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a9071..cad52fcc783 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,10 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions
+ * execution */
+ uint64 num_errors; /* total number of rows which contained soft
+ * errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07c..97fea200310 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -77,11 +77,21 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii...
^
+COPY x from stdin (save_error_to none,save_error_to none);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (save_error_to none,save_error_to none);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format BINARY, save_error_to none);
+ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
+COPY x to stdin (save_error_to none);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (save_error_to none);
+ ^
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +104,10 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR: COPY save_error_to "unsupported" not recognized
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+ ^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -710,6 +724,26 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+WARNING: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -724,6 +758,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60867..fda46f86c9e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -66,16 +66,20 @@ COPY x from stdin (force_not_null (a), force_not_null (b));
COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
+COPY x from stdin (save_error_to none,save_error_to none);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x from stdin (format BINARY, save_error_to none);
+COPY x to stdin (save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +498,29 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -508,6 +535,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f582eb59e7d..29fd1cae641 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4041,3 +4041,4 @@ manifest_writer
rfile
ws_options
ws_file_info
+CopySaveErrorToChoice
--
2.39.3 (Apple Git-145)
On Sun, Jan 14, 2024 at 10:30 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
Hi!
I think this is an in-demand and long-awaited feature. The thread is
pretty long, but mostly it has been disputes about how to save the errors.
The present patch includes the basic infrastructure and the ability to ignore
errors, so it's pretty simple.

On Sat, Jan 13, 2024 at 4:20 PM jian he <jian.universality@gmail.com> wrote:
On Fri, Jan 12, 2024 at 10:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Thanks for reviewing!
Updated the patch, merging your suggestions except for the points below:
+ cstate->num_errors = 0;
Since cstate is already zeroed by the lines below, this may be
redundant.
| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));
+ Assert(!cstate->escontext->details_wanted);
I'm not sure this is necessary, considering we're going to add other
options like 'table' and 'log', which need details_wanted soon.
--
Regards,

Make it so the save_error_to option cannot be used with COPY TO.
Add a redundant test: save_error_to with COPY TO.

I've incorporated these changes. Also, I've changed
CopyFormatOptions.save_error_to to an enum and made some edits in
comments and the commit message. I'm going to push this if there are
no objections.
Thank you for updating the patch. Here are two comments:
---
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
If a malformed input is not the last data, the context message seems odd:
postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a
1
2024-01-15 05:05:53.980 JST [1769258] WARNING: 1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT: COPY test, line 3: ""
COPY 1
I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.
---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
We might want to cover the extra data cases too.
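For concreteness, the extra-data case being suggested would look roughly like
this (a sketch only, reusing check_ign_err from the test quoted above; fields
are tab-separated):

COPY check_ign_err FROM STDIN WITH (save_error_to none);
1	{1}	3	abc
\.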
Regards,
[1]: /messages/by-id/CACJufxEkkqnozdnvNMGxVAA94KZaCPkYw_Cx4JKG9ueNaZma_A@mail.gmail.com
[2]: /messages/by-id/3d0b349ddbd4ae5f605f77b491697158@oss.nttdata.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch. Here are two comments:
---
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
If a malformed input is not the last data, the context message seems odd:
postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a
1
2024-01-15 05:05:53.980 JST [1769258] WARNING: 1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT: COPY test, line 3: ""
COPY 1
I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.
Thank you for noticing this. I think NOTICE is more appropriate here.
There is nothing to "worry" about: the user asked to ignore the errors
and we did. And yes, it doesn't make sense to use the last line as
the context. Fixed.
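For illustration, the intended interaction after this change looks roughly
like the following (a sketch; message wording as in the attached v4 patch):

postgres=# copy test from stdin (save_error_to none);
a
1
\.
NOTICE:  1 rows were skipped due to data type incompatibility
COPY 1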
---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
We might want to cover the extra data cases too.
Agreed, the relevant test is added.
------
Regards,
Alexander Korotkov
Attachments:
0001-Add-new-COPY-option-SAVE_ERROR_TO-v4.patch (application/octet-stream)
From 26ac277594a0fd6853c9b09afa10bf56e9f2818b Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 14 Jan 2024 02:09:32 +0200
Subject: [PATCH] Add new COPY option SAVE_ERROR_TO
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.
This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
Later works are expected to add more choices, such as 'log' and 'table'.
Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C
---
doc/src/sgml/ref/copy.sgml | 21 ++++++++++-
src/backend/commands/copy.c | 25 +++++++++++++
src/backend/commands/copyfrom.c | 46 ++++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 17 ++++++---
src/bin/psql/tab-complete.c | 7 +++-
src/include/commands/copy.h | 11 ++++++
src/include/commands/copyfrom_internal.h | 5 +++
src/test/regress/expected/copy2.out | 40 +++++++++++++++++++++
src/test/regress/sql/copy2.sql | 34 ++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 200 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e2ffbbdf84e..e15d5a621b8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -373,6 +374,23 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR_TO</literal></term>
+ <listitem>
+ <para>
+ Specifies to save error information to <replaceable class="parameter">
+ location</replaceable> when there is malformed data in the input.
+ If this option is specified, <command>COPY</command> skips malformed data
+ and continues copying data.
+ Currently, only the <literal>none</literal> value is supported.
+ If this option is omitted, <command>COPY</command> stops operation at the
+ first error.
+ This option is allowed only in <command>COPY FROM</command>, and only when
+ not using <literal>binary</literal> format.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
@@ -556,7 +574,8 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error. This
+ <command>COPY</command> stops operation at the first error when
+ <literal>SAVE_ERROR_TO</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d77..8fc54e028a3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -571,6 +571,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "save_error_to") == 0)
+ {
+ char *location = defGetString(defel);
+
+ if (opts_out->save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ errorConflictingDefElem(defel, pstate);
+ else if (strcmp(location, "none") == 0)
+ opts_out->save_error_to = COPY_SAVE_ERROR_TO_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY save_error_to \"%s\" not recognized", location),
+ parser_errposition(pstate, defel->location)));
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +618,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c7..d86c24e3140 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
@@ -656,6 +657,9 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ Assert(cstate->escontext);
+
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +996,25 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occured, skip this tuple and save error information
+ * according to SAVE_ERROR_TO.
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ continue;
+ }
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1284,6 +1307,12 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1419,6 +1448,23 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR_TO */
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED)
+ {
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->error_occurred = false;
+
+ /*
+ * Currently we only support COPY_SAVE_ERROR_TO_NONE. We'll add other
+ * options later
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+ cstate->escontext->details_wanted = false;
+ }
+ else
+ cstate->escontext = NULL;
+
/* Convert FORCE_NULL name list to per-column flags, check validity */
cstate->opts.force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
if (cstate->opts.force_null_all)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f6450..7207eb26983 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -955,11 +956,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ else if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e42..efe2b7cc101 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR_TO");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+ COMPLETE_WITH("none");
+
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2fc..7d1a6286a6f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -30,6 +30,16 @@ typedef enum CopyHeaderChoice
COPY_HEADER_MATCH,
} CopyHeaderChoice;
+/*
+ * Represents where to save input processing errors. More values to be added
+ * in the future.
+ */
+typedef enum CopySaveErrorToChoice
+{
+ COPY_SAVE_ERROR_TO_UNSPECIFIED = 0, /* immediately throw errors */
+ COPY_SAVE_ERROR_TO_NONE, /* ignore errors */
+} CopySaveErrorToChoice;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -62,6 +72,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
+ CopySaveErrorToChoice save_error_to; /* where to save error information */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a9071..cad52fcc783 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,10 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions
+ * execution */
+ uint64 num_errors; /* total number of rows which contained soft
+ * errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07c..100fbf1dd1a 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -77,11 +77,21 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii...
^
+COPY x from stdin (save_error_to none,save_error_to none);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (save_error_to none,save_error_to none);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format BINARY, save_error_to none);
+ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
+COPY x to stdin (save_error_to none);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (save_error_to none);
+ ^
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +104,10 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR: COPY save_error_to "unsupported" not recognized
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+ ^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -710,6 +724,30 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+NOTICE: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
+-- test extra data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -724,6 +762,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60867..f3c24647d4a 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -66,16 +66,20 @@ COPY x from stdin (force_not_null (a), force_not_null (b));
COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
+COPY x from stdin (save_error_to none,save_error_to none);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x from stdin (format BINARY, save_error_to none);
+COPY x to stdin (save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +498,34 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
+
+-- test extra data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 3 abc
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -508,6 +540,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f582eb59e7d..29fd1cae641 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4041,3 +4041,4 @@ manifest_writer
rfile
ws_options
ws_file_info
+CopySaveErrorToChoice
--
2.39.3 (Apple Git-145)
On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch. Here are two comments:
---
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
If a malformed input is not the last data, the context message seems odd:
postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a
1
2024-01-15 05:05:53.980 JST [1769258] WARNING: 1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT: COPY test, line 3: ""
COPY 1
I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.
Thank you for noticing this. I think NOTICE is more appropriate here.
There is nothing to "worry" about: the user asked to ignore the errors
and we did. And yes, it doesn't make sense to use the last line as
the context. Fixed.
---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
We might want to cover the extra data cases too.
Agreed, the relevant test is added.
Thank you for updating the patch. I have one minor point:
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg("%zd rows were skipped due to
data type incompatibility",
+ cstate->num_errors));
+
We can use errmsg_plural() instead.
I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.
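Concretely, the suggestion amounts to allowing both spellings below (a sketch;
"error" is just one possible name for the explicit default, and tbl is a
placeholder table):

COPY tbl FROM STDIN WITH (save_error_to error);  -- explicit default: stop at the first error
COPY tbl FROM STDIN WITH (save_error_to none);   -- skip rows with soft errors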
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Mon, Jan 15, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch. Here are two comments:
---
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
If a malformed input is not the last data, the context message seems odd:
postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a
1
2024-01-15 05:05:53.980 JST [1769258] WARNING: 1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT: COPY test, line 3: ""
COPY 1
I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.
Thank you for noticing this. I think NOTICE is more appropriate here.
There is nothing to "worry" about: the user asked to ignore the errors
and we did. And yes, it doesn't make sense to use the last line as
the context. Fixed.
---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
We might want to cover the extra data cases too.
Agreed, the relevant test is added.
Thank you for updating the patch. I have one minor point:
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
We can use errmsg_plural() instead.
Makes sense. Fixed.
I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.
Valid point. I've implemented the handling of CopySaveErrorToChoice
in a similar way to CopyHeaderChoice.
Please, check the revised patch attached.
------
Regards,
Alexander Korotkov
Attachments:
0001-Add-new-COPY-option-SAVE_ERROR_TO-v5.patch (application/octet-stream)
From 0e01ab7b1a59ca0a54ce03c482890216f43793d1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 14 Jan 2024 02:09:32 +0200
Subject: [PATCH] Add new COPY option SAVE_ERROR_TO
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.
This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
Later works are expected to add more choices, such as 'log' and 'table'.
Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C
---
doc/src/sgml/ref/copy.sgml | 23 ++++++++++-
src/backend/commands/copy.c | 49 ++++++++++++++++++++++++
src/backend/commands/copyfrom.c | 48 +++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 17 +++++---
src/bin/psql/tab-complete.c | 7 +++-
src/include/commands/copy.h | 11 ++++++
src/include/commands/copyfrom_internal.h | 5 +++
src/test/regress/expected/copy2.out | 43 +++++++++++++++++++++
src/test/regress/sql/copy2.sql | 42 ++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 239 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e2ffbbdf84e..85881ca0ad6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -373,6 +374,25 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR_TO</literal></term>
+ <listitem>
+ <para>
+ Specifies to save error information to <replaceable class="parameter">
+ location</replaceable> when there is malformed data in the input.
+ Currently, only <literal>error</literal> (default) and <literal>none</literal>
+ values are supported.
+ If the <literal>error</literal> value is specified,
+ <command>COPY</command> stops operation at the first error.
+ If the <literal>none</literal> value is specified,
+ <command>COPY</command> skips malformed data and continues copying data.
+ The option is allowed only in <command>COPY FROM</command>.
+ The <literal>none</literal> value is allowed only when
+ not using <literal>binary</literal> format.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
@@ -556,7 +576,8 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error. This
+ <command>COPY</command> stops operation at the first error when
+ <literal>SAVE_ERROR_TO</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d77..38c00379629 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -394,6 +394,42 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
return COPY_HEADER_FALSE; /* keep compiler quiet */
}
+/*
+ * Extract a defGetCopySaveErrorToChoice value from a DefElem.
+ */
+static CopySaveErrorToChoice
+defGetCopySaveErrorToChoice(DefElem *def, ParseState *pstate, bool is_from)
+{
+ char *sval;
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ parser_errposition(pstate, def->location)));
+
+ /*
+ * If no parameter value given, assume the default value.
+ */
+ if (def->arg == NULL)
+ return COPY_SAVE_ERROR_TO_ERROR;
+
+ /*
+ * Allow "error", or "none" values.
+ */
+ sval = defGetString(def);
+ if (pg_strcasecmp(sval, "error") == 0)
+ return COPY_SAVE_ERROR_TO_ERROR;
+ if (pg_strcasecmp(sval, "none") == 0)
+ return COPY_SAVE_ERROR_TO_NONE;
+
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY save_error_to \"%s\" not recognized", sval),
+ parser_errposition(pstate, def->location)));
+ return COPY_SAVE_ERROR_TO_ERROR; /* keep compiler quiet */
+}
+
/*
* Process the statement option list for COPY.
*
@@ -419,6 +455,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_to_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -571,6 +608,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "save_error_to") == 0)
+ {
+ if (save_error_to_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_to_specified = true;
+ opts_out->save_error_to = defGetCopySaveErrorToChoice(defel, pstate, is_from);
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +642,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c7..46b23e345b8 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
@@ -656,6 +657,9 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ Assert(cstate->escontext);
+
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +996,25 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occured, skip this tuple and save error information
+ * according to SAVE_ERROR_TO.
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ continue;
+ }
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1284,6 +1307,14 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg_plural("%zd row were skipped due to data type incompatibility",
+ "%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1419,6 +1450,23 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR_TO */
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ {
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->error_occurred = false;
+
+ /*
+ * Currently we only support COPY_SAVE_ERROR_TO_NONE. We'll add other
+ * options later
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+ cstate->escontext->details_wanted = false;
+ }
+ else
+ cstate->escontext = NULL;
+
/* Convert FORCE_NULL name list to per-column flags, check validity */
cstate->opts.force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
if (cstate->opts.force_null_all)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f6450..7207eb26983 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -955,11 +956,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ else if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e42..6bfdb5f0082 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR_TO");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+ COMPLETE_WITH("error", "none");
+
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2fc..8972c6180d7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -30,6 +30,16 @@ typedef enum CopyHeaderChoice
COPY_HEADER_MATCH,
} CopyHeaderChoice;
+/*
+ * Represents where to save input processing errors. More values to be added
+ * in the future.
+ */
+typedef enum CopySaveErrorToChoice
+{
+ COPY_SAVE_ERROR_TO_ERROR = 0, /* immediately throw errors */
+ COPY_SAVE_ERROR_TO_NONE, /* ignore errors */
+} CopySaveErrorToChoice;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -62,6 +72,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
+ CopySaveErrorToChoice save_error_to; /* where to save error information */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a9071..cad52fcc783 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,10 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions
+ * execution */
+ uint64 num_errors; /* total number of rows which contained soft
+ * errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07c..42cbcb2e92f 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -77,11 +77,21 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii...
^
+COPY x from stdin (save_error_to none,save_error_to none);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (save_error_to none,save_error_to none);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format BINARY, save_error_to none);
+ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
+COPY x to stdin (save_error_to none);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (save_error_to none);
+ ^
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +104,10 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+ ^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -710,6 +724,33 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to error);
+ERROR: invalid input syntax for type integer: "a"
+CONTEXT: COPY check_ign_err, line 2, column n: "a"
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+NOTICE: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
+-- test extra data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -724,6 +765,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60867..c48d556350d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -66,16 +66,20 @@ COPY x from stdin (force_not_null (a), force_not_null (b));
COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
+COPY x from stdin (save_error_to none,save_error_to none);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x from stdin (format BINARY, save_error_to none);
+COPY x to stdin (save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +498,42 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to error);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
+
+-- test extra data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 3 abc
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -508,6 +548,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f582eb59e7d..29fd1cae641 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4041,3 +4041,4 @@ manifest_writer
rfile
ws_options
ws_file_info
+CopySaveErrorToChoice
--
2.39.3 (Apple Git-145)
On 2024-01-16 00:17, Alexander Korotkov wrote:
On Mon, Jan 15, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch. Here are two comments:
---
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(WARNING,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
/* Done, clean up */
error_context_stack = errcallback.previous;
If a malformed input is not the last data, the context message seems odd:
postgres(1:1769258)=# create table test (a int);
CREATE TABLE
postgres(1:1769258)=# copy test from stdin (save_error_to none);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
a
1
2024-01-15 05:05:53.980 JST [1769258] WARNING: 1 rows were skipped
due to data type incompatibility
2024-01-15 05:05:53.980 JST [1769258] CONTEXT: COPY test, line 3: ""
COPY 1
I think it's better to report the WARNING after resetting the
error_context_stack. Or is a WARNING really appropriate here? The
v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
WARNING without explanation.
Thank you for noticing this. I think NOTICE is more appropriate here.
There is nothing to "worry" about: the user asked to ignore the errors
and we did. And yes, it doesn't make sense to use the last line as
the context. Fixed.
---
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
We might want to cover the extra data cases too.
Agreed, the relevant test is added.
Thank you for updating the patch. I have one minor point:
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg("%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors));
+
We can use errmsg_plural() instead.
Makes sense. Fixed.
I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.
Valid point. I've implemented the handling of CopySaveErrorToChoice
in a similar way to CopyHeaderChoice.
Please, check the revised patch attached.
Thanks for updating the patch!
Here is a minor comment:
+/*
+ * Extract a defGetCopySaveErrorToChoice value from a DefElem.
+ */
Should be Extract a "CopySaveErrorToChoice"?
BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.
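For example, progress of a long COPY ... (save_error_to none) could then be
watched roughly like this (purely illustrative; tuples_skipped is a
hypothetical column name, not part of any posted patch):

SELECT relid::regclass, command, tuples_processed, tuples_excluded,
       tuples_skipped  -- hypothetical counter of rows skipped due to soft errors
FROM pg_stat_progress_copy;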
------
Regards,
Alexander Korotkov
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Hi,
Thanks for updating the patch!
You're welcome!
Here is a minor comment:
+/*
+ * Extract a defGetCopySaveErrorToChoice value from a DefElem.
+ */
Should be Extract a "CopySaveErrorToChoice"?
Fixed.
BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.
Please do!
------
Regards,
Alexander Korotkov
Attachments:
0001-Add-new-COPY-option-SAVE_ERROR_TO-v6.patch (application/octet-stream)
From d7d31a4ac6b5e4002995a9e2445bd425aa9e0fbd Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 14 Jan 2024 02:09:32 +0200
Subject: [PATCH] Add new COPY option SAVE_ERROR_TO
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.
This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
Later works are expected to add more choices, such as 'log' and 'table'.
Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C
---
doc/src/sgml/ref/copy.sgml | 23 ++++++++++-
src/backend/commands/copy.c | 49 ++++++++++++++++++++++++
src/backend/commands/copyfrom.c | 48 +++++++++++++++++++++++
src/backend/commands/copyfromparse.c | 17 +++++---
src/bin/psql/tab-complete.c | 7 +++-
src/include/commands/copy.h | 11 ++++++
src/include/commands/copyfrom_internal.h | 5 +++
src/test/regress/expected/copy2.out | 43 +++++++++++++++++++++
src/test/regress/sql/copy2.sql | 42 ++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 239 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index e2ffbbdf84e..85881ca0ad6 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -373,6 +374,25 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>SAVE_ERROR_TO</literal></term>
+ <listitem>
+ <para>
+ Specifies to save error information to <replaceable class="parameter">
+ location</replaceable> when there is malformed data in the input.
+ Currently, only <literal>error</literal> (default) and <literal>none</literal>
+ values are supported.
+ If the <literal>error</literal> value is specified,
+ <command>COPY</command> stops operation at the first error.
+ If the <literal>none</literal> value is specified,
+ <command>COPY</command> skips malformed data and continues copying data.
+ The option is allowed only in <command>COPY FROM</command>.
+ The <literal>none</literal> value is allowed only when
+ not using <literal>binary</literal> format.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ENCODING</literal></term>
<listitem>
@@ -556,7 +576,8 @@ COPY <replaceable class="parameter">count</replaceable>
</para>
<para>
- <command>COPY</command> stops operation at the first error. This
+ <command>COPY</command> stops operation at the first error when
+ <literal>SAVE_ERROR_TO</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fe4cf957d77..c36d7f1daaf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -394,6 +394,42 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
return COPY_HEADER_FALSE; /* keep compiler quiet */
}
+/*
+ * Extract a CopySaveErrorToChoice value from a DefElem.
+ */
+static CopySaveErrorToChoice
+defGetCopySaveErrorToChoice(DefElem *def, ParseState *pstate, bool is_from)
+{
+ char *sval;
+
+ if (!is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ parser_errposition(pstate, def->location)));
+
+ /*
+ * If no parameter value given, assume the default value.
+ */
+ if (def->arg == NULL)
+ return COPY_SAVE_ERROR_TO_ERROR;
+
+ /*
+ * Allow "error", or "none" values.
+ */
+ sval = defGetString(def);
+ if (pg_strcasecmp(sval, "error") == 0)
+ return COPY_SAVE_ERROR_TO_ERROR;
+ if (pg_strcasecmp(sval, "none") == 0)
+ return COPY_SAVE_ERROR_TO_NONE;
+
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY save_error_to \"%s\" not recognized", sval),
+ parser_errposition(pstate, def->location)));
+ return COPY_SAVE_ERROR_TO_ERROR; /* keep compiler quiet */
+}
+
/*
* Process the statement option list for COPY.
*
@@ -419,6 +455,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool save_error_to_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -571,6 +608,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "save_error_to") == 0)
+ {
+ if (save_error_to_specified)
+ errorConflictingDefElem(defel, pstate);
+ save_error_to_specified = true;
+ opts_out->save_error_to = defGetCopySaveErrorToChoice(defel, pstate, is_from);
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -598,6 +642,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->binary && opts_out->save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c7..46b23e345b8 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -42,6 +42,7 @@
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
@@ -656,6 +657,9 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ Assert(cstate->escontext);
+
/*
* The target must be a plain, foreign, or partitioned relation, or have
* an INSTEAD OF INSERT row trigger. (Currently, such triggers are only
@@ -992,6 +996,25 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR &&
+ cstate->escontext->error_occurred)
+ {
+ /*
+ * Soft error occured, skip this tuple and save error information
+ * according to SAVE_ERROR_TO.
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+
+ /*
+ * Just make ErrorSaveContext ready for the next NextCopyFrom.
+ * Since we don't set details_wanted and error_data is not to
+ * be filled, just resetting error_occurred is enough.
+ */
+ cstate->escontext->error_occurred = false;
+
+ continue;
+ }
+
ExecStoreVirtualTuple(myslot);
/*
@@ -1284,6 +1307,14 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR &&
+ cstate->num_errors > 0)
+ ereport(NOTICE,
+ errmsg_plural("%zd row were skipped due to data type incompatibility",
+ "%zd rows were skipped due to data type incompatibility",
+ cstate->num_errors,
+ cstate->num_errors));
+
if (bistate != NULL)
FreeBulkInsertState(bistate);
@@ -1419,6 +1450,23 @@ BeginCopyFrom(ParseState *pstate,
}
}
+ /* Set up soft error handler for SAVE_ERROR_TO */
+ if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ {
+ cstate->escontext = makeNode(ErrorSaveContext);
+ cstate->escontext->type = T_ErrorSaveContext;
+ cstate->escontext->error_occurred = false;
+
+ /*
+ * Currently we only support COPY_SAVE_ERROR_TO_NONE. We'll add other
+ * options later
+ */
+ if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+ cstate->escontext->details_wanted = false;
+ }
+ else
+ cstate->escontext = NULL;
+
/* Convert FORCE_NULL name list to per-column flags, check validity */
cstate->opts.force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
if (cstate->opts.force_null_all)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index af4c36f6450..7207eb26983 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -70,6 +70,7 @@
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/miscnodes.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
#include "utils/builtins.h"
@@ -955,11 +956,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- else
- values[m] = InputFunctionCall(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod);
+ /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ else if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 09914165e42..6bfdb5f0082 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2898,12 +2898,17 @@ psql_completion(const char *text, int start, int end)
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT");
+ "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "SAVE_ERROR_TO");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
+ COMPLETE_WITH("error", "none");
+
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
COMPLETE_WITH("WHERE");
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e6c1867a2fc..8972c6180d7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -30,6 +30,16 @@ typedef enum CopyHeaderChoice
COPY_HEADER_MATCH,
} CopyHeaderChoice;
+/*
+ * Represents where to save input processing errors. More values to be added
+ * in the future.
+ */
+typedef enum CopySaveErrorToChoice
+{
+ COPY_SAVE_ERROR_TO_ERROR = 0, /* immediately throw errors */
+ COPY_SAVE_ERROR_TO_NONE, /* ignore errors */
+} CopySaveErrorToChoice;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -62,6 +72,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
+ CopySaveErrorToChoice save_error_to; /* where to save error information */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 715939a9071..cad52fcc783 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -16,6 +16,7 @@
#include "commands/copy.h"
#include "commands/trigger.h"
+#include "nodes/miscnodes.h"
/*
* Represents the different source cases we need to worry about at
@@ -94,6 +95,10 @@ typedef struct CopyFromStateData
* default value */
FmgrInfo *in_functions; /* array of input functions for each attrs */
Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions
+ * execution */
+ uint64 num_errors; /* total number of rows which contained soft
+ * errors */
int *defmap; /* array of default att numbers related to
* missing att */
ExprState **defexprs; /* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c4178b9c07c..42cbcb2e92f 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -77,11 +77,21 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii...
^
+COPY x from stdin (save_error_to none,save_error_to none);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (save_error_to none,save_error_to none);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
+COPY x from stdin (format BINARY, save_error_to none);
+ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
+COPY x to stdin (save_error_to none);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (save_error_to none);
+ ^
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -94,6 +104,10 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
+COPY x to stdin (format BINARY, save_error_to unsupported);
+ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
+LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+ ^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
ERROR: column "d" specified more than once
@@ -710,6 +724,33 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to error);
+ERROR: invalid input syntax for type integer: "a"
+CONTEXT: COPY check_ign_err, line 2, column n: "a"
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+NOTICE: 4 rows were skipped due to data type incompatibility
+SELECT * FROM check_ign_err;
+ n | m | k
+---+-----+---
+ 1 | {1} | 1
+ 5 | {5} | 5
+(2 rows)
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+ERROR: invalid input syntax for type widget: "1"
+CONTEXT: COPY hard_err, line 1, column foo: "1"
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1: "1 {1}"
+-- test extra data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -724,6 +765,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index a5486f60867..c48d556350d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -66,16 +66,20 @@ COPY x from stdin (force_not_null (a), force_not_null (b));
COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
+COPY x from stdin (save_error_to none,save_error_to none);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
+COPY x from stdin (format BINARY, save_error_to none);
+COPY x to stdin (save_error_to none);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
+COPY x to stdin (format BINARY, save_error_to unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -494,6 +498,42 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- tests for SAVE_ERROR_TO option
+CREATE TABLE check_ign_err (n int, m int[], k int);
+COPY check_ign_err FROM STDIN WITH (save_error_to error);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 1
+a {2} 2
+3 {3} 3333333333
+4 {a, 4} 4
+
+5 {5} 5
+\.
+SELECT * FROM check_ign_err;
+
+-- test datatype error that can't be handled as soft: should fail
+CREATE TABLE hard_err(foo widget);
+COPY hard_err FROM STDIN WITH (save_error_to none);
+1
+\.
+
+-- test missing data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1}
+\.
+
+-- test extra data: should fail
+COPY check_ign_err FROM STDIN WITH (save_error_to none);
+1 {1} 3 abc
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
@@ -508,6 +548,8 @@ DROP TABLE instead_of_insert_tbl;
DROP VIEW instead_of_insert_tbl_view;
DROP VIEW instead_of_insert_tbl_view_2;
DROP FUNCTION fun_instead_of_insert_tbl();
+DROP TABLE check_ign_err;
+DROP TABLE hard_err;
--
-- COPY FROM ... DEFAULT
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f582eb59e7d..29fd1cae641 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4041,3 +4041,4 @@ manifest_writer
rfile
ws_options
ws_file_info
+CopySaveErrorToChoice
--
2.39.3 (Apple Git-145)
Hi,
Thanks for applying!
+ errmsg_plural("%zd row were skipped due
to data type incompatibility",
Sorry, I just noticed it, but 'were' should be 'was' here?
BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.
Please do!
I've started it here:
/messages/by-id/d12fd8c99adcae2744212cb23feff6ed@oss.nttdata.com
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
At Wed, 17 Jan 2024 14:38:54 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in
Hi,
Thanks for applying!
+ errmsg_plural("%zd row were skipped due to data type
incompatibility",
Sorry, I just noticed it, but 'were' should be 'was' here?
BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.
Please do!
I've started it here:
/messages/by-id/d12fd8c99adcae2744212cb23feff6ed@oss.nttdata.com
Switching topics, this commit (9e2d870119) adds the following help message:
"COPY { %s [ ( %s [, ...] ) ] | ( %s ) }\n"
" TO { '%s' | PROGRAM '%s' | STDOUT }\n"
...
" SAVE_ERROR_TO '%s'\n"
...
_("location"),
On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name was initially conceived with the assumption that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?
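To make the contrast concrete, here is a minimal sketch (the table and file
names are made up, and the second spelling is only my proposal, nothing that
is implemented):

COPY measurements FROM '/tmp/data.csv' WITH (format csv, save_error_to none);     -- committed spelling; the value reads like a location
COPY measurements FROM '/tmp/data.csv' WITH (format csv, on_error_action ignore); -- proposed spelling; the value reads like an action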
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jan 17, 2024 at 7:38 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Hi,
Thanks for applying!
+ errmsg_plural("%zd row were skipped due
to data type incompatibility",
Sorry, I just noticed it, but 'were' should be 'was' here?
Sure, the fix is pushed.
------
Regards,
Alexander Korotkov
On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Wed, 17 Jan 2024 14:38:54 +0900, torikoshia <torikoshia@oss.nttdata.com> wrote in
Hi,
Thanks for applying!
+ errmsg_plural("%zd row were skipped due to data type
incompatibility",
Sorry, I just noticed it, but 'were' should be 'was' here?
BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.
Please do!
I've started it here:
/messages/by-id/d12fd8c99adcae2744212cb23feff6ed@oss.nttdata.com
Switching topics, this commit (9e2d870119) adds the following help message:
"COPY { %s [ ( %s [, ...] ) ] | ( %s ) }\n"
" TO { '%s' | PROGRAM '%s' | STDOUT }\n"
...
" SAVE_ERROR_TO '%s'\n"
...
_("location"),
On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name was initially conceived with the assumption that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?
Probably, but I'm not sure about that. The name SAVE_ERROR_TO assumes
the next word will be location, not action. With some stretch we can
assume 'error' to be location. I think it would be even more stretchy
to think that SAVE_ERROR_TO is followed by action. Probably, we can
replace SAVE_ERROR_TO with another name which could be naturally
followed by an action, but I don't have anything appropriate in mind.
However, I'm not a native English speaker and certainly could miss
something.
------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name was initially conceived with the assumption that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?
Probably, but I'm not sure about that. The name SAVE_ERROR_TO assumes
the next word will be location, not action. With some stretch we can
assume 'error' to be location. I think it would be even more stretchy
to think that SAVE_ERROR_TO is followed by action.
The other problem with this terminology is that with 'none', what it
is doing is the exact opposite of "saving" the errors. I agree we
need a better name.
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
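Purely for illustration, a two-parameter design (none of these option names
exist; they are hypothetical) would keep the action separate from the
destination and avoid that ambiguity:

COPY measurements FROM '/tmp/data.csv'
    WITH (on_error log, error_log_table 'copy_errors');         -- hypothetical: log bad rows to a table
COPY measurements FROM '/tmp/data.csv'
    WITH (on_error log, error_log_file '/tmp/copy_errors.log'); -- hypothetical: log bad rows to a file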
regards, tom lane
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name was initially conceived with the assumption that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?
Probably, but I'm not sure about that. The name SAVE_ERROR_TO assumes
the next word will be location, not action. With some stretch we can
assume 'error' to be location. I think it would be even more stretchy
to think that SAVE_ERROR_TO is followed by action.
The other problem with this terminology is that with 'none', what it
is doing is the exact opposite of "saving" the errors. I agree we
need a better name.
Agreed.
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to an error table (I'm not sure about
the details, but I imagined an error table being created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name was initially conceived with the assumption that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?
Probably, but I'm not sure about that. The name SAVE_ERROR_TO assumes
the next word will be location, not action. With some stretch we can
assume 'error' to be location. I think it would be even more stretchy
to think that SAVE_ERROR_TO is followed by action.
The other problem with this terminology is that with 'none', what it
is doing is the exact opposite of "saving" the errors. I agree we
need a better name.
Agreed.
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
If not specified, the default is ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
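A quick sketch of that behaviour (illustrative only, nothing here is
implemented yet, and table t is made up):

COPY t FROM STDIN;                        -- option not specified: defaults to ERROR
COPY t FROM STDIN WITH (on_error ERROR);  -- explicit default, stop at the first bad row
COPY t FROM STDIN WITH (on_error IGNORE); -- skip rows with malformed data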
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
indicate "immediately error out" and 'just ignore the failure'
respectively, but these options hardly seem to denote a 'location',
and appear more like an 'action'. I somewhat suspect that this
parameter name was initially conceived with the assumption that it would
take file names or similar parameters. I'm not sure if others will
agree, but I think the parameter name might not be the best
choice. For instance, considering the addition of the third value
'log', something like on_error_action (error, ignore, log) would be
more intuitively understandable. What do you think?
Probably, but I'm not sure about that. The name SAVE_ERROR_TO assumes
the next word will be location, not action. With some stretch we can
assume 'error' to be location. I think it would be even more stretchy
to think that SAVE_ERROR_TO is followed by action.
The other problem with this terminology is that with 'none', what it
is doing is the exact opposite of "saving" the errors. I agree we
need a better name.
Agreed.
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seem a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.
OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".
------
Regards,
Alexander Korotkov
On Thu, Jan 18, 2024 at 8:59 AM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com>
wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com
wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any
other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.
OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".
+1
it is consistent with psql
Regards
Pavel
------
Regards,
Alexander Korotkov
On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.
OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".
+1
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2024-01-18 16:59, Alexander Korotkov wrote:
On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com>
wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.
I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.
OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".
Thanks, also +1 from me.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Hi.
Here is the patch, refactored based on "on_error {stop|ignore}".
doc changes:
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { <replaceable
class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable
class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable
class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable
class="parameter">column_name</replaceable> [, ...] ) | * }
- SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
+ ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -375,20 +375,20 @@ COPY { <replaceable
class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
- <term><literal>SAVE_ERROR_TO</literal></term>
+ <term><literal>ON_ERROR</literal></term>
<listitem>
<para>
- Specifies to save error information to <replaceable class="parameter">
- location</replaceable> when there is malformed data in the input.
- Currently, only <literal>error</literal> (default) and
<literal>none</literal>
+ Specifies which <replaceable class="parameter">
+ error_action</replaceable> to perform when there is malformed
data in the input.
+ Currently, only <literal>stop</literal> (default) and
<literal>ignore</literal>
values are supported.
- If the <literal>error</literal> value is specified,
+ If the <literal>stop</literal> value is specified,
<command>COPY</command> stops operation at the first error.
- If the <literal>none</literal> value is specified,
+ If the <literal>ignore</literal> value is specified,
<command>COPY</command> skips malformed data and continues copying data.
The option is allowed only in <command>COPY FROM</command>.
- The <literal>none</literal> value is allowed only when
- not using <literal>binary</literal> format.
+ Only <literal>stop</literal> value is allowed only when
+ using <literal>binary</literal> format.
</para>
Attachments:
copy_on_error.diff (text/x-patch)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 85881ca0..c30baec1 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
- SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
+ ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
@@ -375,20 +375,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
- <term><literal>SAVE_ERROR_TO</literal></term>
+ <term><literal>ON_ERROR</literal></term>
<listitem>
<para>
- Specifies to save error information to <replaceable class="parameter">
- location</replaceable> when there is malformed data in the input.
- Currently, only <literal>error</literal> (default) and <literal>none</literal>
+ Specifies which <replaceable class="parameter">
+ error_action</replaceable> to perform when there is malformed data in the input.
+ Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
values are supported.
- If the <literal>error</literal> value is specified,
+ If the <literal>stop</literal> value is specified,
<command>COPY</command> stops operation at the first error.
- If the <literal>none</literal> value is specified,
+ If the <literal>ignore</literal> value is specified,
<command>COPY</command> skips malformed data and continues copying data.
The option is allowed only in <command>COPY FROM</command>.
- The <literal>none</literal> value is allowed only when
- not using <literal>binary</literal> format.
+ Only <literal>stop</literal> value is allowed only when
+ using <literal>binary</literal> format.
</para>
</listitem>
</varlistentry>
@@ -577,7 +577,7 @@ COPY <replaceable class="parameter">count</replaceable>
<para>
<command>COPY</command> stops operation at the first error when
- <literal>SAVE_ERROR_TO</literal> is not specified. This
+ <literal>ON_ERROR</literal> is not specified. This
should not lead to problems in the event of a <command>COPY
TO</command>, but the target table will already have received
earlier rows in a <command>COPY FROM</command>. These rows will not
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c36d7f1d..cc0786c6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -395,39 +395,39 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
}
/*
- * Extract a CopySaveErrorToChoice value from a DefElem.
+ * Extract a CopyOnErrorChoice value from a DefElem.
*/
-static CopySaveErrorToChoice
-defGetCopySaveErrorToChoice(DefElem *def, ParseState *pstate, bool is_from)
+static CopyOnErrorChoice
+defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
{
char *sval;
if (!is_from)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY SAVE_ERROR_TO cannot be used with COPY TO"),
+ errmsg("COPY ON_ERROR cannot be used with COPY TO"),
parser_errposition(pstate, def->location)));
/*
* If no parameter value given, assume the default value.
*/
if (def->arg == NULL)
- return COPY_SAVE_ERROR_TO_ERROR;
+ return COPY_ON_ERROR_STOP;
/*
- * Allow "error", or "none" values.
+ * Allow "stop", or "ignore" values.
*/
sval = defGetString(def);
- if (pg_strcasecmp(sval, "error") == 0)
- return COPY_SAVE_ERROR_TO_ERROR;
- if (pg_strcasecmp(sval, "none") == 0)
- return COPY_SAVE_ERROR_TO_NONE;
+ if (pg_strcasecmp(sval, "stop") == 0)
+ return COPY_ON_ERROR_STOP;
+ if (pg_strcasecmp(sval, "ignore") == 0)
+ return COPY_ON_ERROR_IGNORE;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY save_error_to \"%s\" not recognized", sval),
+ errmsg("COPY ON_ERROR \"%s\" not recognized", sval),
parser_errposition(pstate, def->location)));
- return COPY_SAVE_ERROR_TO_ERROR; /* keep compiler quiet */
+ return COPY_ON_ERROR_STOP; /* keep compiler quiet */
}
/*
@@ -455,7 +455,7 @@ ProcessCopyOptions(ParseState *pstate,
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
- bool save_error_to_specified = false;
+ bool on_error_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -608,12 +608,12 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
- else if (strcmp(defel->defname, "save_error_to") == 0)
+ else if (strcmp(defel->defname, "on_error") == 0)
{
- if (save_error_to_specified)
+ if (on_error_specified)
errorConflictingDefElem(defel, pstate);
- save_error_to_specified = true;
- opts_out->save_error_to = defGetCopySaveErrorToChoice(defel, pstate, is_from);
+ on_error_specified = true;
+ opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
}
else
ereport(ERROR,
@@ -642,10 +642,10 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
- if (opts_out->binary && opts_out->save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify SAVE_ERROR_TO in BINARY mode")));
+ errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
/* Set defaults for omitted options */
if (!opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 50e245d5..c956cfa4 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,7 @@ CopyFrom(CopyFromState cstate)
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
- if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
Assert(cstate->escontext);
/*
@@ -996,14 +996,14 @@ CopyFrom(CopyFromState cstate)
if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
break;
- if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR &&
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
cstate->escontext->error_occurred)
{
/*
- * Soft error occured, skip this tuple and save error information
- * according to SAVE_ERROR_TO.
+ * Soft error occured, skip this tuple and deal with error information
+ * according to ON_ERROR.
*/
- if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
/*
* Just make ErrorSaveContext ready for the next NextCopyFrom.
@@ -1307,7 +1307,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR &&
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
cstate->num_errors > 0)
ereport(NOTICE,
errmsg_plural("%llu row was skipped due to data type incompatibility",
@@ -1450,18 +1450,18 @@ BeginCopyFrom(ParseState *pstate,
}
}
- /* Set up soft error handler for SAVE_ERROR_TO */
- if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_ERROR)
+ /* Set up soft error handler for ON_ERROR */
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
{
cstate->escontext = makeNode(ErrorSaveContext);
cstate->escontext->type = T_ErrorSaveContext;
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_SAVE_ERROR_TO_NONE. We'll add other
+ * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
* options later
*/
- if (cstate->opts.save_error_to == COPY_SAVE_ERROR_TO_NONE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7207eb26..36214aab 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -956,7 +956,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ /* If ON_ERROR is specified with IGNORE, skip rows with soft errors */
else if (!InputFunctionCallSafe(&in_functions[m],
string,
typioparams[m],
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6bfdb5f0..ada711d0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2899,15 +2899,15 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "SAVE_ERROR_TO");
+ "ON_ERROR");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
- /* Complete COPY <sth> FROM filename WITH (SAVE_ERROR_TO */
- else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "SAVE_ERROR_TO"))
- COMPLETE_WITH("error", "none");
+ /* Complete COPY <sth> FROM filename WITH (ON_ERROR */
+ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
+ COMPLETE_WITH("stop", "ignore");
/* Complete COPY <sth> FROM <sth> WITH (<options>) */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 8972c618..78af1b0e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -34,11 +34,11 @@ typedef enum CopyHeaderChoice
* Represents where to save input processing errors. More values to be added
* in the future.
*/
-typedef enum CopySaveErrorToChoice
+typedef enum CopyOnErrorChoice
{
- COPY_SAVE_ERROR_TO_ERROR = 0, /* immediately throw errors */
- COPY_SAVE_ERROR_TO_NONE, /* ignore errors */
-} CopySaveErrorToChoice;
+ COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
+ COPY_ON_ERROR_IGNORE, /* ignore errors */
+} CopyOnErrorChoice;
/*
* A struct to hold COPY options, in a parsed form. All of these are related
@@ -72,7 +72,7 @@ typedef struct CopyFormatOptions
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
bool convert_selectively; /* do selective binary conversion? */
- CopySaveErrorToChoice save_error_to; /* where to save error information */
+ CopyOnErrorChoice on_error; /* what to do when error happened */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 42cbcb2e..d982ae4f 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -77,21 +77,21 @@ COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii...
^
-COPY x from stdin (save_error_to none,save_error_to none);
+COPY x from stdin (ON_ERROR ignore, ON_ERROR ignore);
ERROR: conflicting or redundant options
-LINE 1: COPY x from stdin (save_error_to none,save_error_to none);
- ^
+LINE 1: COPY x from stdin (ON_ERROR ignore, ON_ERROR ignore);
+ ^
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
ERROR: cannot specify DELIMITER in BINARY mode
COPY x to stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
-COPY x from stdin (format BINARY, save_error_to none);
-ERROR: cannot specify SAVE_ERROR_TO in BINARY mode
-COPY x to stdin (save_error_to none);
-ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
-LINE 1: COPY x to stdin (save_error_to none);
- ^
+COPY x from stdin (format BINARY, ON_ERROR ignore);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (ON_ERROR unsupported);
+ERROR: COPY ON_ERROR "unsupported" not recognized
+LINE 1: COPY x from stdin (ON_ERROR unsupported);
+ ^
COPY x to stdin (format TEXT, force_quote(a));
ERROR: COPY FORCE_QUOTE requires CSV mode
COPY x from stdin (format CSV, force_quote(a));
@@ -104,9 +104,9 @@ COPY x to stdout (format TEXT, force_null(a));
ERROR: COPY FORCE_NULL requires CSV mode
COPY x to stdin (format CSV, force_null(a));
ERROR: COPY FORCE_NULL cannot be used with COPY TO
-COPY x to stdin (format BINARY, save_error_to unsupported);
-ERROR: COPY SAVE_ERROR_TO cannot be used with COPY TO
-LINE 1: COPY x to stdin (format BINARY, save_error_to unsupported);
+COPY x to stdin (format BINARY, ON_ERROR unsupported);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (format BINARY, ON_ERROR unsupported);
^
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -724,12 +724,12 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
--- tests for SAVE_ERROR_TO option
+-- tests for ON_ERROR option
CREATE TABLE check_ign_err (n int, m int[], k int);
-COPY check_ign_err FROM STDIN WITH (save_error_to error);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR stop);
ERROR: invalid input syntax for type integer: "a"
CONTEXT: COPY check_ign_err, line 2, column n: "a"
-COPY check_ign_err FROM STDIN WITH (save_error_to none);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR ignore);
NOTICE: 4 rows were skipped due to data type incompatibility
SELECT * FROM check_ign_err;
n | m | k
@@ -740,15 +740,15 @@ SELECT * FROM check_ign_err;
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
-COPY hard_err FROM STDIN WITH (save_error_to none);
+COPY hard_err FROM STDIN WITH (ON_ERROR ignore);
ERROR: invalid input syntax for type widget: "1"
CONTEXT: COPY hard_err, line 1, column foo: "1"
-- test missing data: should fail
-COPY check_ign_err FROM STDIN WITH (save_error_to none);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR ignore);
ERROR: missing data for column "k"
CONTEXT: COPY check_ign_err, line 1: "1 {1}"
-- test extra data: should fail
-COPY check_ign_err FROM STDIN WITH (save_error_to none);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR ignore);
ERROR: extra data after last expected column
CONTEXT: COPY check_ign_err, line 1: "1 {1} 3 abc"
-- clean up
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index c48d5563..73b2e688 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -66,20 +66,20 @@ COPY x from stdin (force_not_null (a), force_not_null (b));
COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
-COPY x from stdin (save_error_to none,save_error_to none);
+COPY x from stdin (ON_ERROR ignore, ON_ERROR ignore);
-- incorrect options
COPY x to stdin (format BINARY, delimiter ',');
COPY x to stdin (format BINARY, null 'x');
-COPY x from stdin (format BINARY, save_error_to none);
-COPY x to stdin (save_error_to none);
+COPY x from stdin (format BINARY, ON_ERROR ignore);
+COPY x from stdin (ON_ERROR unsupported);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
COPY x to stdout (format TEXT, force_not_null(a));
COPY x to stdin (format CSV, force_not_null(a));
COPY x to stdout (format TEXT, force_null(a));
COPY x to stdin (format CSV, force_null(a));
-COPY x to stdin (format BINARY, save_error_to unsupported);
+COPY x to stdin (format BINARY, ON_ERROR unsupported);
-- too many columns in column list: should fail
COPY x (a, b, c, d, e, d, c) from stdin;
@@ -498,9 +498,9 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
--- tests for SAVE_ERROR_TO option
+-- tests for ON_ERROR option
CREATE TABLE check_ign_err (n int, m int[], k int);
-COPY check_ign_err FROM STDIN WITH (save_error_to error);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR stop);
1 {1} 1
a {2} 2
3 {3} 3333333333
@@ -508,7 +508,7 @@ a {2} 2
5 {5} 5
\.
-COPY check_ign_err FROM STDIN WITH (save_error_to none);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR ignore);
1 {1} 1
a {2} 2
3 {3} 3333333333
@@ -520,17 +520,17 @@ SELECT * FROM check_ign_err;
-- test datatype error that can't be handled as soft: should fail
CREATE TABLE hard_err(foo widget);
-COPY hard_err FROM STDIN WITH (save_error_to none);
+COPY hard_err FROM STDIN WITH (ON_ERROR ignore);
1
\.
-- test missing data: should fail
-COPY check_ign_err FROM STDIN WITH (save_error_to none);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR ignore);
1 {1}
\.
-- test extra data: should fail
-COPY check_ign_err FROM STDIN WITH (save_error_to none);
+COPY check_ign_err FROM STDIN WITH (ON_ERROR ignore);
1 {1} 3 abc
\.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae..456461f8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -478,6 +478,7 @@ CopyHeaderChoice
CopyInsertMethod
CopyMultiInsertBuffer
CopyMultiInsertInfo
+CopyOnErrorChoice
CopySource
CopyStmt
CopyToState
@@ -4041,4 +4042,3 @@ manifest_writer
rfile
ws_options
ws_file_info
-CopySaveErrorToChoice
On 2024-01-18 23:59, jian he wrote:
Hi.
patch refactored based on "on_error {stop|ignore}"
doc changes:
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
 FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
 FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
 FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
- SAVE_ERROR_TO '<replaceable class="parameter">location</replaceable>'
+ ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
 ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
 </synopsis>
 </refsynopsisdiv>
@@ -375,20 +375,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
 </varlistentry>
 <varlistentry>
- <term><literal>SAVE_ERROR_TO</literal></term>
+ <term><literal>ON_ERROR</literal></term>
 <listitem>
 <para>
- Specifies to save error information to <replaceable class="parameter">
- location</replaceable> when there is malformed data in the input.
- Currently, only <literal>error</literal> (default) and <literal>none</literal>
+ Specifies which <replaceable class="parameter">
+ error_action</replaceable> to perform when there is malformed data in the input.
+ Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
 values are supported.
- If the <literal>error</literal> value is specified,
+ If the <literal>stop</literal> value is specified,
 <command>COPY</command> stops operation at the first error.
- If the <literal>none</literal> value is specified,
+ If the <literal>ignore</literal> value is specified,
 <command>COPY</command> skips malformed data and continues copying data.
 The option is allowed only in <command>COPY FROM</command>.
- The <literal>none</literal> value is allowed only when
- not using <literal>binary</literal> format.
+ Only <literal>stop</literal> value is allowed only when
+ using <literal>binary</literal> format.
 </para>
Thanks for making the patch!
Here are some comments:
- The <literal>none</literal> value is allowed only when
- not using <literal>binary</literal> format.
+ Only <literal>stop</literal> value is allowed only when
+ using <literal>binary</literal> format.
The second 'only' may be unnecessary.
- /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ /* If ON_ERROR is specified with IGNORE, skip rows with soft errors */
This is correct now, but considering future work which adds other
options like "file 'copy.log'" and
"table 'copy_log'", it may be better not to limit the case to 'IGNORE'.
How about something like this?
If ON_ERROR is specified and the value is not STOP, skip rows with
soft errors
-COPY x from stdin (format BINARY, save_error_to none);
-COPY x to stdin (save_error_to none);
+COPY x from stdin (format BINARY, ON_ERROR ignore);
+COPY x from stdin (ON_ERROR unsupported);
COPY x to stdin (format TEXT, force_quote(a));
COPY x from stdin (format CSV, force_quote(a));
In the existing test for copy2.sql, the COPY options are written in
lower case (e.g. 'format') and option values (e.g. 'BINARY') are written in
upper case.
It would be more consistent to align them.
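For example, the new tests could follow the existing convention like this
(illustrative only, just to show the casing, not new test cases):

COPY x from stdin (format BINARY, on_error IGNORE);
COPY x from stdin (on_error UNSUPPORTED);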
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Hi!
On Fri, Jan 19, 2024 at 2:37 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
Thanks for making the patch!
The patch is pushed! The proposed changes are incorporated excluding this.
- /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ /* If ON_ERROR is specified with IGNORE, skip rows with soft errors */
This is correct now, but considering future works which add other
options like "file 'copy.log'" and
"table 'copy_log'", it may be better not to limit the case to 'IGNORE'.
How about something like this?
If ON_ERROR is specified and the value is not STOP, skip rows with
soft errors
I think when we have more options, then we wouldn't just skip rows
with soft errors but rather save them. So, I left this comment as is
for now.
------
Regards,
Alexander Korotkov
On 2024-01-19 22:27, Alexander Korotkov wrote:
Hi!
On Fri, Jan 19, 2024 at 2:37 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:
Thanks for making the patch!
The patch is pushed! The proposed changes are incorporated excluding
this.
- /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+ /* If ON_ERROR is specified with IGNORE, skip rows with soft errors */
This is correct now, but considering future works which add other
options like "file 'copy.log'" and
"table 'copy_log'", it may be better not to limit the case to
'IGNORE'.
How about something like this?
If ON_ERROR is specified and the value is not STOP, skip rows with
soft errors
I think when we have more options, then we wouldn't just skip rows
with soft errors but rather save them. So, I left this comment as is
for now.
Agreed.
Thanks for the notification!
------
Regards,
Alexander Korotkov
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.
another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.
I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that the ON_ERROR syntax synopsis in the documentation is not
correct. The option doesn't require the value to be quoted, and the
value can be omitted. The attached patch fixes it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
0001-doc-Fix-COPY-ON_ERROR-option-syntax-synopsis.patch (application/octet-stream)
From 9a5acbff6cf1dbebc04ae221a292161d6b6cdeb0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 28 Mar 2024 10:11:48 +0900
Subject: [PATCH] doc: Fix COPY ON_ERROR option syntax synopsis.
Oversight in b725b7eec43.
Reviewed-by:
Discussion: https://postgr.es/m/
---
doc/src/sgml/ref/copy.sgml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 6c83e30ed0..557e344004 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
- ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
+ ON_ERROR [ <replaceable class="parameter">error_action</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
--
2.39.3
On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
The attached patch fixes the doc, but I'm wondering whether it might be
better to modify the code to prohibit omitting the value.
When I see a query that omits the ON_ERROR value, it's not obvious
what happens, compared to other options that tolerate omitting the
value, such as FREEZE or HEADER:
COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
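For context, with the current code an omitted value silently falls back to the default (the branch removed by the patch later in this thread returns COPY_ON_ERROR_STOP when no value is given), so the following two statements behave identically, which is what makes the omission non-obvious. This is only a sketch; t1 is a placeholder table:
COPY t1 FROM stdin WITH (ON_ERROR);        -- value omitted: currently treated as the default, stop
COPY t1 FROM stdin WITH (ON_ERROR stop);   -- explicit default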
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at the options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require a value. The HEADER option is not a pure boolean
parameter, but we can omit its value; that seems to be for backward
compatibility, since it used to be a boolean parameter. I agree that
the above example would confuse users.
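A quick sketch of that asymmetry, as I recall the behaviour (not verified against every branch; t is a placeholder table):
VACUUM (ANALYZE) t;              -- boolean option, value may be omitted (treated as true)
VACUUM (ANALYZE true) t;         -- equivalent
EXPLAIN (FORMAT) SELECT 1;       -- non-boolean option: rejected because a value is required
COPY t FROM stdin WITH (HEADER); -- accepted for backward compatibility, behaves like HEADER true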
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's
not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.Regards,
Thanks for your comment!
I've attached a patch which modifies the code to prohibit omission of
the value.
I was a little unsure about adding a regression test for this, but I
have not added one, since the other COPY options don't test the
omission of their values.
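To sketch what the attached patch changes in terms of user-visible behaviour (the exact error text comes from defGetString and is not spelled out here; t1 is a placeholder table):
COPY t1 FROM stdin WITH (ON_ERROR stop);    -- still accepted
COPY t1 FROM stdin WITH (ON_ERROR ignore);  -- still accepted
COPY t1 FROM stdin WITH (ON_ERROR);         -- now rejected, since the option requires a value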
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v1-0001-Disallow-ON_ERROR-option-without-value.patch (text/x-diff)
From 1b4bec3c2223246ec59ffb9eb7de2f1de27315f7 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Fri, 29 Mar 2024 11:36:12 +0900
Subject: [PATCH v1] Disallow ON_ERROR option without value
Currently ON_ERROR option of COPY allows to omit its value,
but the syntax synopsis in the documentation requires it.
Since it seems non-boolean parameters usually require its value
and it's not obvious what happens when value of ON_ERROR is
omitted, this patch disallows ON_ERROR without its value.
---
src/backend/commands/copy.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28cf8b040a..2719bf28b7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -392,7 +392,7 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
static CopyOnErrorChoice
defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
{
- char *sval;
+ char *sval = defGetString(def);
if (!is_from)
ereport(ERROR,
@@ -400,16 +400,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
errmsg("COPY ON_ERROR cannot be used with COPY TO"),
parser_errposition(pstate, def->location)));
- /*
- * If no parameter value given, assume the default value.
- */
- if (def->arg == NULL)
- return COPY_ON_ERROR_STOP;
-
/*
* Allow "stop", or "ignore" values.
*/
- sval = defGetString(def);
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
base-commit: 0075d78947e3800c5a807f48fd901f16db91101b
--
2.39.2
On Fri, Mar 29, 2024 at 11:54 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's
not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.Regards,
Thanks for your comment!
Attached a patch which modifies the code to prohibit omission of its
value.I was a little unsure about adding a regression test for this, but I
have not added it since other COPY option doesn't test the omission of
its value.
Perhaps we should change the doc as well, since the ON_ERROR value
doesn't necessarily need to be single-quoted?
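For reference, a small sketch of the quoting point (both spellings are accepted by the option parser; only the unquoted form needs to appear in the synopsis; t1 is a placeholder table):
COPY t1 FROM stdin WITH (ON_ERROR ignore);
COPY t1 FROM stdin WITH (ON_ERROR 'ignore');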
The rest looks good to me.
Alexander, what do you think about this change as you're the committer
of this feature?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2024-04-01 11:31, Masahiko Sawada wrote:
On Fri, Mar 29, 2024 at 11:54 AM torikoshia
<torikoshia@oss.nttdata.com> wrote:On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's
not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.Regards,
Thanks for your comment!
Attached a patch which modifies the code to prohibit omission of its
value.I was a little unsure about adding a regression test for this, but I
have not added it since other COPY option doesn't test the omission of
its value.Probably should we change the doc as well since ON_ERROR value doesn't
necessarily need to be single-quoted?
Agreed.
Since this issue seems independent of the omission of the ON_ERROR
option value, I've attached a separate patch.
The rest looks good to me.
Alexander, what do you think about this change as you're the committer
of this feature?
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachments:
v1-0001-Disallow-ON_ERROR-option-without-value.patch (text/x-diff)
From 1b4bec3c2223246ec59ffb9eb7de2f1de27315f7 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Fri, 29 Mar 2024 11:36:12 +0900
Subject: [PATCH v1] Disallow ON_ERROR option without value
Currently ON_ERROR option of COPY allows to omit its value,
but the syntax synopsis in the documentation requires it.
Since it seems non-boolean parameters usually require its value
and it's not obvious what happens when value of ON_ERROR is
omitted, this patch disallows ON_ERROR without its value.
---
src/backend/commands/copy.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28cf8b040a..2719bf28b7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -392,7 +392,7 @@ defGetCopyHeaderChoice(DefElem *def, bool is_from)
static CopyOnErrorChoice
defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
{
- char *sval;
+ char *sval = defGetString(def);
if (!is_from)
ereport(ERROR,
@@ -400,16 +400,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
errmsg("COPY ON_ERROR cannot be used with COPY TO"),
parser_errposition(pstate, def->location)));
- /*
- * If no parameter value given, assume the default value.
- */
- if (def->arg == NULL)
- return COPY_ON_ERROR_STOP;
-
/*
* Allow "stop", or "ignore" values.
*/
- sval = defGetString(def);
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
base-commit: 0075d78947e3800c5a807f48fd901f16db91101b
--
2.39.2
v1-0001-doc-Fix-COPY-ON_ERROR-option-syntax-synopsis.patch (text/x-diff)
From 840152c20d47220f106d5fe14af4a86cec99987e Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 2 Apr 2024 19:11:01 +0900
Subject: [PATCH v1] doc: Fix COPY ON_ERROR option syntax synopsis.
Since ON_ERROR value doesn't require quotations, this patch removes them.
Oversight in b725b7eec43.
---
doc/src/sgml/ref/copy.sgml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 33ce7c4ea6..1ce19668d8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
- ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
+ ON_ERROR <replaceable class="parameter">error_action</replaceable>
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
LOG_VERBOSITY <replaceable class="parameter">mode</replaceable>
</synopsis>
base-commit: 0075d78947e3800c5a807f48fd901f16db91101b
--
2.39.2
On Tue, Apr 2, 2024 at 7:34 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-04-01 11:31, Masahiko Sawada wrote:
On Fri, Mar 29, 2024 at 11:54 AM torikoshia
<torikoshia@oss.nttdata.com> wrote:On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's
not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.Regards,
Thanks for your comment!
Attached a patch which modifies the code to prohibit omission of its
value.I was a little unsure about adding a regression test for this, but I
have not added it since other COPY option doesn't test the omission of
its value.Probably should we change the doc as well since ON_ERROR value doesn't
necessarily need to be single-quoted?Agreed.
Since it seems this issue is independent from the omission of ON_ERROR
option value, attached a separate patch.
Thank you for the patches! These patches look good to me. I'll push
them, barring any objections.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2024-04-16 13:16, Masahiko Sawada wrote:
On Tue, Apr 2, 2024 at 7:34 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-04-01 11:31, Masahiko Sawada wrote:
On Fri, Mar 29, 2024 at 11:54 AM torikoshia
<torikoshia@oss.nttdata.com> wrote:On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's
not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.Regards,
Thanks for your comment!
Attached a patch which modifies the code to prohibit omission of its
value.I was a little unsure about adding a regression test for this, but I
have not added it since other COPY option doesn't test the omission of
its value.Probably should we change the doc as well since ON_ERROR value doesn't
necessarily need to be single-quoted?Agreed.
Since it seems this issue is independent from the omission of ON_ERROR
option value, attached a separate patch.Thank you for the patches! These patches look good to me. I'll push
them, barring any objections.Regards,
Thanks for your review and for applying them!
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
On Wed, Apr 17, 2024 at 4:28 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-04-16 13:16, Masahiko Sawada wrote:
On Tue, Apr 2, 2024 at 7:34 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-04-01 11:31, Masahiko Sawada wrote:
On Fri, Mar 29, 2024 at 11:54 AM torikoshia
<torikoshia@oss.nttdata.com> wrote:On 2024-03-28 21:54, Masahiko Sawada wrote:
On Thu, Mar 28, 2024 at 9:38 PM torikoshia <torikoshia@oss.nttdata.com>
wrote:On 2024-03-28 10:20, Masahiko Sawada wrote:
Hi,
On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Thu, Jan 18, 2024 at 4:16 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Jan 18, 2024 at 6:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyotaro-san's suggestion isn't bad, though I might shorten it to
error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
You will need a separate parameter anyway to specify the destination
of "log", unless "none" became an illegal table name when I wasn't
looking. I don't buy that one parameter that has some special values
while other values could be names will be a good design. Moreover,
what if we want to support (say) log-to-file along with log-to-table?
Trying to distinguish a file name from a table name without any other
context seems impossible.I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.I agree, the parameter "error_action" is better than "location".
I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seems a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.OK. What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".+1
I realized that ON_ERROR syntax synoposis in the documentation is not
correct. The option doesn't require the value to be quoted and the
value can be omitted. The attached patch fixes it.Regards,
Thanks!
Attached patch fixes the doc, but I'm wondering perhaps it might be
better to modify the codes to prohibit abbreviation of the value.When seeing the query which abbreviates ON_ERROR value, I feel it's
not
obvious what happens compared to other options which tolerates
abbreviation of the value such as FREEZE or HEADER.COPY t1 FROM stdin WITH (ON_ERROR);
What do you think?
Indeed. Looking at options of other commands such as VACUUM and
EXPLAIN, I can see that we can omit a boolean value, but non-boolean
parameters require its value. The HEADER option is not a pure boolean
parameter but we can omit the value. It seems to be for backward
compatibility; it used to be a boolean parameter. I agree that the
above example would confuse users.Regards,
Thanks for your comment!
Attached a patch which modifies the code to prohibit omission of its
value.I was a little unsure about adding a regression test for this, but I
have not added it since other COPY option doesn't test the omission of
its value.Probably should we change the doc as well since ON_ERROR value doesn't
necessarily need to be single-quoted?Agreed.
Since it seems this issue is independent from the omission of ON_ERROR
option value, attached a separate patch.Thank you for the patches! These patches look good to me. I'll push
them, barring any objections.Regards,
Thanks for your review and apply!
Thank you for the patches!
Pushed: a6d0fa5ef8 and f6f8ac8e75.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com