Bulk Load Ignore/Skip Feature
Will Postgresql ever implement an ignore on error feature when bulk
loading data? Currently it is my understanding that any record that
violates a unique constraint will cause the "copy from" command to halt
execution instead of skipping over the violation and logging it - as is
done in Oracle and DB2.
Are there alternative ways of dealing with this scenario that won't
consume as much time?
Appreciate any help - would love to migrate away from Oracle.
Cheers
Il Wednesday 14 November 2007 05:50:36 Willem Buitendyk ha scritto:
Will Postgresql ever implement an ignore on error feature when bulk
loading data? Currently it is my understanding that any record that
violates a unique constraint will cause the "copy from" command to halt
execution instead of skipping over the violation and logging it - as is
done in Oracle and DB2.Are there alternative ways of dealing with this scenario that won't
consume as much time?Appreciate any help - would love to migrate away from Oracle.
Cheers
pgloader
http://pgfoundry.org/projects/pgloader/
--
Reg me Please
Perfect - that appears to be exactly what I was looking for.
Cheers
Reg Me Please wrote:
Show quoted text
Il Wednesday 14 November 2007 05:50:36 Willem Buitendyk ha scritto:
Will Postgresql ever implement an ignore on error feature when bulk
loading data? Currently it is my understanding that any record that
violates a unique constraint will cause the "copy from" command to halt
execution instead of skipping over the violation and logging it - as is
done in Oracle and DB2.Are there alternative ways of dealing with this scenario that won't
consume as much time?Appreciate any help - would love to migrate away from Oracle.
Cheers
pgloader
On Wed, 2007-11-14 at 00:02 -0800, Willem Buitendyk wrote:
Perfect - that appears to be exactly what I was looking for.
Reg Me Please wrote:
Il Wednesday 14 November 2007 05:50:36 Willem Buitendyk ha scritto:
Will Postgresql ever implement an ignore on error feature when bulk
loading data? Currently it is my understanding that any record that
violates a unique constraint will cause the "copy from" command to halt
execution instead of skipping over the violation and logging it - as is
done in Oracle and DB2.pgloader
I believe the last time I tried this, there was still some issues with
it. See attached email. (if it makes it to the list)
On Tue, Nov 13, 2007 at 08:50:36PM -0800, Willem Buitendyk wrote:
Will Postgresql ever implement an ignore on error feature when bulk
loading data? Currently it is my understanding that any record that
violates a unique constraint will cause the "copy from" command to
halt execution instead of skipping over the violation and logging it
- as is done in Oracle and DB2.Are there alternative ways of dealing with this scenario that won't
consume as much time?Appreciate any help - would love to migrate away from Oracle.
You might try pgloader. :)
http://pgfoundry.org/projects/pgloader/
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Damn - so the unqiue contraint is still an issue. What gives? Why is
it so hard to implement this in Postgresql? sigh - if only I had more time.
Ow Mun Heng wrote:
Show quoted text
On Wed, 2007-11-14 at 00:02 -0800, Willem Buitendyk wrote:
Perfect - that appears to be exactly what I was looking for.
Reg Me Please wrote:
Il Wednesday 14 November 2007 05:50:36 Willem Buitendyk ha scritto:
Will Postgresql ever implement an ignore on error feature when bulk
loading data? Currently it is my understanding that any record that
violates a unique constraint will cause the "copy from" command to halt
execution instead of skipping over the violation and logging it - as is
done in Oracle and DB2.pgloader
I believe the last time I tried this, there was still some issues with
it. See attached email. (if it makes it to the list)__________ NOD32 2657 (20071114) Information __________
This message was checked by NOD32 antivirus system.
http://www.eset.com------------------------------------------------------------------------
Subject:
PgLoader unable to handle pkey dups Was [Re: {Spam} [GENERAL] pgloader
- Can't find textreader/csvreader]
From:
Ow Mun Heng <Ow.Mun.Heng@wdc.com>
Date:
Mon, 27 Aug 2007 18:01:54 +0800
To:
Dimitri Fontaine <dfontaine@hi-media.com>To:
Dimitri Fontaine <dfontaine@hi-media.com>
CC:
pgsql-general@postgresql.orgOn Mon, 2007-08-27 at 11:27 +0200, Dimitri Fontaine wrote:
We've just made some tests here with 2.2.1 and as this release contains the
missing files, it works fine without any installation.Yep.. I can confirm that it works.. I am using the csv example.
Goal : similar functionality much like mysql's mysqlimport --replace
(overwrite any rows which has duplicate primary keys)$ psql pgloader < csv/csv.sql
$ ../pgloader.py -Tvc examples/pgloader.conf csvpgloader=# alter table csv add primary key (a,b,c);
pgloader=# \d csv
Table "public.csv"
Column | Type | Modifiers
--------+--------------+-----------
a | bigint | not null
b | bigint | not null
c | character(2) | not null
d | text |
Indexes:
"csv_pkey" PRIMARY KEY, btree (a, b, c)pgloader=# select * from csv;
a | b | c | d
----------+----------+----+----------------
33996344 | 33996351 | GB | United Kingdom
50331648 | 68257567 | US | United States
68257568 | 68257599 | CA | Canada
68257600 | 68259583 | US | United States
68259584 | 68259599 | CA | Canada$cat csv/csv.data
"2.6.190.56","2.6.190.63","33996344","33996351","GB","Error Kingdom"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","new Country"
<Note : only columns 3 to 6 are taken for loading)$ psql pgloader < csv/csv.sql
$ ../pgloader.py -vc pgloader.conf csv
Using pgloader.conf configuration file
Will consider following sections:
csv[csv] parse configuration
Notice: reject log in /tmp/csv.rej.log
Notice: rejected data in /tmp/csv.rej
[csv] data import
Notice: COPY csv dataError: Please check PostgreSQL logs
HINT: double check your client_encoding, datestyle and copy_delimiter
settings$sudo tail -f /var/log/pglog/postgresxx-xx-xx.log
ERROR: duplicate key violates unique constraint "csv_pkey"
CONTEXT: COPY csv, line 1: "33996344,33996351,Error Kingdom,GB"
STATEMENT: COPY csv (a, b, d, c) FROM stdin USING DELIMITERS ','So.. doesn't really solve my issue.
Dang it..---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
Damn - so the unqiue contraint is still an issue. What gives? Why is
it so hard to implement this in Postgresql? sigh - if only I had more time.
Can you explain? The server ofcourse still generates error messages in
the logs, there's no way around that. However it looks to me that the
data ended up in the database correctly? Or did I miss something?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy
My apologies. I misinterpreted that last post. I have not been able to
try pgloader as I am using the windows platform.
Martijn van Oosterhout wrote:
Show quoted text
On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
Damn - so the unqiue contraint is still an issue. What gives? Why is
it so hard to implement this in Postgresql? sigh - if only I had more time.Can you explain? The server ofcourse still generates error messages in
the logs, there's no way around that. However it looks to me that the
data ended up in the database correctly? Or did I miss something?Have a nice day,
Hi all,
Le Friday 16 November 2007 18:04:44 Willem Buitendyk, vous avez écrit :
Martijn van Oosterhout wrote:
On Thu, Nov 15, 2007 at 08:09:46PM -0800, Willem Buitendyk wrote:
Damn - so the unqiue contraint is still an issue. What gives? Why is
it so hard to implement this in Postgresql? sigh - if only I had more
time.Can you explain? The server ofcourse still generates error messages in
the logs, there's no way around that. However it looks to me that the
data ended up in the database correctly? Or did I miss something?
pgloader will load non-conflicting data and produce both a reject log with
errors about non inserted (COPYied) data and a reject data file with the
input line ready to be processed again if such is the operator choice.
But at the moment it does not provide any way to automate the UPDATE the PK
conflicting rows. I'm really hesitant as to code this option: what to do in
the case of a non primary key unique constraint conflict:
dim=# create table unic(a integer unique);
dim=# insert into unic values (1);
INSERT 2312559 1
dim=# insert into unic values (1);
ERROR: duplicate key violates unique constraint "unic_a_key"
dim=# create table pk(a integer primary key);
dim=# insert into pk values (1);
INSERT 2312565 1
dim=# insert into pk values (1);
ERROR: duplicate key violates unique constraint "pk_pkey"
I'm thinking maybe in the first case you don't want existing values to be
overwritten, but in the second case it's what you want to happen. Should this
be the user responsibility to make the difference --- by configuring pgloader
properly --- or should the tool try hard to protect the user against himself?
How to act on a table with a surrogate pk and a unique constraint when you
want to automatically update surrogate key but not the unique data, or the
other way around?
So I have two questions for the community:
- should I provide a pgloader mailing list?
- what do you think about adding the UPDATE-on-duplicate-key-error option?
My apologies. I misinterpreted that last post. I have not been able to
try pgloader as I am using the windows platform.
pgloader is a python "script" which depends on psycopg for handling the
PostgreSQL connection, and only standard python modules after that. The
following link provides windows binaries for psycopg.
http://www.stickpeople.com/projects/python/win-psycopg/
I've gotten reports of pgloader running on windows, even if I didn't make any
specific effort for this to happen and I don't have any proprietary licenced
OS to test pgloader on.
Hope this helps,
--
dim