error tolerant inserting of data

Started by Jules Alberts · 5 Mar 2003 · 3 messages · general
#1Jules Alberts
jules.alberts@arbodienst-limburg.nl

Hello everybody,

I'm working on a new pg database that will contain all data that's
currently in our old system. The data is already in the db, in tables
with names like import_table1, import_table2 etc. Now I have to
transfer them to the real tables. The real tables have some
constraints, like trigger functions to prevent overlapping date
periods.

I have written PL/pgSQL code to do the importing, but there is one
big drawback: as soon as anything in a function (or in a FOR ... SELECT
... LOOP) goes wrong, everything is rolled back. Normally this
behaviour would be fine, but not in this situation, because my old data
contains rows that sometimes conflict with the constraints.

What I want is for all conflicting rows to simply be refused, while
everything that's OK is imported. What is the best way to do this?
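A simplified sketch of the kind of loop I mean (table and column names are made up for illustration):

```sql
-- Simplified sketch of the per-row import loop (hypothetical names).
-- If any single INSERT violates a constraint, the whole function call
-- -- and therefore every row already inserted by it -- is rolled back.
CREATE FUNCTION import_periods() RETURNS integer AS $$
DECLARE
    rec RECORD;
    n   integer := 0;
BEGIN
    FOR rec IN SELECT * FROM import_table1 LOOP
        INSERT INTO real_table (person_id, period_start, period_end)
        VALUES (rec.person_id, rec.period_start, rec.period_end);
        n := n + 1;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```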

TIA!

#2Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Jules Alberts (#1)
Re: error tolerant inserting of data

On Wed, 5 Mar 2003, Jules Alberts wrote:

> Hello everybody,
>
> I'm working on a new pg database that will contain all data that's
> currently in our old system. The data is already in the db, in tables
> with names like import_table1, import_table2 etc. Now I have to
> transfer them to the real tables. The real tables have some
> constraints, like trigger functions to prevent overlapping date
> periods.
>
> I have written PL/pgSQL code to do the importing, but there is one
> big drawback: as soon as anything in a function (or in a FOR ... SELECT
> ... LOOP) goes wrong, everything is rolled back. Normally this
> behaviour would be fine, but not in this situation, because my old data
> contains rows that sometimes conflict with the constraints.
>
> What I want is for all conflicting rows to simply be refused, while
> everything that's OK is imported. What is the best way to do this?

I think the only way is to do the conflict testing in your code and not
attempt the insert if it's going to fail.

It'll slow the process down, but if you think in terms of issuing a BEGIN
at the start of an application and hoping for a COMMIT at the end, then
obviously you have to protect against situations that you know will give
rise to an error. For example, if you have some data from a form that a
user completes, when you come to save that data you may not know just from
the data itself whether a foreign key constraint will pass, so you have to
check that before trying the insert, otherwise your transaction will abort.
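A sketch of that check-before-insert pattern for the date-period case (hypothetical table and column names; the overlap test mirrors what the trigger would otherwise reject):

```sql
-- Check-before-insert: only attempt the INSERT when no conflicting
-- row already exists, so the constraint/trigger never fires.
-- Hypothetical table and column names.
FOR rec IN SELECT * FROM import_table1 LOOP
    IF NOT EXISTS (
        SELECT 1
        FROM real_table r
        WHERE r.person_id = rec.person_id
          AND r.period_start < rec.period_end   -- the two periods
          AND rec.period_start < r.period_end   -- overlap
    ) THEN
        INSERT INTO real_table (person_id, period_start, period_end)
        VALUES (rec.person_id, rec.period_start, rec.period_end);
    END IF;
END LOOP;
```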

--
Nigel J. Andrews

#3Jules Alberts
jules.alberts@arbodienst-limburg.nl
In reply to: Nigel J. Andrews (#2)
Re: error tolerant inserting of data

On 5 Mar 2003 (14:36), Nigel J. Andrews <nandrews@investsystems.co.uk> wrote:

> On Wed, 5 Mar 2003, Jules Alberts wrote:
>
>> Hello everybody,
>>
>> [...]
>>
>> What I want is for all conflicting rows to simply be refused, while
>> everything that's OK is imported. What is the best way to do this?
>
> I think the only way is to do the conflict testing in your code and not
> attempt the insert if it's going to fail.

Yes, I tried that and it works, but it is _very_ slow.

> It'll slow the process down, but if you think in terms of issuing a BEGIN
> at the start of an application and hoping for a COMMIT at the end, then
> obviously you have to protect against situations that you know will give
> rise to an error. For example, if you have some data from a form that a
> user completes, when you come to save that data you may not know just from
> the data itself whether a foreign key constraint will pass, so you have to
> check that before trying the insert, otherwise your transaction will abort.

I know, and that's OK once the system goes live. But right now I'm
importing data, and all I want is for the good data to be imported as
swiftly as possible while the bad data is dropped. The data is
somewhere between 500MB and 1GB, and the transition from the old system
to the new one will happen during a single weekend (big bang).

If I do error checking before actually importing the data, one weekend
will be _way_ too short :-(
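One way to make the checking much cheaper, assuming the overlap rule is the relevant conflict (hypothetical table and column names as before): do the filtering in a single set-based INSERT ... SELECT instead of one pre-check query per row, so the rejection happens in one anti-join pass:

```sql
-- Set-based variant: import every non-conflicting row in one
-- statement instead of row-by-row checks.
-- Hypothetical table and column names.
INSERT INTO real_table (person_id, period_start, period_end)
SELECT i.person_id, i.period_start, i.period_end
FROM import_table1 i
WHERE NOT EXISTS (
    SELECT 1
    FROM real_table r
    WHERE r.person_id = i.person_id
      AND r.period_start < i.period_end   -- the two periods
      AND i.period_start < r.period_end   -- overlap
);
```

Note that this only checks imported rows against rows already in real_table; two mutually overlapping rows inside import_table1 itself would still need a second pass or a per-row loop for the leftovers.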