Restore referencial integrity

Started by Carlos Henrique Reimerover 15 years ago7 messagesgeneral
Jump to latest
#1Carlos Henrique Reimer
carlos.reimer@opendb.com.br

Hi,

We had by mistake dropped the referencial integrety between two huge tables
and now I'm facing the following messages when trying to recreate the
foreign key again:

alter table posicoes_controles add
CONSTRAINT protocolo FOREIGN KEY (protocolo)
REFERENCES posicoes (protocolo) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE;

ERROR: insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
DETAIL: Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in
column "protocolo" that are not present in column "protocolo" of table
"posicoes". This happened because some programs removed rows from table
"posicoes" while the referencial integrity was dropped.

Now I need to remove all rows from table "posicoes_controles" that has not
corresponding row in table "posicoes".

As these are huge tables, almost 100GB each, and the server
hardware restricted (4GB RAM) I would like a suggestion of which command
or commands should be used from the performance perspective.

Column "protocolo" is "posicoes" table primary key but is not in any index
colum of table "posicoes_controles".

Thank you very much for any help!

--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

#2George H
george.dma@gmail.com
In reply to: Carlos Henrique Reimer (#1)
Re: Restore referencial integrity

On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:

Hi,

We had by mistake dropped the referencial integrety between two huge tables
and now I'm facing the following messages when trying to recreate the
foreign key again:

alter table posicoes_controles add
  CONSTRAINT protocolo FOREIGN KEY (protocolo)
      REFERENCES posicoes (protocolo) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE;

ERROR:  insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in
column "protocolo" that are not present in column "protocolo" of table
"posicoes". This happened because some programs removed rows from table
"posicoes" while the referencial integrity was dropped.

Now I need to remove all rows from table "posicoes_controles" that has not
corresponding row in table "posicoes".

As these are huge tables, almost 100GB each, and the server
hardware restricted (4GB RAM) I would like a suggestion of which command
or commands should be used from the performance perspective.

Column "protocolo" is "posicoes" table primary key but is not in any index
colum of table "posicoes_controles".

Thank you very much for any help!
--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Hi,

I guess you could consider the following strategy: Halt the server or
lock the table or something so no program is allowed to delete any
rows on the affected tables. Run a PL/SQL script that will remove rows
from "posicoes_controles" whose foreign key is not present in table
"posics." Then re-issue the foreign key constraint. Then unlock the
table or whatever it is you have to do get programs to be able to use
the tables again.

I hope this helps somewhat.
--
George H
george.dma@gmail.com

#3Carlos Henrique Reimer
carlos.reimer@opendb.com.br
In reply to: George H (#2)
Re: Restore referencial integrity

Hi,

Yes, this is a good suggestion but as the table posicoes_controles has
3.71172e+008 rows it will perform 3.71172e+008 selects against table
posicoes to check if the protocolo is in table.

I was think something like:

explain delete from posicoes_controles where protocolo not in (select
protocolo from posicoes);

"Seq Scan on posicoes_controles (cost=9929689.38..1180088620108403.70
rows=189165121 width=6)"
" Filter: (NOT (subplan))"
" SubPlan"
" -> Materialize (cost=9929689.38..15217480.18 rows=380245580 width=4)"
" -> Seq Scan on posicoes (cost=0.00..8064108.80 rows=380245580
width=4)"

Will this work better that a pl/pgsql as you suggested? Or is there
something even betther?

Thank you!
2010/8/30 George H <george.dma@gmail.com>

On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:

Hi,

We had by mistake dropped the referencial integrety between two huge

tables

and now I'm facing the following messages when trying to recreate the
foreign key again:

alter table posicoes_controles add
CONSTRAINT protocolo FOREIGN KEY (protocolo)
REFERENCES posicoes (protocolo) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE;

ERROR: insert or update on table "posicoes_controles" violates foreign

key

constraint "protocolo"
DETAIL: Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign

key

constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in
column "protocolo" that are not present in column "protocolo" of table
"posicoes". This happened because some programs removed rows from table
"posicoes" while the referencial integrity was dropped.

Now I need to remove all rows from table "posicoes_controles" that has

not

corresponding row in table "posicoes".

As these are huge tables, almost 100GB each, and the server
hardware restricted (4GB RAM) I would like a suggestion of which command
or commands should be used from the performance perspective.

Column "protocolo" is "posicoes" table primary key but is not in any

index

colum of table "posicoes_controles".

Thank you very much for any help!
--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Hi,

I guess you could consider the following strategy: Halt the server or
lock the table or something so no program is allowed to delete any
rows on the affected tables. Run a PL/SQL script that will remove rows
from "posicoes_controles" whose foreign key is not present in table
"posics." Then re-issue the foreign key constraint. Then unlock the
table or whatever it is you have to do get programs to be able to use
the tables again.

I hope this helps somewhat.
--
George H
george.dma@gmail.com

--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

#4David Fetter
david@fetter.org
In reply to: Carlos Henrique Reimer (#1)
Re: Restore referencial integrity

On Sun, Aug 29, 2010 at 11:30:57PM -0300, Carlos Henrique Reimer wrote:

Hi,

We had by mistake dropped the referencial integrety between two huge
tables

Agora o elefante vai pegar! ;)

and now I'm facing the following messages when trying to recreate
the foreign key again:

alter table posicoes_controles add
CONSTRAINT protocolo FOREIGN KEY (protocolo)
REFERENCES posicoes (protocolo) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE;

ERROR: insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
DETAIL: Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in
column "protocolo" that are not present in column "protocolo" of table
"posicoes". This happened because some programs removed rows from table
"posicoes" while the referencial integrity was dropped.

Now I need to remove all rows from table "posicoes_controles" that has not
corresponding row in table "posicoes".

As these are huge tables, almost 100GB each, and the server
hardware restricted (4GB RAM) I would like a suggestion of which command
or commands should be used from the performance perspective.

First, if pescioes_controles doesn't already have an index on
protocolo, create such an index. You can do something like

CREATE INDEX CONCURRENTLY ON pescioes_controles(protocolo);

After you have finished the indexing, you'll need to schedule some
down time, cut off all other access to the server, and then run
something like the following:

BEGIN;
DELETE FROM pescioes_controles WHERE NOT EXISTS (
SELECT 1 FROM pesicoes WHERE pesicoes.protocolo = pescioes_controles.protocolo
);
ALTER TABLE posicoes_controles add
CONSTRAINT protocolo FOREIGN KEY (protocolo)
REFERENCES posicoes (protocolo) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE;
COMMIT;

Hope this helps :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5Filip Rembiałkowski
filip.rembialkowski@gmail.com
In reply to: George H (#2)
Re: Restore referencial integrity

I remember when I handled such situations without downtime, in 24/7 HA
setup, to avoid large transactions - You could try SELECT FROM A LEFT
JOIN B WHERE B.ID IS NULL LIMIT 10 -- and use this as a base for
DELETE statement...

2010/8/30, George H <george.dma@gmail.com>:

On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:

Hi,

We had by mistake dropped the referencial integrety between two huge
tables
and now I'm facing the following messages when trying to recreate the
foreign key again:

alter table posicoes_controles add
  CONSTRAINT protocolo FOREIGN KEY (protocolo)
      REFERENCES posicoes (protocolo) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE;

ERROR:  insert or update on table "posicoes_controles" violates foreign
key
constraint "protocolo"
DETAIL:  Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign key
constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in
column "protocolo" that are not present in column "protocolo" of table
"posicoes". This happened because some programs removed rows from table
"posicoes" while the referencial integrity was dropped.

Now I need to remove all rows from table "posicoes_controles" that has not
corresponding row in table "posicoes".

As these are huge tables, almost 100GB each, and the server
hardware restricted (4GB RAM) I would like a suggestion of which command
or commands should be used from the performance perspective.

Column "protocolo" is "posicoes" table primary key but is not in any index
colum of table "posicoes_controles".

Thank you very much for any help!
--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Hi,

I guess you could consider the following strategy: Halt the server or
lock the table or something so no program is allowed to delete any
rows on the affected tables. Run a PL/SQL script that will remove rows
from "posicoes_controles" whose foreign key is not present in table
"posics." Then re-issue the foreign key constraint. Then unlock the
table or whatever it is you have to do get programs to be able to use
the tables again.

I hope this helps somewhat.
--
George H
george.dma@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Wysłane z mojego urządzenia przenośnego

Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

#6David Fetter
david@fetter.org
In reply to: Carlos Henrique Reimer (#1)
Re: Restore referencial integrity

On Mon, Aug 30, 2010 at 05:04:36PM -0300, Carlos Henrique Reimer wrote:

Hi

Thank David and Georg for your suggestions.

Yes, there is an index now defined on column protocolo in table
posicoes_controles.

Legal!

I've selected two suggested commands to compare which would be more
performatic and which will run faster:

Option 1)
explain delete from posicoes_controles where protocolo not in (select
protocolo from posicoes);
"Seq Scan on posicoes_controles (cost=9954587.42..1185225908771206.50
rows=189513428 width=6)"
" Filter: (NOT (subplan))"
" SubPlan"
" -> Materialize (cost=9954587.42..15255636.80 rows=381199038 width=4)"
" -> Seq Scan on posicoes (cost=0.00..8084329.38 rows=381199038
width=4)"

Option 2)
explain delete FROM posicoes_controles WHERE NOT EXISTS (
SELECT 1 FROM posicoes WHERE posicoes.protocolo =
posicoes_controles.protocolo
);
"Seq Scan on posicoes_controles (cost=0.00..9560672015.05 rows=189419047
width=6)"
" Filter: (NOT (subplan))"
" SubPlan"
" -> Index Scan using pk_posicoes_protocolo on posicoes
(cost=0.00..25.19 rows=1 width=0)"
" Index Cond: (protocolo = $0)"
I'm not an explain specialist but I understood the second option will run
much more faster.

It probably will. EXISTS returns immediately when it finds the first
row.

Let me know if I understood the explain for the second option:
1) Run a seq scan on posicoes_controles and get the protocolo key to access
posicoes_protocolo
2) For each row accessed in item 1 run an index scan on posicoes to check if
the key
is in the table posicoes
3) If the parent found is not found on posicoes then remove the row from
posicoes_controles

Am I thinking correctly?

I believe so.

Cheers,
David (whose pt_BR is pretty w34k)
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#7Carlos Henrique Reimer
carlos.reimer@opendb.com.br
In reply to: Filip Rembiałkowski (#5)
Re: Restore referencial integrity

Hi Filip,

This was an excellent suggestion. I've run this join and just 2 minutes
later got 1000 records to delete.

Will start the deletes for them and then repeat the processing until all are
gone. With this approach I did not need to wait for the maintenance window
to fix the foreign key.

I think this finishes my issue.

Thank you all!

2010/8/30 Filip Rembiałkowski <filip.rembialkowski@gmail.com>

I remember when I handled such situations without downtime, in 24/7 HA
setup, to avoid large transactions - You could try SELECT FROM A LEFT
JOIN B WHERE B.ID <http://b.id/&gt; IS NULL LIMIT 10 -- and use this as a
base for
DELETE statement...

2010/8/30, George H <george.dma@gmail.com>:

On Mon, Aug 30, 2010 at 5:30 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:

Hi,

We had by mistake dropped the referencial integrety between two huge
tables
and now I'm facing the following messages when trying to recreate the
foreign key again:

alter table posicoes_controles add
CONSTRAINT protocolo FOREIGN KEY (protocolo)
REFERENCES posicoes (protocolo) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE;

ERROR: insert or update on table "posicoes_controles" violates foreign
key
constraint "protocolo"
DETAIL: Key (protocolo)=(338525035) is not present in table "posicoes".
********** Erro **********
ERROR: insert or update on table "posicoes_controles" violates foreign

key

constraint "protocolo"
SQL state: 23503
Detalhe: Key (protocolo)=(338525035) is not present in table "posicoes".
As the error message tells, the table "posicoes_controles" has values in
column "protocolo" that are not present in column "protocolo" of table
"posicoes". This happened because some programs removed rows from table
"posicoes" while the referencial integrity was dropped.

Now I need to remove all rows from table "posicoes_controles" that has

not

corresponding row in table "posicoes".

As these are huge tables, almost 100GB each, and the server
hardware restricted (4GB RAM) I would like a suggestion of which command
or commands should be used from the performance perspective.

Column "protocolo" is "posicoes" table primary key but is not in any

index

colum of table "posicoes_controles".

Thank you very much for any help!
--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Hi,

I guess you could consider the following strategy: Halt the server or
lock the table or something so no program is allowed to delete any
rows on the affected tables. Run a PL/SQL script that will remove rows
from "posicoes_controles" whose foreign key is not present in table
"posics." Then re-issue the foreign key constraint. Then unlock the
table or whatever it is you have to do get programs to be able to use
the tables again.

I hope this helps somewhat.
--
George H
george.dma@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Wysłane z mojego urządzenia przenośnego

Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br