Adding foreign key constraints without integrity check?
Is there a way to add a foreign key constraint without having to wait for it
to check the consistency of all existing records? If a database is being
reloaded (pg_dumpall then load), it really shouldn't be necessary to check
the referential integrity - or at least I should be able to stipulate that I
am accepting that risk.
My database reload is currently taking about 6 hours to load the data, 42
hours to reindex, and about another 40 hours or so to check the foreign key
constraints (about 1.2 billion rows). That's a very long time to be out of
commission. I'd really like to eliminate that second 40 hours so I can get
it down to a normal weekend.
Wes
Wes <wespvp@syntegra.com> writes:
My database reload is currently taking about 6 hours to load the data, 42
hours to reindex, and about another 40 hours or so to check the foreign key
constraints (about 1.2 billion rows).
What PG version is this, and what have you got maintenance_work_mem set to?
Undersized m_w_m would hurt both index build and FK checking ...
regards, tom lane
On 6/18/06 10:48 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
My database reload is currently taking about 6 hours to load the data, 42
hours to reindex, and about another 40 hours or so to check the foreign key
constraints (about 1.2 billion rows).What PG version is this, and what have you got maintenance_work_mem set to?
Undersized m_w_m would hurt both index build and FK checking ...
Sorry, forgot that info..
PG is 8.1.4 (I'm testing an upgrade from 7.3.4 to 8.1.4).
maintenance_work_mem is set to 983025 - 1.5 times the previous value of
655350. Current RSS of postmaster is about 1.3 GB.
System memory is 2GB (would like more, but...). Data is on one array
(hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another
(hardware RAID 5 of five 15K 146 GB SCSI drives on a separate channel), and
pg_xlog on a third RAID 1 on a third channel). There are two 2.4 GHz Xeon
processors).
Wes
Wes wrote:
Is there a way to add a foreign key constraint without having to wait for it
to check the consistency of all existing records? If a database is being
reloaded (pg_dumpall then load), it really shouldn't be necessary to check
the referential integrity - or at least I should be able to stipulate that I
am accepting that risk.
You could create the fk-constraints _first_, then disable them, load
the data, reindex, and reenable them afterwards.
pg_dump/pg_restore can enable and disable fk-constraints before restoring
the data, I believe. It does so by tweaking the system catalogs.
The only problem I can see is that creating the fk-constraints might create
some indices too. But maybe you can manually drop those indices afterwards - I
don't know if the fk really _depends_ on the index, or if it creates it only
for convenience.
greetings, Florian Pflug
You could create the fk-constraints _first_, then disable them, load
the data, reindex, and reenable them afterwards.pg_dump/pg_restore can enable and disable fk-constraints before restoring
the data, I believe. It does so by tweaking the system catalogs.
Are referring to '--disable-triggers' on pg_dump? Will this work for
foreign key constraints? The doc talks about triggers, but doesn't say
anything about FK constraints (are these implemented as triggers?) I don't
use pg_restore, just psql.
The only problem I can see is that creating the fk-constraints might create
some indices too. But maybe you can manually drop those indices afterwards - I
don't know if the fk really _depends_ on the index, or if it creates it only
for convenience.
I don't see any indexes being added to the table beyond what I add, and
those added as a primary key constraint. Currently, pg_dump outputs the FK
constraints after the indexes are built, as the last steps. If I try to add
the FK constraints after loading the database definitions, but without any
indexes, I'm not sure what would happen.
Wes
What PG version is this, and what have you got maintenance_work_mem set to?
Undersized m_w_m would hurt both index build and FK checking ...
Looking at the stats again, I did see something unexpected. With MWM set to
983025, index builds were running about 1.3 GB for both RSS and virtual.
Adding foreign key constraints, RSS is about 1.1 GB but virtual is slightly
over 2 GB.
Wes
Wes wrote:
You could create the fk-constraints _first_, then disable them, load
the data, reindex, and reenable them afterwards.pg_dump/pg_restore can enable and disable fk-constraints before restoring
the data, I believe. It does so by tweaking the system catalogs.Are referring to '--disable-triggers' on pg_dump? Will this work for
foreign key constraints? The doc talks about triggers, but doesn't say
anything about FK constraints (are these implemented as triggers?) I don't
use pg_restore, just psql.
Yes, I was referring to "--disable-triggers". I always assumes that it
disables FK-Constraints as well as triggers, but now that you ask I
realize that I might have never actually tried that ;-)
But FK-Constraints _are_ implemented as triggers internally, so I guess
it should work.
The only problem I can see is that creating the fk-constraints might create
some indices too. But maybe you can manually drop those indices afterwards - I
don't know if the fk really _depends_ on the index, or if it creates it only
for convenience.I don't see any indexes being added to the table beyond what I add, and
those added as a primary key constraint. Currently, pg_dump outputs the FK
constraints after the indexes are built, as the last steps. If I try to add
the FK constraints after loading the database definitions, but without any
indexes, I'm not sure what would happen.
Hm.. it i tried it out, and came to the conclusion that my approach
doesn't work :-(
You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates an
index, which you can't drop without dropping the PK :-(
So unless you find a way to force postgres to ignore the index when
inserting data, my suggestion won't work :-(
greetings, Florian Pflug
Florian,
Are you certain:
"You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates an
index, which you can't drop without dropping the PK :-("
????
I'm not sure I am convinced the necessity of a foreign key, "needing" to
reference a primary keyed entry from a different table.
Florian G. Pflug wrote:
Show quoted text
Wes wrote:
You could create the fk-constraints _first_, then disable them, load
the data, reindex, and reenable them afterwards.pg_dump/pg_restore can enable and disable fk-constraints before
restoring
the data, I believe. It does so by tweaking the system catalogs.Are referring to '--disable-triggers' on pg_dump? Will this work for
foreign key constraints? The doc talks about triggers, but doesn't say
anything about FK constraints (are these implemented as triggers?) I
don't
use pg_restore, just psql.Yes, I was referring to "--disable-triggers". I always assumes that it
disables FK-Constraints as well as triggers, but now that you ask I
realize that I might have never actually tried that ;-)But FK-Constraints _are_ implemented as triggers internally, so I
guess it should work.The only problem I can see is that creating the fk-constraints might
create
some indices too. But maybe you can manually drop those indices
afterwards - I
don't know if the fk really _depends_ on the index, or if it creates
it only
for convenience.I don't see any indexes being added to the table beyond what I add, and
those added as a primary key constraint. Currently, pg_dump outputs
the FK
constraints after the indexes are built, as the last steps. If I try
to add
the FK constraints after loading the database definitions, but
without any
indexes, I'm not sure what would happen.Hm.. it i tried it out, and came to the conclusion that my approach
doesn't work :-(You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates an
index, which you can't drop without dropping the PK :-(So unless you find a way to force postgres to ignore the index when
inserting data, my suggestion won't work :-(greetings, Florian Pflug
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On Sun, 2006-06-18 at 22:41, Wes wrote:
Is there a way to add a foreign key constraint without having to wait for it
to check the consistency of all existing records? If a database is being
reloaded (pg_dumpall then load), it really shouldn't be necessary to check
the referential integrity - or at least I should be able to stipulate that I
am accepting that risk.My database reload is currently taking about 6 hours to load the data, 42
hours to reindex, and about another 40 hours or so to check the foreign key
constraints (about 1.2 billion rows). That's a very long time to be out of
commission. I'd really like to eliminate that second 40 hours so I can get
it down to a normal weekend.
Are you sure that's really the problem? Do you have indexes on the
referring tables (i.e. the foreign key that points to the other table's
primary key). Not having an index on the subordinate table makes each
and every check on the FK->PK relationship require a seq scan of the
subordinate table.
On Jun 18, 2006, at 11:08 PM, Wes wrote:
System memory is 2GB (would like more, but...). Data is on one array
(hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another
(hardware RAID 5 of five 15K 146 GB SCSI drives on a separate
channel), and
pg_xlog on a third RAID 1 on a third channel). There are two 2.4
GHz Xeon
processors).
FWIW, RAID5 isn't normally a good recipe for good database performance.
This won't help with constraint checking, but you should consider
turning fsync off during your restore.
Also, why are you frequently dumping and restoring?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On 6/19/06 3:24 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote:
Are you sure that's really the problem? Do you have indexes on the
referring tables (i.e. the foreign key that points to the other table's
primary key). Not having an index on the subordinate table makes each
and every check on the FK->PK relationship require a seq scan of the
subordinate table.
Yes, I am sure I have the indexes for both sides. (I just double checked).
If not, wouldn't I have abysmal load performance in production, or does that
function differently?
The constraint that just finished after 18+ hours has a non-composite index
on both sides. The other constraint:
ADD CONSTRAINT "$2" FOREIGN KEY (recipient) REFERENCES
addresses(address_key);
has a normal index on address_key. The recipient index is (recipient, date)
composite index. This index has always been used when searching on just
recipient, and the last time I rebuilt the database (a year ago), the FK
addition was inline with expectations.
In every case, it takes about the same time to add the foreign key
constraint as to create the index, maybe a little more. This is true
regardless of whether one of the indexes is composite or not. One
constraint build just finished after a little more than 18 hours. The table
where the constraint is being created has about 900 million rows. The
'references x' table has about 200 million rows. It churns along eventually
eating up about 50 GB or so in the 'base' directory. When that stops
growing, it took maybe another hour or two to complete.
I'd just like to be able to tell it to simply add the constraint without
doing any checking. I know the data is consistent. I just dumped it from a
database that has referential integrity enabled. Even if there were an
error that crept in to the old database, I don't care - just add the
constraint so I can get back online. Right now I'm looking at around 80-90
hours total to do a reload. What's it going to be a year or two from now?
I could be out of commission for a week.
Wes
On 6/19/06 3:47 PM, "Jim Nasby" <jnasby@pervasive.com> wrote:
FWIW, RAID5 isn't normally a good recipe for good database performance.
Understood, but RAID 1 isn't always feasible. The database performs very
well with the current hardware configuration - I have no complaints. The
only time we have issues is during index rebuilds or a reload.
This won't help with constraint checking, but you should consider
turning fsync off during your restore.
I didn't think about doing that, but I wonder how much it would help? The
actual data load only takes about 6 hours or so. It's all the index
building and constraint checking that takes the time. This time seems to be
spent mostly writing temp files and sorting.
Also, why are you frequently dumping and restoring?
We have to periodically reindex due to the fact that vacuum processes
indexes in index order instead of disk space order. A 'vacuum full' is
currently taking about 24 hours. After a reindex, I expect that to drop to
3-4 hours until the index becomes fragmented. I don't think much can be
done about the time to reindex (or the order vacuum processes indexes).
I wouldn't say we dump/reload frequently. Right now I'm doing a trial run
for the 8.x upgrade - I have to make sure I can complete it in the allotted
time. However, about a year ago, I had to do multiple reloads in a 2 week
period. We had database corruption, and had to reload from source to get a
good database - multiple times. We had a defective disk array that wasn't
reporting any errors but kept corrupting the database.
Besides PG upgrades, I have to consider how long it takes to recover from
system failures if I can't use the offline file system backup. I'll be
switching to online backups once we get upgraded, but if a reload fails
there, I'll again have to fall back to the weekly source backup.
Wes
On Mon, 2006-06-19 at 16:39, Wes wrote:
On 6/19/06 3:47 PM, "Jim Nasby" <jnasby@pervasive.com> wrote:
FWIW, RAID5 isn't normally a good recipe for good database performance.
Understood, but RAID 1 isn't always feasible. The database performs very
well with the current hardware configuration - I have no complaints. The
only time we have issues is during index rebuilds or a reload.
I think Jim was thinking of RAID 1+0. If you gotta use RAID 5, at least
have a contoller with several hundred megs of battery backed cache.
Better yet, use a RAID controller with a gig or so of BBU cache and run
RAID 1+0 on it. If you can.
Also, why are you frequently dumping and restoring?
We have to periodically reindex due to the fact that vacuum processes
indexes in index order instead of disk space order. A 'vacuum full' is
currently taking about 24 hours. After a reindex, I expect that to drop to
3-4 hours until the index becomes fragmented. I don't think much can be
done about the time to reindex (or the order vacuum processes indexes).
Actually, the same question about why your frequently dumping and
restoring applies to full vacuums. Why are you doing them? A properly
running database should not need vacuum fulls. Nor reindexing.
Standard vacuums, scheduled to run often enough, combined with a large
enough Free Space Map should mean never needing a full vacuum or
reindex. If vacuums slow your system down too much, then you don't have
enough I/O bandwidth, and need to see the previous point about a better
RAID setup.
You should look into slony. You can replicate from one version of pgsql
to another, a feature intended to be used for updating.
That way, you can put an 8.1 server behind your 7.4 or 8.0 server,
replicate to it, wait for it to catch up, shut down your app, fail over
your server to the 8.1 machine and viola, you're on 8.1 with minimum
downtime.
louis gonzales wrote:
Florian,
Are you certain:"You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates an
index, which you can't drop without dropping the PK :-("
Arg.. Should have written "unique index" instead of primary key..
But it doesn't change much, since a unique index and a pk are nearly
the same.
I'm not sure I am convinced the necessity of a foreign key, "needing" to
reference a primary keyed entry from a different table.
I tried the following:
create table a(id int4) ;
create table b(id int4, a_id int4) ;
alter table b add constraint pk foreign key (a_id) references a (id) ;
The alter table gave me an error stating that I need to have a unique index
defined on a.id...
greetings, Florian Pflug
On 6/19/06 4:48 PM, "Scott Marlowe" <smarlowe@g2switchworks.com> wrote:
I think Jim was thinking of RAID 1+0. If you gotta use RAID 5, at least
have a contoller with several hundred megs of battery backed cache.
Better yet, use a RAID controller with a gig or so of BBU cache and run
RAID 1+0 on it. If you can.
Yes, I realize that is ideal (I meant to write RAID 10 not 1). That takes
hardware I don't have, and can't justify spending the money on right now.
I/O doesn't appear to be a major issue, although I wasn't constantly
monitoring it. Most of the time, the I/O load was not that high - CPU was
pegged. Normal production updates and queries are well within (not at all
low) expectations.
Actually, the same question about why your frequently dumping and
restoring applies to full vacuums. Why are you doing them? A properly
running database should not need vacuum fulls. Nor reindexing.
Simple... VACUUM FULL reads the entire database. Between a full source
dump and a full vacuum, I've got a better chance of finding and recovering
from corruption sooner. Once bit... There is no (last time I checked)
utility to verify full database integrity, and there are no CRC's on data
blocks. A pg_dumpall doesn't touch the indexes. While this won't catch
everything, seems pretty good at catching corrupted indexes. I've seen no
more effective ways of verifying the entire database. The kind of hardware
problem we encountered last year is rare, but it does happen. I've seen
similar things many times over the years. RAID doesn't help you when a
controller mirrors garbage.
You should look into slony. You can replicate from one version of pgsql
to another, a feature intended to be used for updating.
I'd love to, but that requires hardware I don't have.
Wes
BTW, I do appreciate any and all suggestions. I hope my last message didn't
come across otherwise. That's not what was intended. It's just that
adding/replacing hardware is not an option right now. Maybe next year...
I'm still trying to dig up another 2GB memory.
The database actually performs extremely well with the current hardware in
the vast majority of cases. I just wish I could cut off the wasted 40+
hours adding the foreign key constraints - it would cut the reload time in
half.
It's too bad the suggestion of creating the FK before hand didn't pan out.
That looked like a good shot.
Wes
Florian,
I understand where you're coming from. Indexes are always unique and
all RDBMS systems use them to 'uniquely' identify a row from the the
perspective of internal software management. Index != PrimaryKey, so
every table created, despite any Primary/Foreign key contraints put on
them, always have a 1-1 Index per row entry. At least that's the way I
understand it, can someone else affirm this statement or redirect a
misguided 'me ;)'?
Thanks group,
Florian G. Pflug wrote:
louis gonzales wrote:
Florian,
Are you certain:"You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates
an index, which you can't drop without dropping the PK :-("Arg.. Should have written "unique index" instead of primary key..
But it doesn't change much, since a unique index and a pk are nearly
the same.I'm not sure I am convinced the necessity of a foreign key, "needing"
to reference a primary keyed entry from a different table.I tried the following:
create table a(id int4) ;
create table b(id int4, a_id int4) ;
alter table b add constraint pk foreign key (a_id) references a (id) ;
Not sure, but maybe the syntax on this is slightly ambiguous. Try
creating table b with a primary key constraint on a_id, then alter the
table to add foreign key constraint. I'm going to look up a couple of
references and see what I can dig up. That may be perfectly legitimate
syntax, but it just seems off to me.
Sorry if it is, I've spent the last few days on Oracle 9i, so I'm
jumping around in my memory.... quite a bit for validity amongst
different syntax.
Show quoted text
The alter table gave me an error stating that I need to have a unique
index
defined on a.id...
greetings, Florian Pflug
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Florian,
So if you:
create table test (
id varchar(2) primary key,
age int );
create table test2 (
id varchar(2) primary key,
age2 int );
alter table test2 add foreign key (id) references test (id);
\d test2
you'll see that attribute "id" from test2, now has both a primary key
constraint and a foreign key that references the primary key of test.
perhaps you can assert two constraints at the same time during an "alter
table ..." not sure why your example syntax is failing
louis gonzales wrote:
Show quoted text
Florian,
I understand where you're coming from. Indexes are always unique and
all RDBMS systems use them to 'uniquely' identify a row from the the
perspective of internal software management. Index != PrimaryKey, so
every table created, despite any Primary/Foreign key contraints put on
them, always have a 1-1 Index per row entry. At least that's the way
I understand it, can someone else affirm this statement or redirect a
misguided 'me ;)'?Thanks group,
Florian G. Pflug wrote:
louis gonzales wrote:
Florian,
Are you certain:"You can only create an FK if the fields you are referencing in the
foreign table form a PK there. And creating a PK implicitly creates
an index, which you can't drop without dropping the PK :-("Arg.. Should have written "unique index" instead of primary key..
But it doesn't change much, since a unique index and a pk are nearly
the same.I'm not sure I am convinced the necessity of a foreign key,
"needing" to reference a primary keyed entry from a different table.I tried the following:
create table a(id int4) ;
create table b(id int4, a_id int4) ;
alter table b add constraint pk foreign key (a_id) references a (id) ;Not sure, but maybe the syntax on this is slightly ambiguous. Try
creating table b with a primary key constraint on a_id, then alter the
table to add foreign key constraint. I'm going to look up a couple of
references and see what I can dig up. That may be perfectly
legitimate syntax, but it just seems off to me.Sorry if it is, I've spent the last few days on Oracle 9i, so I'm
jumping around in my memory.... quite a bit for validity amongst
different syntax.The alter table gave me an error stating that I need to have a unique
index
defined on a.id...greetings, Florian Pflug
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
louis gonzales wrote:
Florian,
I understand where you're coming from. Indexes are always unique and
all RDBMS systems use them to 'uniquely' identify a row from the the
perspective of internal software management.
Surely there are non-unique indices - meaning indices for which there
are more then one entry for a given key.
Index != PrimaryKey, so
every table created, despite any Primary/Foreign key contraints put on
them, always have a 1-1 Index per row entry. At least that's the way I
understand it, can someone else affirm this statement or redirect a
misguided 'me ;)'?
In postgresql at least, I believe that if you create no index (or pk), then
there is no index. The only exception are toast-tables, but you don't even
see those tables normally, and they're just an implementation detail of how
large attributes are stored.
However, the whole point of this thread was whether there is a way to create
a FK without postgres checking if it's actually satisfied, or not. This could
speed up restoring a dump, because you know that the FK is actually satisfied in
that case.
My suggestion was to create the fk _before_ loading the data, and disable it
similarly to what "--disable-triggers" doest. It turned out, however, that a
FK always depends on a unique index (be it a primary key, or not), which prevents
my plan from working :-(
greetings, Florian Pflug
On 6/20/06 5:07 AM, "Florian G. Pflug" <fgp@phlo.org> wrote:
My suggestion was to create the fk _before_ loading the data, and disable it
similarly to what "--disable-triggers" doest. It turned out, however, that a
FK always depends on a unique index (be it a primary key, or not), which
prevents
my plan from working :-(
That was a great idea - too bad it didn't pan out.
I don't suppose there's any (reasonable) way to directly insert into the
system tables to create the constraint? I could knock almost 2 days off of
the almost 4 days to reload if I could solve this.
Wes