How are foreign key constraints built?

Started by Wesabout 21 years ago9 messagesgeneral
Jump to latest
#1Wes
wespvp@syntegra.com

How are foreign key constraints built? In loading my database into
PostgreSQL 8.0, on the command:

ALTER TABLE ONLY TABLEA
ADD CONSTRAINT "$1" FOREIGN KEY (mkey) REFERENCES tableb(mkey) ON DELETE
CASCADE;

I ended up with the following in pg_tmp as it is adding the constraint:

-rw------- 1 postgres admin 1073741824 23 Jan 06:09 pgsql_tmp2559.25
-rw------- 1 postgres admin 1073741824 23 Jan 06:24 pgsql_tmp2559.26
-rw------- 1 postgres admin 1073741824 23 Jan 06:39 pgsql_tmp2559.27
-rw------- 1 postgres admin 636526592 23 Jan 06:48 pgsql_tmp2559.28
-rw------- 1 postgres admin 1073741824 23 Jan 11:51 pgsql_tmp2559.29
-rw------- 1 postgres admin 1073741824 23 Jan 11:34 pgsql_tmp2559.30
-rw------- 1 postgres admin 1073741824 23 Jan 11:36 pgsql_tmp2559.31
-rw------- 1 postgres admin 1073741824 23 Jan 11:37 pgsql_tmp2559.32
-rw------- 1 postgres admin 1073741824 23 Jan 11:38 pgsql_tmp2559.33
-rw------- 1 postgres admin 1073741824 23 Jan 11:39 pgsql_tmp2559.34
-rw------- 1 postgres admin 1073741824 23 Jan 11:41 pgsql_tmp2559.35
-rw------- 1 postgres admin 1073741824 23 Jan 11:42 pgsql_tmp2559.36
-rw------- 1 postgres admin 1073741824 23 Jan 11:43 pgsql_tmp2559.37
-rw------- 1 postgres admin 1073741824 23 Jan 11:45 pgsql_tmp2559.38
-rw------- 1 postgres admin 1073741824 23 Jan 11:46 pgsql_tmp2559.39
-rw------- 1 postgres admin 1073741824 23 Jan 11:47 pgsql_tmp2559.40
-rw------- 1 postgres admin 1073741824 23 Jan 11:49 pgsql_tmp2559.41
-rw------- 1 postgres admin 1073741824 23 Jan 11:50 pgsql_tmp2559.42
-rw------- 1 postgres admin 603136000 23 Jan 11:51 pgsql_tmp2559.43

I believe files 25-28 are about the size of one of the indexes, but 29-43 is
about 35% larger than the tmp files when building the other index (the other
index is about 10GB).

There's no problem here, I'd just like to understand what it is doing. I
expected adding the foreign key constraint would just use the existing
indexes to verify the database is currently consistent. Is this just
working space to more efficiently build the initial constraint, or does it
actually write this to the database?

Wes

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#1)
Re: How are foreign key constraints built?

Wes <wespvp@syntegra.com> writes:

There's no problem here, I'd just like to understand what it is doing.

Either a hash or merge join between the two tables, to verify that all
the keys in the referencing table exist in the referenced table. The
intermediate data is evidently spilling to disk.

I expected adding the foreign key constraint would just use the existing
indexes to verify the database is currently consistent.

People have this weird notion that an index-based plan is always faster
than anything else. If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

regards, tom lane

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#2)
Re: How are foreign key constraints built?

On Sun, Jan 23, 2005 at 02:01:41PM -0500, Tom Lane wrote:

Wes <wespvp@syntegra.com> writes:

There's no problem here, I'd just like to understand what it is doing.

Either a hash or merge join between the two tables, to verify that all
the keys in the referencing table exist in the referenced table. The
intermediate data is evidently spilling to disk.

If this were done with a nested loop, you wouldn't need any intermediate
data, right? You can just throw the result-set away. ISTM that in the
case of just verifying existance of data, it might be faster to do a
nested loop that doesn't have to spill anywhere, instead of a hash or
merge that needs to generate a bunch of intermediate data. Is the
optimizer able to take this into account?

I expected adding the foreign key constraint would just use the existing
indexes to verify the database is currently consistent.

People have this weird notion that an index-based plan is always faster
than anything else. If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

Well, every other database I've used can do index covering, which means
index scans *are* faster.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#4Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Jim Nasby (#3)
Re: How are foreign key constraints built?

On Sun, Jan 23, 2005 at 03:19:10PM -0600, Jim C. Nasby wrote:

People have this weird notion that an index-based plan is always faster
than anything else. If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

Well, every other database I've used can do index covering, which means
index scans *are* faster.

... on those database systems. Indexes are different in Postgres in
general: they don't have visibility info (other systems don't need it,
tuples are always visible), and in some databases you have clustered
indexes, where the index is also the heap.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"We are who we choose to be", sang the goldfinch
when the sun is high (Sandman)

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#4)
Re: How are foreign key constraints built?

On Sun, Jan 23, 2005 at 06:45:36PM -0300, Alvaro Herrera wrote:

On Sun, Jan 23, 2005 at 03:19:10PM -0600, Jim C. Nasby wrote:

People have this weird notion that an index-based plan is always faster
than anything else. If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

Well, every other database I've used can do index covering, which means
index scans *are* faster.

... on those database systems. Indexes are different in Postgres in
general: they don't have visibility info (other systems don't need it,
tuples are always visible), and in some databases you have clustered
indexes, where the index is also the heap.

Yes, I understand. I was just pointing out that in other databases, an
index scan of even the entire table can be faster, hence the mentality
that index scans are always better.

I really hope that the current discussion on hackers about tuple
visibility in indexes leads somewhere; I think that would be a huge gain
for PostgreSQL.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#6Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#3)
Re: How are foreign key constraints built?

"Jim C. Nasby" <decibel@decibel.org> writes:

Well, every other database I've used can do index covering, which means
index scans *are* faster.

Still not necessarily true. In a case like this it would still be random
access which would be slower than sequential access.

Though Oracle is capable of taking the best of both worlds and doing a hash
join but taking the data from sequentially reading the index instead of the
table.

--
greg

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#6)
Re: How are foreign key constraints built?

On Mon, Jan 24, 2005 at 09:24:49AM -0500, Greg Stark wrote:

"Jim C. Nasby" <decibel@decibel.org> writes:

Well, every other database I've used can do index covering, which means
index scans *are* faster.

Still not necessarily true. In a case like this it would still be random
access which would be slower than sequential access.

Actually, even with random access, a covering index can still be faster.
Imagine a single-field index on a table with 40 fields.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#8Wes
wespvp@syntegra.com
In reply to: Tom Lane (#2)
Re: How are foreign key constraints built?

On 1/23/05 1:01 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

Ouch! That was a big mistake. Since inquiring minds want to know, I
decided to give that a try. The expected outcome is to beat the heck out of
the index disks as it read one index and referenced the other to see if the
value existed. What appears to have happened is that it went through the
same process as before, but read each data record via the index. It still
created all the pgsql_tmp files, the data disk was still the heaviest hit
(expected no or little access there), and it beat the living daylights out
of my swap - pageins/outs like crazy. The I/O on the index disks was
negligible compared to the data and swap disks. I won't try that again...

Wes

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Wes (#8)
Re: How are foreign key constraints built?

On Tue, Jan 25, 2005 at 09:38:20AM -0600, Wes wrote:

On 1/23/05 1:01 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

If you like you can try the operation with "set
enable_seqscan = off", but I bet it will take longer.

Ouch! That was a big mistake. Since inquiring minds want to know, I
decided to give that a try. The expected outcome is to beat the heck out of
the index disks as it read one index and referenced the other to see if the
value existed. What appears to have happened is that it went through the
same process as before, but read each data record via the index. It still
created all the pgsql_tmp files, the data disk was still the heaviest hit
(expected no or little access there), and it beat the living daylights out
of my swap - pageins/outs like crazy. The I/O on the index disks was
negligible compared to the data and swap disks. I won't try that again...

If the OS is swapping you've got serious issues; you need to look at
your configurating settings that deal with memory and figure out why
you're running out.

And yes, PostgreSQL can't do 'index covering', so even when it accesses
a table via an index it still has to read the base table. This is why if
you need to read the entire table it's faster to seqscan than index
scan.
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"