7.4 - FK constraint performance

Started by owabout 22 years ago56 messageshackers
Jump to latest
#1ow
oneway_111@yahoo.com

PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96

-- about 10 records
CREATE TABLE my.Small
(
id my.dint NOT NULL,
code my.dvalue NOT NULL,
CONSTRAINT pk_1 PRIMARY KEY (id),
) WITHOUT OIDS;

-- about 80M rows
CREATE TABLE my.Large
(
id my.dlong NOT NULL,
small_id my.dint NOT NULL,
value my.value,
CONSTRAINT pk_2 PRIMARY KEY (id),
CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.small (id) ON UPDATE
RESTRICT ON DELETE RESTRICT,
) WITHOUT OIDS;

CREATE INDEX small_fk ON my.Large USING btree (small_id);

---------------------------------------------
The fowllowing queiries run in less than 40 ms.
1) select 1 from Large where small_id = 239
2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x

3) delete from Small where id = 239
Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG: statement: SELECT 1 FROM
ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x Feb 11 18:10:41 srv07

postgres[2091]: [92-1] LOG: duration: 185273.262 ms

When I try to delete record, it takes > 3 min. Why is it taking so long if
practically the same select query (see (2)) is running very quickly. Anything
that can be done to fix it?

Thanks

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: ow (#1)
Re: 7.4 - FK constraint performance

On Wed, 11 Feb 2004, ow wrote:

PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96

-- about 10 records
CREATE TABLE my.Small
(
id my.dint NOT NULL,
code my.dvalue NOT NULL,
CONSTRAINT pk_1 PRIMARY KEY (id),
) WITHOUT OIDS;

-- about 80M rows
CREATE TABLE my.Large
(
id my.dlong NOT NULL,
small_id my.dint NOT NULL,
value my.value,
CONSTRAINT pk_2 PRIMARY KEY (id),
CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.small (id) ON UPDATE
RESTRICT ON DELETE RESTRICT,
) WITHOUT OIDS;

CREATE INDEX small_fk ON my.Large USING btree (small_id);

---------------------------------------------
The fowllowing queiries run in less than 40 ms.
1) select 1 from Large where small_id = 239
2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x

3) delete from Small where id = 239
Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG: statement: SELECT 1 FROM
ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x Feb 11 18:10:41 srv07

postgres[2091]: [92-1] LOG: duration: 185273.262 ms

When I try to delete record, it takes > 3 min. Why is it taking so long if
practically the same select query (see (2)) is running very quickly. Anything
that can be done to fix it?

Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: ow (#1)
Re: 7.4 - FK constraint performance

ow <oneway_111@yahoo.com> writes:

When I try to delete record, it takes > 3 min. Why is it taking so long if
practically the same select query (see (2)) is running very quickly. Anything
that can be done to fix it?

What are those column datatypes?

regards, tom lane

#4ow
oneway_111@yahoo.com
In reply to: Stephan Szabo (#2)
Re: 7.4 - FK constraint performance
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?

Ran analyze, the result is the same. Here's more info:

1) There's 1 row in "Large" for "small_id" = 239
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x

Quick query. Explain shows index scan.

2) There are many rows in "Large" for "small_id" = 1
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x

Runs for about 3 min. Explain shows table scan.

3) delete from Small where id = 239
Runs for about 3 min. It does appear that table scan is used for FK
verification. But why? Am deleting "id = 239" not "id = 1" and the query in
(1) runs very quickly. Had suspicion that wrong id is passed during FK
verification but FK constraint DOES work.

4) Domain types used in the example above
my.dint = int
my.dlong = int8
my.dvalue = varchar(15)

Thanks

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#5Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: ow (#4)
Re: 7.4 - FK constraint performance

On Thu, 12 Feb 2004, ow wrote:

--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?

Ran analyze, the result is the same. Here's more info:

You also did the alter table to up the statistics target on the column,
right?

3) delete from Small where id = 239
Runs for about 3 min. It does appear that table scan is used for FK
verification. But why? Am deleting "id = 239" not "id = 1" and the query in
(1) runs very quickly. Had suspicion that wrong id is passed during FK
verification but FK constraint DOES work.

It doesn't plan it as id=239 but as id=$1 and then executes it with
$1=239. The plan data gets reused for other id values if it needs the
same fk action again later in the session.

I'd hoped that upping the statistics target and re-analyzing would make it
choose an index scan for the case where it doesn't know what constant is
going to be used. Hmm, what is the estimated cost difference and real
time difference on id=1 between seqscan and index scan (explain analyze
output with and without enable_seqscan=off should show you).

#6ow
oneway_111@yahoo.com
In reply to: Stephan Szabo (#5)
Re: 7.4 - FK constraint performance
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

You also did the alter table to up the statistics target on the column,
right?

Not really. I did not change the the default stats settings in the
postgresql.conf. Not sure what needs to be changed, can you clarify?

Thanks

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: ow (#1)
Re: 7.4 - FK constraint performance

ow <oneway_111@yahoo.com> writes:

When I try to delete record, it takes > 3 min.

I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?

prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

explain analyze execute foo(239);

When I try it I see an indexscan plan, but maybe there's some aspect of
your setup that's causing problems.

regards, tom lane

#8Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: ow (#6)
Re: 7.4 - FK constraint performance

On Thu, 12 Feb 2004, ow wrote:

--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:

You also did the alter table to up the statistics target on the column,
right?

Not really. I did not change the the default stats settings in the
postgresql.conf. Not sure what needs to be changed, can you clarify?

Basically, run something like:
-- 1000 is just an arbitrary choice, but we can lower it later if this
-- works.
ALTER TABLE my.Large ALTER COLUMN small_id SET STATISTICS 1000;
ANALYZE my.Large;

This increases the target for just the column in question which means it
should have a better idea of the distribution and may make it make a
better guess if you've got a somewhat uneven distribution.

#9ow
oneway_111@yahoo.com
In reply to: Tom Lane (#7)
Re: 7.4 - FK constraint performance
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

ow <oneway_111@yahoo.com> writes:

When I try to delete record, it takes > 3 min.

I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?

1) SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = 201 FOR UPDATE OF x;

QUERY PLAN
Index Scan using small_fk on large x (cost=0.00..6.01 rows=1 width=6) (actual
time=0.251..0.251 rows=0 loops=1)
Index Cond: ((small_id)::integer = 201)
Total runtime: 0.338 ms

2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

explain analyze execute foo(201);

QUERY PLAN
Seq Scan on large x (cost=0.00..1787052.30 rows=7893843 width=6) (actual
time=210566.301..210566.301 rows=0 loops=1)
Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms

Thanks

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: ow (#9)
Re: 7.4 - FK constraint performance

ow <oneway_111@yahoo.com> writes:

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think it must be using a seqscan for the foreign key check query.

2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;

explain analyze execute foo(201);

QUERY PLAN
Seq Scan on large x (cost=0.00..1787052.30 rows=7893843 width=6) (actual
time=210566.301..210566.301 rows=0 loops=1)
Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms

Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?

regards, tom lane

#11ow
oneway_111@yahoo.com
In reply to: Tom Lane (#10)
Re: 7.4 - FK constraint performance
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?

regards, tom lane

schemaname tablename attname null_frac avg_width n_distinct most_common_vals
most_common_freqs histogram_bounds correlation
my large small_id 0 4 10 {7,3,5,1,4,2,8,10,6,9}
{0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
0.0597573

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: ow (#11)
Re: 7.4 - FK constraint performance

ow <oneway_111@yahoo.com> writes:

schemaname tablename attname null_frac avg_width n_distinct most_common_vals
most_common_freqs histogram_bounds correlation
my large small_id 0 4 10 {7,3,5,1,4,2,8,10,6,9}
{0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
0.0597573

According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??

The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table, and under that assumption it's
quite right to use a seqscan.

If this estimate is not right, perhaps you could give us a more accurate
view of the column statistics?

regards, tom lane

#13ow
oneway_111@yahoo.com
In reply to: Tom Lane (#12)
Re: 7.4 - FK constraint performance
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??

Let's say we have City (small) and Person (large) tables. A new city was added
(mistakenly) with id=239, it does not have any "persons" assigned yet. Hence,
we want to remove the wrong "city" record.

In any case, one can't remove record from "small" unless there are NO records
in "large", RI will not allow it. The initial problem was that I tried to
delete a record from "small" and it was taking about 3 min to do that.

The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table, and under that assumption it's
quite right to use a seqscan.

Sounds pretty bad for my case. Any way to avoid the 10% scan?

Thanks

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: ow (#13)
Re: 7.4 - FK constraint performance

ow <oneway_111@yahoo.com> writes:

Sounds pretty bad for my case. Any way to avoid the 10% scan?

Can't see how we optimize your case without pessimizing more-common cases.
Sorry.

regards, tom lane

#15ow
oneway_111@yahoo.com
In reply to: Tom Lane (#14)
Re: 7.4 - FK constraint performance
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Can't see how we optimize your case without pessimizing more-common cases.

My case appears to be pretty common, i.e. 1 small and 1 large table with RI
constraint between them. In order to delete a record from the small table, the
large table must not have records that are dependent on the deleted row.

I think other RDBMSs simply use preset value instead of partial table scan when
there's not enough stat info. Might be a better way.

Thanks

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

#16Rod Taylor
rbt@rbt.ca
In reply to: Tom Lane (#14)
Re: 7.4 - FK constraint performance

On Thu, 2004-02-12 at 20:10, Tom Lane wrote:

ow <oneway_111@yahoo.com> writes:

Sounds pretty bad for my case. Any way to avoid the 10% scan?

Can't see how we optimize your case without pessimizing more-common cases.
Sorry.

Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rod Taylor (#16)
Re: 7.4 - FK constraint performance

Rod Taylor <rbt@rbt.ca> writes:

Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?

We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: ow (#15)
Re: 7.4 - FK constraint performance

ow <oneway_111@yahoo.com> writes:

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Can't see how we optimize your case without pessimizing more-common cases.

I think other RDBMSs simply use preset value instead of partial table
scan when there's not enough stat info. Might be a better way.

The problem here cannot be described as "not enough stat info". The
available stats are complete and they point very strongly to the
conclusion that searches in the large table should be seqscans.
To do otherwise would be folly in general, even if it happens to
be the correct thing in this particular example.

regards, tom lane

#19Richard Huxton
dev@archonet.com
In reply to: Tom Lane (#17)
Re: 7.4 - FK constraint performance

On Friday 13 February 2004 04:25, Tom Lane wrote:

Rod Taylor <rbt@rbt.ca> writes:

Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?

We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.

In this precise example, could you not:
1. Check index for value
2. If found, seq-scan

Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there
--
Richard Huxton
Archonet Ltd

#20Rod Taylor
rbt@rbt.ca
In reply to: Richard Huxton (#19)
Re: 7.4 - FK constraint performance

In this precise example, could you not:
1. Check index for value
2. If found, seq-scan

Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there

Most of the queries are going to be for the other values (in which case
you've wasted an index scan) which is minor, but in the event there is a
single 239 you're still taking a big hit.

That is an awful lot of work to handle the non-existant case only.

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc

#21Rod Taylor
rbt@rbt.ca
In reply to: Tom Lane (#17)
#22Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#17)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#22)
#24Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#24)
#26Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#25)
#27Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Stephan Szabo (#26)
#28news.postgresql.org
jlim@natsoft.com.my
In reply to: ow (#1)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: news.postgresql.org (#28)
#30elein
elein@varlena.com
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: elein (#30)
#32elein
elein@varlena.com
In reply to: Tom Lane (#31)
#33scott.marlowe
scott.marlowe@ihs.com
In reply to: Tom Lane (#31)
#34Terry Fielder
terry@ashtonwoodshomes.com
In reply to: scott.marlowe (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: scott.marlowe (#33)
#36elein
elein@varlena.com
In reply to: scott.marlowe (#33)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: elein (#36)
#38Richard Huxton
dev@archonet.com
In reply to: Tom Lane (#35)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Huxton (#38)
#40Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Stephan Szabo (#27)
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#40)
#42Jeremy Smith
jer@highboard.com
In reply to: Richard Huxton (#38)
#43Jeremy Smith
jer@highboard.com
In reply to: Jeremy Smith (#42)
#44scott.marlowe
scott.marlowe@ihs.com
In reply to: Jeremy Smith (#43)
#45elein
elein@varlena.com
In reply to: Tom Lane (#37)
#46Jeremy Smith
jer@highboard.com
In reply to: scott.marlowe (#44)
#47scott.marlowe
scott.marlowe@ihs.com
In reply to: Jeremy Smith (#46)
#48Jeremy Smith
jer@highboard.com
In reply to: scott.marlowe (#47)
#49scott.marlowe
scott.marlowe@ihs.com
In reply to: Jeremy Smith (#48)
#50Rod Taylor
rbt@rbt.ca
In reply to: scott.marlowe (#47)
#51scott.marlowe
scott.marlowe@ihs.com
In reply to: Rod Taylor (#50)
#52Rod Taylor
rbt@rbt.ca
In reply to: scott.marlowe (#51)
#53scott.marlowe
scott.marlowe@ihs.com
In reply to: Rod Taylor (#52)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: elein (#45)
#55elein
elein@varlena.com
In reply to: Tom Lane (#54)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: elein (#55)