7.4 - FK constraint performance
PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96
-- about 10 records
CREATE TABLE my.Small
(
id my.dint NOT NULL,
code my.dvalue NOT NULL,
CONSTRAINT pk_1 PRIMARY KEY (id)
) WITHOUT OIDS;
-- about 80M rows
CREATE TABLE my.Large
(
id my.dlong NOT NULL,
small_id my.dint NOT NULL,
value my.value,
CONSTRAINT pk_2 PRIMARY KEY (id),
CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.Small (id)
ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;
CREATE INDEX small_fk ON my.Large USING btree (small_id);
---------------------------------------------
The following queries run in less than 40 ms.
1) select 1 from Large where small_id = 239
2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x
3) delete from Small where id = 239
Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG: statement: SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x
Feb 11 18:10:41 srv07 postgres[2091]: [92-1] LOG: duration: 185273.262 ms
When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?
Thanks
__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html
On Wed, 11 Feb 2004, ow wrote:
When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?
Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
ow <oneway_111@yahoo.com> writes:
When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?
What are those column datatypes?
regards, tom lane
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
Ran analyze, the result is the same. Here's more info:
1) There's 1 row in "Large" for "small_id" = 239
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x
Quick query. Explain shows index scan.
2) There are many rows in "Large" for "small_id" = 1
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x
Runs for about 3 min. Explain shows table scan.
3) delete from Small where id = 239
Runs for about 3 min. It does appear that a table scan is used for the FK
verification. But why? I'm deleting "id = 239", not "id = 1", and the query in
(1) runs very quickly. I suspected that the wrong id was being passed during FK
verification, but the FK constraint DOES work.
4) Domain types used in the example above
my.dint = int
my.dlong = int8
my.dvalue = varchar(15)
Thanks
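For reference, the domain definitions themselves aren't shown anywhere in the thread; given the base types listed above, they were presumably declared along these lines (an editorial sketch, not from the thread):

```sql
-- Hypothetical reconstruction of the domains used in the schema,
-- based on the base types the poster lists above.
CREATE DOMAIN my.dint AS int;
CREATE DOMAIN my.dlong AS int8;
CREATE DOMAIN my.dvalue AS varchar(15);
```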
On Thu, 12 Feb 2004, ow wrote:
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
Ran analyze, the result is the same. Here's more info:
You also did the alter table to up the statistics target on the column,
right?
3) delete from Small where id = 239
Runs for about 3 min. It does appear that a table scan is used for the FK
verification. But why? I'm deleting "id = 239", not "id = 1", and the query in
(1) runs very quickly. I suspected that the wrong id was being passed during FK
verification, but the FK constraint DOES work.
It doesn't plan it as id=239 but as id=$1 and then executes it with
$1=239. The plan data gets reused for other id values if it needs the
same fk action again later in the session.
I'd hoped that upping the statistics target and re-analyzing would make it
choose an index scan for the case where it doesn't know what constant is
going to be used. Hmm, what is the estimated cost difference and real
time difference on id=1 between seqscan and index scan (explain analyze
output with and without enable_seqscan=off should show you).
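The comparison Stephan asks for could be run like this (a sketch; the statement name `chk` is made up, and note that in 7.4 a prepared statement's plan is fixed at PREPARE time, so the statement has to be re-prepared after changing the setting):

```sql
-- Planner's default choice for the parameterized FK-check query:
PREPARE chk(int) AS
  SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x;
EXPLAIN ANALYZE EXECUTE chk(1);

-- Re-plan the same query with sequential scans disabled:
DEALLOCATE chk;
SET enable_seqscan = off;
PREPARE chk(int) AS
  SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x;
EXPLAIN ANALYZE EXECUTE chk(1);
SET enable_seqscan = on;
```

Comparing the estimated costs and actual times of the two plans shows how far off the planner's generic-parameter estimate is.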
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
You also did the alter table to up the statistics target on the column,
right?
Not really. I did not change the default stats settings in
postgresql.conf. Not sure what needs to be changed; can you clarify?
Thanks
ow <oneway_111@yahoo.com> writes:
When I try to delete a record, it takes > 3 min.
I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?
prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
explain analyze execute foo(239);
When I try it I see an indexscan plan, but maybe there's some aspect of
your setup that's causing problems.
regards, tom lane
On Thu, 12 Feb 2004, ow wrote:
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
You also did the alter table to up the statistics target on the column,
right?
Not really. I did not change the default stats settings in
postgresql.conf. Not sure what needs to be changed; can you clarify?
Basically, run something like:
-- 1000 is just an arbitrary choice, but we can lower it later if this
-- works.
ALTER TABLE my.Large ALTER COLUMN small_id SET STATISTICS 1000;
ANALYZE my.Large;
This increases the target for just the column in question which means it
should have a better idea of the distribution and may make it make a
better guess if you've got a somewhat uneven distribution.
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
ow <oneway_111@yahoo.com> writes:
When I try to delete a record, it takes > 3 min.
I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?
1) SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = 201 FOR UPDATE OF x;
QUERY PLAN
Index Scan using small_fk on large x (cost=0.00..6.01 rows=1 width=6) (actual
time=0.251..0.251 rows=0 loops=1)
Index Cond: ((small_id)::integer = 201)
Total runtime: 0.338 ms
2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
explain analyze execute foo(201);
QUERY PLAN
Seq Scan on large x (cost=0.00..1787052.30 rows=7893843 width=6) (actual
time=210566.301..210566.301 rows=0 loops=1)
Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms
Thanks
ow <oneway_111@yahoo.com> writes:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think it must be using a seqscan for the foreign key check query.
2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
explain analyze execute foo(201);
QUERY PLAN
Seq Scan on large x (cost=0.00..1787052.30 rows=7893843 width=6) (actual
time=210566.301..210566.301 rows=0 loops=1)
Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms
Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?
regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?
regards, tom lane
schemaname        | my
tablename         | large
attname           | small_id
null_frac         | 0
avg_width         | 4
n_distinct        | 10
most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
histogram_bounds  |
correlation       | 0.0597573
ow <oneway_111@yahoo.com> writes:
schemaname        | my
tablename         | large
attname           | small_id
null_frac         | 0
avg_width         | 4
n_distinct        | 10
most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
histogram_bounds  |
correlation       | 0.0597573
According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??
The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table, and under that assumption it's
quite right to use a seqscan.
If this estimate is not right, perhaps you could give us a more accurate
view of the column statistics?
regards, tom lane
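As a sanity check of that 10% figure (an editorial sketch, not from the thread): with all ten values in the most-common-values list at frequencies near 0.1 each, the estimate for an unknown parameter comes out close to reltuples / n_distinct:

```sql
-- Rough reconstruction of the planner's row estimate for small_id = $1:
-- average MCV frequency is ~0.1, i.e. about 10% of the table.
SELECT reltuples, reltuples / 10 AS approx_estimated_rows
FROM pg_class
WHERE relname = 'large';
-- With roughly 79M rows this lands near the rows=7893843 figure
-- in the EXPLAIN output earlier in the thread.
```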
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??
Let's say we have City (small) and Person (large) tables. A new city was added
(mistakenly) with id=239; it does not have any "persons" assigned yet. Hence,
we want to remove the wrong "city" record.
In any case, one can't remove a record from "small" unless there are NO
dependent records in "large"; RI will not allow it. The initial problem was
that I tried to delete a record from "small" and it took about 3 min.
The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table, and under that assumption it's
quite right to use a seqscan.
Sounds pretty bad for my case. Any way to avoid the 10% scan?
Thanks
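One session-level workaround sometimes used in this situation (an editorial sketch, not suggested in the thread; it affects every plan in the session, and since RI check plans are cached per session, it only helps if set before the FK check query is first planned):

```sql
-- Sketch: disable sequential scans in a fresh session before the delete,
-- so the parameterized FK check query is planned with the index instead.
SET enable_seqscan = off;
DELETE FROM my.Small WHERE id = 239;
SET enable_seqscan = on;
```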
ow <oneway_111@yahoo.com> writes:
Sounds pretty bad for my case. Any way to avoid the 10% scan?
Can't see how we optimize your case without pessimizing more-common cases.
Sorry.
regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can't see how we optimize your case without pessimizing more-common cases.
My case appears to be pretty common, i.e. one small and one large table with an
RI constraint between them. In order to delete a record from the small table,
the large table must not have any records that depend on the deleted row.
I think other RDBMSs simply use a preset value instead of a partial table
scan when there's not enough stat info. Might be a better way.
Thanks
On Thu, 2004-02-12 at 20:10, Tom Lane wrote:
ow <oneway_111@yahoo.com> writes:
Sounds pretty bad for my case. Any way to avoid the 10% scan?
Can't see how we optimize your case without pessimizing more-common cases.
Sorry.
Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?
--
Rod Taylor <rbt [at] rbt [dot] ca>
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
Rod Taylor <rbt@rbt.ca> writes:
Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?
We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.
regards, tom lane
ow <oneway_111@yahoo.com> writes:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can't see how we optimize your case without pessimizing more-common cases.
I think other RDBMSs simply use a preset value instead of a partial table
scan when there's not enough stat info. Might be a better way.
The problem here cannot be described as "not enough stat info". The
available stats are complete and they point very strongly to the
conclusion that searches in the large table should be seqscans.
To do otherwise would be folly in general, even if it happens to
be the correct thing in this particular example.
regards, tom lane
On Friday 13 February 2004 04:25, Tom Lane wrote:
Rod Taylor <rbt@rbt.ca> writes:
Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?
We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.
In this precise example, could you not:
1. Check index for value
2. If found, seq-scan
Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there
--
Richard Huxton
Archonet Ltd
In this precise example, could you not:
1. Check index for value
2. If found, seq-scan
Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there
Most of the queries are going to be for the other values (in which case
you've wasted an index scan) which is minor, but in the event there is a
single 239 you're still taking a big hit.
That is an awful lot of work to handle the non-existent case only.
--
Rod Taylor <rbt [at] rbt [dot] ca>
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc