7.4 - FK constraint performance
PostgreSQL 7.4 on i386-linux-gnu, compiled by GCC 2.96
-- about 10 records
CREATE TABLE my.Small
(
id my.dint NOT NULL,
code my.dvalue NOT NULL,
CONSTRAINT pk_1 PRIMARY KEY (id)
) WITHOUT OIDS;
-- about 80M rows
CREATE TABLE my.Large
(
id my.dlong NOT NULL,
small_id my.dint NOT NULL,
value my.value,
CONSTRAINT pk_2 PRIMARY KEY (id),
CONSTRAINT fk_2 FOREIGN KEY (small_id) REFERENCES my.Small (id)
ON UPDATE RESTRICT ON DELETE RESTRICT
) WITHOUT OIDS;
CREATE INDEX small_fk ON my.Large USING btree (small_id);
---------------------------------------------
The following queries run in less than 40 ms.
1) select 1 from Large where small_id = 239
2) SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x
3) delete from Small where id = 239
Feb 11 18:07:36 srv07 postgres[2091]: [91-1] LOG: statement: SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x
Feb 11 18:10:41 srv07 postgres[2091]: [92-1] LOG: duration: 185273.262 ms
When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?
Thanks
__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html
On Wed, 11 Feb 2004, ow wrote:
When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?
Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
ow <oneway_111@yahoo.com> writes:
When I try to delete a record, it takes > 3 min. Why is it taking so long
when practically the same select query (see (2)) runs very quickly? Is there
anything that can be done to fix it?
What are those column datatypes?
regards, tom lane
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
Ran analyze, the result is the same. Here's more info:
1) There's 1 row in "Large" for "small_id" = 239
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 239 FOR UPDATE OF x
Quick query. Explain shows index scan.
2) There are many rows in "Large" for "small_id" = 1
SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = 1 FOR UPDATE OF x
Runs for about 3 min. Explain shows table scan.
3) delete from Small where id = 239
Runs for about 3 min. It does appear that a table scan is used for the FK
verification. But why? I'm deleting "id = 239", not "id = 1", and the query in
(1) runs very quickly. I suspected that the wrong id was being passed during FK
verification, but the FK constraint DOES work.
4) Domain types used in the example above
my.dint = int
my.dlong = int8
my.dvalue = varchar(15)
Thanks
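For reference, the domain definitions themselves aren't shown anywhere in the thread; given the base types listed above, they were presumably declared along these lines (an editorial sketch, not from the thread):

```sql
-- Hypothetical reconstruction of the domains used in the schema,
-- based on the base types the poster lists above.
CREATE DOMAIN my.dint AS int;
CREATE DOMAIN my.dlong AS int8;
CREATE DOMAIN my.dvalue AS varchar(15);
```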
On Thu, 12 Feb 2004, ow wrote:
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
Hmm, I'd wonder if maybe it's choosing a sequential scan in the second
case? As a random question, does increasing the statistics target on
Large.small_id and re-analyzing change its behavior?
Ran analyze, the result is the same. Here's more info:
You also did the alter table to up the statistics target on the column,
right?
3) delete from Small where id = 239
Runs for about 3 min. It does appear that a table scan is used for the FK
verification. But why? I'm deleting "id = 239", not "id = 1", and the query in
(1) runs very quickly. I suspected that the wrong id was being passed during FK
verification, but the FK constraint DOES work.
It doesn't plan it as id=239 but as id=$1 and then executes it with
$1=239. The plan data gets reused for other id values if it needs the
same fk action again later in the session.
I'd hoped that upping the statistics target and re-analyzing would make it
choose an index scan for the case where it doesn't know what constant is
going to be used. Hmm, what is the estimated cost difference and real
time difference on id=1 between seqscan and index scan (explain analyze
output with and without enable_seqscan=off should show you).
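The comparison Stephan asks for could be run like this (a sketch; the statement name `chk` is made up, and note that in 7.4 a prepared statement's plan is fixed at PREPARE time, so the statement has to be re-prepared after changing the setting):

```sql
-- Planner's default choice for the parameterized FK-check query:
PREPARE chk(int) AS
  SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x;
EXPLAIN ANALYZE EXECUTE chk(1);

-- Re-plan the same query with sequential scans disabled:
DEALLOCATE chk;
SET enable_seqscan = off;
PREPARE chk(int) AS
  SELECT 1 FROM ONLY "my"."Large" x WHERE "small_id" = $1 FOR UPDATE OF x;
EXPLAIN ANALYZE EXECUTE chk(1);
SET enable_seqscan = on;
```

Comparing the estimated costs and actual times of the two plans shows how far off the planner's generic-parameter estimate is.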
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
You also did the alter table to up the statistics target on the column,
right?
Not really. I did not change the default stats settings in
postgresql.conf. Not sure what needs to be changed; can you clarify?
Thanks
ow <oneway_111@yahoo.com> writes:
When I try to delete a record, it takes > 3 min.
I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?
prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
explain analyze execute foo(239);
When I try it I see an indexscan plan, but maybe there's some aspect of
your setup that's causing problems.
regards, tom lane
On Thu, 12 Feb 2004, ow wrote:
--- Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
You also did the alter table to up the statistics target on the column,
right?
Not really. I did not change the default stats settings in
postgresql.conf. Not sure what needs to be changed; can you clarify?
Basically, run something like:
-- 1000 is just an arbitrary choice, but we can lower it later if this
-- works.
ALTER TABLE my.Large ALTER COLUMN small_id SET STATISTICS 1000;
ANALYZE my.Large;
This increases the target for just the column in question which means it
should have a better idea of the distribution and may make it make a
better guess if you've got a somewhat uneven distribution.
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
ow <oneway_111@yahoo.com> writes:
When I try to delete a record, it takes > 3 min.
I think it must be using a seqscan for the foreign key check query.
Could you try this and show the results?
1) SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = 201 FOR UPDATE OF x;
QUERY PLAN
Index Scan using small_fk on large x (cost=0.00..6.01 rows=1 width=6) (actual
time=0.251..0.251 rows=0 loops=1)
Index Cond: ((small_id)::integer = 201)
Total runtime: 0.338 ms
2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
explain analyze execute foo(201);
QUERY PLAN
Seq Scan on large x (cost=0.00..1787052.30 rows=7893843 width=6) (actual
time=210566.301..210566.301 rows=0 loops=1)
Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms
Thanks
ow <oneway_111@yahoo.com> writes:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think it must be using a seqscan for the foreign key check query.
2) prepare foo(my.dint) as
SELECT 1 FROM ONLY "my"."large" x WHERE "small_id" = $1 FOR UPDATE OF x;
explain analyze execute foo(201);
QUERY PLAN
Seq Scan on large x (cost=0.00..1787052.30 rows=7893843 width=6) (actual
time=210566.301..210566.301 rows=0 loops=1)
Filter: ((small_id)::integer = ($1)::integer)
Total runtime: 210566.411 ms
Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?
regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, there's the smoking gun all right. Why does it think there are
going to be 7893843 matching rows!? Could we see the pg_stats row for
the large.small_id column?
regards, tom lane
schemaname        | my
tablename         | large
attname           | small_id
null_frac         | 0
avg_width         | 4
n_distinct        | 10
most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
histogram_bounds  |
correlation       | 0.0597573
ow <oneway_111@yahoo.com> writes:
schemaname        | my
tablename         | large
attname           | small_id
null_frac         | 0
avg_width         | 4
n_distinct        | 10
most_common_vals  | {7,3,5,1,4,2,8,10,6,9}
most_common_freqs | {0.108667,0.105,0.104333,0.101333,0.100667,0.0983333,0.0983333,0.0983333,0.094,0.091}
histogram_bounds  |
correlation       | 0.0597573
According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??
The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table, and under that assumption it's
quite right to use a seqscan.
If this estimate is not right, perhaps you could give us a more accurate
view of the column statistics?
regards, tom lane
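As a sanity check of that 10% figure (an editorial sketch, not from the thread): with all ten values in the most-common-values list at frequencies near 0.1 each, the estimate for an unknown parameter comes out close to reltuples / n_distinct:

```sql
-- Rough reconstruction of the planner's row estimate for small_id = $1:
-- average MCV frequency is ~0.1, i.e. about 10% of the table.
SELECT reltuples, reltuples / 10 AS approx_estimated_rows
FROM pg_class
WHERE relname = 'large';
-- With roughly 79M rows this lands near the rows=7893843 figure
-- in the EXPLAIN output earlier in the thread.
```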
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
According to this entry, your small_id column only contains the ten
values 1..10, roughly evenly distributed. So why are you probing for
239??
Let's say we have City (small) and Person (large) tables. A new city was added
(mistakenly) with id=239; it does not have any "persons" assigned yet. Hence,
we want to remove the wrong "city" record.
In any case, one can't remove a record from "small" unless there are NO
dependent records in "large"; RI will not allow it. The initial problem was
that I tried to delete a record from "small" and it took about 3 min.
The planner is certainly going to estimate a probe for an unspecified
value as retrieving 10% of the table, and under that assumption it's
quite right to use a seqscan.
Sounds pretty bad for my case. Any way to avoid the 10% scan?
Thanks
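One session-level workaround sometimes used in this situation (an editorial sketch, not suggested in the thread; it affects every plan in the session, and since RI check plans are cached per session, it only helps if set before the FK check query is first planned):

```sql
-- Sketch: disable sequential scans in a fresh session before the delete,
-- so the parameterized FK check query is planned with the index instead.
SET enable_seqscan = off;
DELETE FROM my.Small WHERE id = 239;
SET enable_seqscan = on;
```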
ow <oneway_111@yahoo.com> writes:
Sounds pretty bad for my case. Any way to avoid the 10% scan?
Can't see how we optimize your case without pessimizing more-common cases.
Sorry.
regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can't see how we optimize your case without pessimizing more-common cases.
My case appears to be pretty common, i.e. one small and one large table with an
RI constraint between them. In order to delete a record from the small table,
the large table must not have any records that depend on the deleted row.
I think other RDBMSs simply use a preset value instead of a partial table
scan when there's not enough stat info. Might be a better way.
Thanks
On Thu, 2004-02-12 at 20:10, Tom Lane wrote:
ow <oneway_111@yahoo.com> writes:
Sounds pretty bad for my case. Any way to avoid the 10% scan?
Can't see how we optimize your case without pessimizing more-common cases.
Sorry.
Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?
--
Rod Taylor <rbt [at] rbt [dot] ca>
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
Rod Taylor <rbt@rbt.ca> writes:
Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?
We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.
regards, tom lane
ow <oneway_111@yahoo.com> writes:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can't see how we optimize your case without pessimizing more-common cases.
I think other RDBMSs simply use a preset value instead of a partial table
scan when there's not enough stat info. Might be a better way.
The problem here cannot be described as "not enough stat info". The
available stats are complete and they point very strongly to the
conclusion that searches in the large table should be seqscans.
To do otherwise would be folly in general, even if it happens to
be the correct thing in this particular example.
regards, tom lane
On Friday 13 February 2004 04:25, Tom Lane wrote:
Rod Taylor <rbt@rbt.ca> writes:
Statistics say there are 10 values. Statistics list the 10 most common
values (all of them). Given this, would it not be reasonable to assume
that 239 is a recent addition (if there at all) to the table and not
very common?
We don't know that it's 239 when we make the plan. In order to know
that, we'd have to abandon caching of RI check query plans and re-plan
for each row. That strikes me as inevitably a losing proposition.
In this precise example, could you not:
1. Check index for value
2. If found, seq-scan
Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there
--
Richard Huxton
Archonet Ltd
In this precise example, could you not:
1. Check index for value
2. If found, seq-scan
Of course that's only going to be a sensible thing to do if you're expecting
one of two results:
1. Value not there
2. Lengthy seq-scan if it is there
Most of the queries are going to be for the other values (in which case
you've wasted an index scan) which is minor, but in the event there is a
single 239 you're still taking a big hit.
That is an awful lot of work to handle the non-existent case only.
--
Rod Taylor <rbt [at] rbt [dot] ca>
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc