Indexes not always used after inserts/updates/vacuum analyze

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

"Michael G. Martin" <michael@vpmonline.com> writes:

Here is what I would expect which usually happens:

explain select * from symbol_data where symbol_name='IBM';
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..512.99 rows=128 width=129)

Here is one that fails:

explain select * from symbol_data where symbol_name='ELTE';
Seq Scan on symbol_data (cost=0.00..707415.32 rows=438015 width=129)

The planner thinks that there are 438K occurrences of 'ELTE' in your
table. If that's true, a sequential scan is not obviously the wrong
choice. How many are there, in reality? What do you get from

select * from pg_stats where tablename = 'symbol_data' and attname =
'symbol_name';

Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)

regards, tom lane

michael@vpmonline.com

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

Hi Tom,

Here is what is actually there:

select count(*) from symbol_data where symbol_name='ELTE';
count
-------
687

Here is the pg_stat query:

select * from pg_stats where tablename = 'symbol_data' and attname
='symbol_name';
tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals
|
most_common_freqs
| histogram_bounds | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
symbol_data | symbol_name | 0 | 7 | 152988 |
{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)

Interesting eh? EBALX also does a full scan--all others in the above
list get an index scan.

Here's the variable stuff--I attached the verbose outputs.
set enable_seqscan = on;
SET VARIABLE
vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE: QUERY PLAN:

Seq Scan on symbol_data (cost=0.00..707611.68 rows=355958 width=129)

set enable_seqscan = off;
SET VARIABLE
vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE: QUERY PLAN:

Index Scan using symbol_data_pkey on symbol_data (cost=0.00..1420899.51
rows=355958 width=129)

Index scan appears slower in explain, but the rows value is weird.

Thanks,
Michael

Tom Lane wrote:

Show quoted text

"Michael G. Martin" <michael@vpmonline.com> writes:

Here is what I would expect which usually happens:

explain select * from symbol_data where symbol_name='IBM';
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..512.99 rows=128 width=129)

Here is one that fails:

explain select * from symbol_data where symbol_name='ELTE';
Seq Scan on symbol_data (cost=0.00..707415.32 rows=438015 width=129)

The planner thinks that there are 438K occurrences of 'ELTE' in your
table. If that's true, a sequential scan is not obviously the wrong
choice. How many are there, in reality? What do you get from

select * from pg_stats where tablename = 'symbol_data' and attname =
'symbol_name';

Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)

regards, tom lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Michael G. Martin (#3)

Re: Indexes not always used after inserts/updates/vacuum analyze

"Michael G. Martin" <michael@vpmonline.com> writes:

Here is what is actually there:

select count(*) from symbol_data where symbol_name='ELTE';
687

Hmm. Do you have reason to think that that was also true when you last
did VACUUM ANALYZE or VACUUM?

Here is the pg_stat query:
select * from pg_stats where tablename = 'symbol_data' and attname
='symbol_name';
tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals
|
most_common_freqs
| histogram_bounds | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
symbol_data | symbol_name | 0 | 7 | 152988 |
{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%. Does that seem plausible to you? If the
sample was accurate then I'd agree with the planner's choices. It'd
seem that either your table contents are changing drastically (in which
case more-frequent ANALYZEs may be the answer), or you had the bad luck
to get a very unrepresentative sample, or there's some bug in the
statistical calculations.

regards, tom lane

michael@vpmonline.com

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

yes. each symbol_name only gets one row added and maybe a few updated
each market day.
This is interesting too. Planner thinks 128 rows on this symbol, GE,
yet there are really 5595. Not as off as ELTE, but a large factor. at
least the index get hit here.

explain select * from symbol_data where symbol_name='GE';
NOTICE: QUERY PLAN:

Index Scan using symbol_data_pkey on symbol_data (cost=0.00..513.09
rows=128 width=129)

EXPLAIN
vpm=> select count(*) from symbol_data where symbol_name='GE';
count
-------
5595

Tom Lane wrote:

Show quoted text

"Michael G. Martin" <michael@vpmonline.com> writes:

Here is what is actually there:

select count(*) from symbol_data where symbol_name='ELTE';
687

Hmm. Do you have reason to think that that was also true when you last
did VACUUM ANALYZE or VACUUM?

Here is the pg_stat query:
select * from pg_stats where tablename = 'symbol_data' and attname
='symbol_name';
tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals
|
most_common_freqs
| histogram_bounds | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
symbol_data | symbol_name | 0 | 7 | 152988 |
{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%. Does that seem plausible to you? If the
sample was accurate then I'd agree with the planner's choices. It'd
seem that either your table contents are changing drastically (in which
case more-frequent ANALYZEs may be the answer), or you had the bad luck
to get a very unrepresentative sample, or there's some bug in the
statistical calculations.

regards, tom lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Tom Lane (#4)

Re: Indexes not always used after inserts/updates/vacuum analyze

I said:

symbol_data | symbol_name | 0 | 7 | 152988 |
{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%.

Argh, make that 1.8% and 1.7%.

That's still orders of magnitude away from what you say the correct
frequency is, however: 687 out of 20+ million. I'd like to think that
the statistical sampling would be unlikely to make such a large error.

regards, tom lane

michael@vpmonline.com

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

Sorry, I missed your bottom part before I replied last.

The table breakdown consists of about 8000 symbol_names with at most
5000 rows of data for each symbol ( stock market history ).

So, those sample percents seem huge. The most any symbol would have
would be about 5000 / (8000*5000) = .0125%

--Michael

Tom Lane wrote:

Show quoted text

"Michael G. Martin" <michael@vpmonline.com> writes:

Here is what is actually there:

select count(*) from symbol_data where symbol_name='ELTE';
687

Hmm. Do you have reason to think that that was also true when you last
did VACUUM ANALYZE or VACUUM?

Here is the pg_stat query:
select * from pg_stats where tablename = 'symbol_data' and attname
='symbol_name';
tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals
|
most_common_freqs
| histogram_bounds | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
symbol_data | symbol_name | 0 | 7 | 152988 |
{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%. Does that seem plausible to you? If the
sample was accurate then I'd agree with the planner's choices. It'd
seem that either your table contents are changing drastically (in which
case more-frequent ANALYZEs may be the answer), or you had the bad luck
to get a very unrepresentative sample, or there's some bug in the
statistical calculations.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

michael@vpmonline.com

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

Heh--i was gonna ask why the strange percent representation in the stats
table.

I just ran a vacuum analyze with the specific column. Still get the
same explain plan:

Seq Scan on symbol_data (cost=0.00..709962.90 rows=369782 width=129)

--Michael

Tom Lane wrote:

Show quoted text

I said:

symbol_data | symbol_name | 0 | 7 | 152988 |
{EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} |
{0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%.

Argh, make that 1.8% and 1.7%.

That's still orders of magnitude away from what you say the correct
frequency is, however: 687 out of 20+ million. I'd like to think that
the statistical sampling would be unlikely to make such a large error.

regards, tom lane

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Michael G. Martin (#8)

Re: Indexes not always used after inserts/updates/vacuum analyze

"Michael G. Martin" <michael@vpmonline.com> writes:

I just ran a vacuum analyze with the specific column. Still get the
same explain plan:

Did the pg_stats data change noticeably?

ANALYZE is a statistical sampling process in 7.2, so I'd expect the
results to move around somewhat each time you repeat it. But if it
changes a lot then we have a problem.

You could also try

ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n

for larger values of n (10 is the default) and then re-ANALYZE
to see if the stats get any more accurate. The default of 10
was more or less picked out of the air ... perhaps it's too small.

regards, tom lane

#10

michael@vpmonline.com

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

Here's the new stats since the vacuum on that column--quite a few changes.

select * from pg_stats where tablename = 'symbol_data' and attname
='symbol_name';
tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals
|
most_common_freqs
| histogram_bounds | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+-----------------------------------------------------------------------------------------------------------+------------------------------------------------------+-------------
symbol_data | symbol_name | 0 | 7 | 150712 |
{EBALX,ELTE,SRP,KMG,MKC,AEN,BAC,BDX,BKF,BRT} |
{0.0233333,0.018,0.00266667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {A,BRN,DPAC,FUTR,INTC,MDSN,OCA,RAA,SSYS,USTB,_^^VPM} | 0.112971
(1 row)

I'll alter and play with the table tomorrow and let you know what I find.

-Michael

Tom Lane wrote:

Show quoted text

"Michael G. Martin" <michael@vpmonline.com> writes:

I just ran a vacuum analyze with the specific column. Still get the
same explain plan:

Did the pg_stats data change noticeably?

ANALYZE is a statistical sampling process in 7.2, so I'd expect the
results to move around somewhat each time you repeat it. But if it
changes a lot then we have a problem.

You could also try

ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n

for larger values of n (10 is the default) and then re-ANALYZE
to see if the stats get any more accurate. The default of 10
was more or less picked out of the air ... perhaps it's too small.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#11

max@suse.de

about 24 years ago

In reply to: Tom Lane (#2)

Re: Indexes not always used after inserts/updates/vacuum

On Wed, 27 Feb 2002 at 22:57, Tom Lane wrote:

Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)

I've just found a case where forcing indexscans results in much higher
speed. On the the 350000 rows table mentioned in my other post after a
"VACUUM FULL ANALYZE":

max=# set enable_seqscan to false;
max=# EXPLAIN analyze
SELECT count(foo.id) FROM foo, bar
WHERE foo.id = bar.ref2foo;
Aggregate (cost=27513.65..27513.65 rows=1 width=8)
(actual time=652.38..652.38 rows=1 loops=1)
-> Merge Join
(cost=0.00..27417.57 rows=38431 width=8)
(actual time=0.06..603.02 rows=38431 loops=1)
-> Index Scan using foo_pkey on foo
(cost=0.00..25153.18 rows=352072 width=4)
(actual time=0.03..157.57 rows=38432 loops=1)
-> Index Scan using idx_bar_ref2foo on bar
(cost=0.00..807.74 rows=38431 width=4)
(actual time=0.02..170.25 rows=38431 loops=1)
Total runtime: 652.58 msec
^^^^^^^^^^^
max=# set enable_seqscan to true;
max=# EXPLAIN analyze
SELECT count(foo.id) FROM foo, bar
WHERE foo.id = bar.ref2foo;

Aggregate
(cost=18560.65..18560.65 rows=1 width=8)
(actual time=4951.57..4951.57 rows=1 loops=1)
-> Hash Join
(cost=911.39..18464.58 rows=38431 width=8)
(actual time=653.26..4905.37 rows=38431 loops=1)
-> Seq Scan on foo
(cost=0.00..9251.72 rows=352072 width=4)
(actual time=0.02..769.60 rows=352072 loops=1)
-> Hash
(cost=683.31..683.31 rows=38431 width=4)
(actual time=140.60..140.60 rows=0 loops=1)
-> Seq Scan on bar
(cost=0.00..683.31 rows=38431 width=4)
(actual time=0.02..78.57 rows=38431 loops=1)
Total runtime: 4951.70 msec
^^^^^^^^^^^^

I've reproduced that several times. Even on a newly started postmaster
the query takes less than 2.5 seconds with seqscans swited off.

cu
Reinhard

#12

michael@vpmonline.com

about 24 years ago

In reply to: Michael G. Martin (#1)

Re: Indexes not always used after inserts/updates/vacuum analyze

Ok, so this morning after the automated nightly vacuum -z -v on the
database, ELTE no longer appears in the pg_stats table, and the index is
picked no problem. The table data has not changed since last eve.

However, now there is a new symbol which is behaving the same way--I.
This symbol was just loaded into the database yesterday. There are
officially 4108 rows in the symbol_data table where symbol_name='I'. I
bumped the STATISTICS value up to 100, re-analyzed, but the pg_stats
table still shows I first on the list with a value of 0.0182--didn't
change much from the original STATISTICS value of 10.

Here are the explain analyzes:

set enable_seqscan = on;
explain analyze select * from symbol_data where symbol_name='I' order by
date;
NOTICE: QUERY PLAN:

Sort (cost=811813.33..811813.33 rows=373904 width=129) (actual
time=93423.45..93427.02 rows=4108 loops=1)
-> Seq Scan on symbol_data (cost=0.00..709994.20 rows=373904
width=129) (actual time=92483.55..93399.60 rows=4108 loops=1)
Total runtime: 93431.50 msec

set enable_seqscan = off;
SET VARIABLE
vpm=> explain analyze select * from symbol_data where symbol_name='I'
order by date;
NOTICE: QUERY PLAN:

Sort (cost=1584564.49..1584564.49 rows=373904 width=129) (actual
time=129.38..133.01 rows=4108 loops=1)
-> Index Scan using symbol_data_pkey on symbol_data
(cost=0.00..1482745.36 rows=373904 width=129) (actual time=21.54..105.46
rows=4108 loops=1)
Total runtime: 137.55 msec

Even though the optimizer thinks the index will cost more, it does pick
it and use it with the performance expected when enable_seqscan = off;

-Michael

Tom Lane wrote:

Show quoted text

"Michael G. Martin" <michael@vpmonline.com> writes:

I just ran a vacuum analyze with the specific column. Still get the
same explain plan:

Did the pg_stats data change noticeably?

ANALYZE is a statistical sampling process in 7.2, so I'd expect the
results to move around somewhat each time you repeat it. But if it
changes a lot then we have a problem.

You could also try

ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n

for larger values of n (10 is the default) and then re-ANALYZE
to see if the stats get any more accurate. The default of 10
was more or less picked out of the air ... perhaps it's too small.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#13

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Reinhard Max (#11)

Re: Indexes not always used after inserts/updates/vacuum analyze

Reinhard Max <max@suse.de> writes:

I've just found a case where forcing indexscans results in much higher
speed.

-> Index Scan using foo_pkey on foo
(cost=0.00..25153.18 rows=352072 width=4)
(actual time=0.03..157.57 rows=38432 loops=1)

The major estimation error is evidently in this indexscan. What
statistics does pg_stats show for this table?

regards, tom lane

#14

max@suse.de

about 24 years ago

In reply to: Tom Lane (#13)

Re: Indexes not always used after inserts/updates/vacuum

On Thu, 28 Feb 2002 at 09:51, Tom Lane wrote:

Reinhard Max <max@suse.de> writes:

I've just found a case where forcing indexscans results in much higher
speed.

-> Index Scan using foo_pkey on foo
(cost=0.00..25153.18 rows=352072 width=4)
(actual time=0.03..157.57 rows=38432 loops=1)

The major estimation error is evidently in this indexscan. What
statistics does pg_stats show for this table?

See attached file.

BTW, I've just done the same test on PostgreSQL 7.1 and got similar
results.

cu
Reinhard

#15

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Reinhard Max (#14)

Re: Indexes not always used after inserts/updates/vacuum analyze

Reinhard Max <max@suse.de> writes:

The major estimation error is evidently in this indexscan. What
statistics does pg_stats show for this table?

See attached file.

Okay. It looks like foo.id has a pretty strong but not perfect
descending order (the correlation statistic is -0.563276). The
planner is evidently not rating that effect strongly enough.

If you look in cost_index (see approx. lines 270-340 in
src/backend/optimizer/path/costsize.c) you'll see that it computes
access cost estimates for both the perfectly sequential case and
the perfectly uncorrelated case, and then tries to interpolate
between them. I have reasonable faith in both of the endpoint
estimation methods, but very little in the interpolation equation ---
it was chosen on the spur of the moment and hasn't really been tested.

It might be interesting to replace csquared with just
fabs(indexCorrelation) to see if the results are better. Also, if you
cared to step through the code with a debugger or add some printout
statements, we could learn what the min and max costs are that it's
interpolating between; that'd be interesting to know as well.

regards, tom lane

#16

max@suse.de

about 24 years ago

In reply to: Tom Lane (#15)

Re: Indexes not always used after inserts/updates/vacuum

On Thu, 28 Feb 2002 at 10:15, Tom Lane wrote:

Okay. It looks like foo.id has a pretty strong but not perfect
descending order (the correlation statistic is -0.563276). The
planner is evidently not rating that effect strongly enough.

Yes, that seems to be the reason. When I try

SELECT * into foo2 from foo order by id;
CREATE index foo2_id on foo2(id);
VACUUM ANALYZE foo2;

and repeat the join with foo2 instead of foo, index scans are used
even when seqscans are not forbidden.

[...]
It might be interesting to replace csquared with just
fabs(indexCorrelation) to see if the results are better. Also, if you
cared to step through the code with a debugger or add some printout
statements, we could learn what the min and max costs are that it's
interpolating between; that'd be interesting to know as well.

OK, this is what I've changed:

- csquared = indexCorrelation * indexCorrelation;
+ elog(NOTICE, "min_IO_cost = %f, max_IO_cost = %f, indexCorrelation = %f",
+               min_IO_cost, max_IO_cost, indexCorrelation);
+ csquared = fabs (indexCorrelation);

Are these the addtional values you wanted to see?

These are the results:

max=# EXPLAIN analyze SELECT count(foo.id)
FROM foo, bar WHERE foo.id = bar.ref2foo;

NOTICE: min_IO_cost = 299.000000, max_IO_cost = 1196.000000
indexCorrelation = -1.000000
NOTICE: min_IO_cost = 1.000000, max_IO_cost = 3.993322
indexCorrelation = -1.000000
NOTICE: min_IO_cost = 5880.000000, max_IO_cost = 1169154.985307
indexCorrelation = -0.532557
NOTICE: min_IO_cost = 1.000000, max_IO_cost = 3.999660
indexCorrelation = -0.532557
NOTICE: QUERY PLAN:

Aggregate (cost=18709.65..18709.65 rows=1 width=8)
(actual time=7229.15..7229.15 rows=1 loops=1)
-> Hash Join (cost=911.39..18613.58 rows=38431 width=8)
(actual time=208.23..7184.68 rows=38431 loops=1)
-> Seq Scan on foo
(cost=0.00..9400.72 rows=352072 width=4)
(actual time=0.02..810.92 rows=352072 loops=1)
-> Hash (cost=683.31..683.31 rows=38431 width=4)
(actual time=149.87..149.87 rows=0 loops=1)
-> Seq Scan on bar
(cost=0.00..683.31 rows=38431 width=4)
(actual time=0.02..83.32 rows=38431 loops=1)
Total runtime: 7229.29 msec

EXPLAIN
max=# EXPLAIN analyze SELECT count(foo2.id)
FROM foo2, bar WHERE foo2.id = bar.ref2foo;

NOTICE: min_IO_cost = 299.000000, max_IO_cost = 1196.000000
indexCorrelation = -1.000000
NOTICE: min_IO_cost = 1.000000, max_IO_cost = 3.993322
indexCorrelation = -1.000000
NOTICE: min_IO_cost = 5741.000000, max_IO_cost = 1163366.000920
indexCorrelation = 1.000000
NOTICE: min_IO_cost = 1.000000, max_IO_cost = 3.999652
indexCorrelation = 1.000000
NOTICE: QUERY PLAN:

Aggregate (cost=12748.26..12748.26 rows=1 width=8)
(actual time=687.08..687.08 rows=1 loops=1)
-> Merge Join (cost=0.00..12652.18 rows=38431 width=8)
(actual time=0.44..633.53 rows=38431 loops=1)
-> Index Scan using foo2_pkey on foo2
(cost=0.00..10387.79 rows=352072 width=4)
(actual time=0.26..174.32 rows=38432 loops=1)
-> Index Scan using idx_bar_ref2foo on bar
(cost=0.00..807.74 rows=38431 width=4)
(actual time=0.17..180.34 rows=38431 loops=1)
Total runtime: 687.31 msec

#17

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Reinhard Max (#16)

Re: Indexes not always used after inserts/updates/vacuum analyze

Are these the addtional values you wanted to see?

Yes, but I just noticed something else strange:

-> Index Scan using foo2_pkey on foo2
(cost=0.00..10387.79 rows=352072 width=4)
(actual time=0.26..174.32 rows=38432 loops=1)

The actual rows read from this indexscan seem to be many fewer than
the number of rows in the table. What are the ranges of the id values
in tables foo and bar? I'm wondering if the merge could have stopped
far short of the end of the foo table; if so, *that* is the effect that
we are failing to model accurately.

regards, tom lane

#18

max@suse.de

about 24 years ago

In reply to: Tom Lane (#17)

Re: Indexes not always used after inserts/updates/vacuum

Hi,

On Thu, 28 Feb 2002 at 16:10, Tom Lane wrote:

-> Index Scan using foo2_pkey on foo2
(cost=0.00..10387.79 rows=352072 width=4)
(actual time=0.26..174.32 rows=38432 loops=1)

The actual rows read from this indexscan seem to be many fewer than
the number of rows in the table. What are the ranges of the id
values in tables foo and bar? I'm wondering if the merge could have
stopped far short of the end of the foo table; if so, *that* is the
effect that we are failing to model accurately.

Good guess :)

max=# SELECT 'bar' AS tablename, min(ref2foo), max(ref2foo),
count(ref2foo) FROM bar
UNION SELECT 'foo', min(id), max(id), count(id) from foo;

tablename | min | max | count
-----------+----------+----------+--------
bar | 10000010 | 10049999 | 38431
foo | 10000010 | 10423442 | 352072
(2 rows)

I'll tell my colleague (it's his test database, after all) that he
should take more realistic test data before complaining about bad
performance...

cu
Reinhard

#19

tgl@sss.pgh.pa.us

about 24 years ago

In reply to: Reinhard Max (#18)

Re: Indexes not always used after inserts/updates/vacuum analyze

Reinhard Max <max@suse.de> writes:

The actual rows read from this indexscan seem to be many fewer than
the number of rows in the table. What are the ranges of the id
values in tables foo and bar? I'm wondering if the merge could have
stopped far short of the end of the foo table; if so, *that* is the
effect that we are failing to model accurately.

Good guess :)

I'll tell my colleague (it's his test database, after all) that he
should take more realistic test data before complaining about bad
performance...

Actually, is it unrealistic test data? After thinking about it awhile,
I concluded that this is an effect the planner could and should model.
We have statistics that will tell us the maximum values of both
variables (at least in common cases), so it's not hard to estimate which
input stream will be exhausted first and how much of the other one will
actually be read. This could make a big difference in the cost of an
indexscan-based merge.

I have committed changes for 7.3 that do this. It's probably too big a
change to risk back-patching for 7.2.1, but if you care to experiment
with CVS tip then you could try it out.

regards, tom lane

#20