Bug #474: Index using problem

Started by PostgreSQL Bugs Listover 24 years ago9 messagesbugs
Jump to latest
#1PostgreSQL Bugs List
pgsql-bugs@postgresql.org

JoE Olcsak (joe78@freemail.hu) reports a bug with a severity of 4
The lower the number the more severe it is.

Short Description
Index using problem

Long Description
I have a problem when using indexes under PostgreSQL 7.1.1 ...

The problem is :

I have a field of table ... This is an INT4 type field ... and I
created an index for this field like :
create index idx_of_field on cim(utod_cim_id);
... and I executed a simple query ...

explain select * from cim where utod_cim_id=0;

NOTICE: QUERY PLAN:

Seq Scan on cim (cost=0.00..6555.41 rows=253633 width=118)

EXPLAIN

The query optimizer does not use my index..

I'm executed vacuum analyze command for this table but this not helps me ...

Then I created another index for this field:

create index idx_of_field2 on cim(int4(utod_cim_id));

... and I executed another simple query :

explain select * from cim where int4(utod_cim_id)=0;
NOTICE: QUERY PLAN:

Index Scan using idx_of_field2 on cim (cost=0.00..6499.70 rows=2536 width=118)

EXPLAIN

What is wrong in the first case ?

Sample Code

No file was uploaded with this report

#2Andreas Wernitznig
andreas@insilico.com
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #474: Index using problem

The query optimizer uses the index only if enough data are present in the table.
If only a few data are available a Seq Scan performs better and is therefore used.

Further one of the problems (which is hopfully solved in version 7.2) is that the query optimizer used for primary keys/foreign keys is not updated for an uninterrupted database connection.

E.g.:
If have an empty database "Seq Scan" is used in any case. Then I connect to the database, fill in some data, and execute an "VACUUM ANALYZE" (which updates the statistics for the query optimizer).
If I don't disconnect from the database but use the first connection again, the database still uses the (now) slower "seq scan" for "primary/foreign key" checking. In this case the query optimizer statistics are not updated for established connections.

My suggestion and question to the postgresql-programmers is now:
Why don't you skip the automatic index creation for primary keys and let the user decide to create an index,
that should be used in any case, regardless whether the table is full or empty ?

Andreas

On Fri, 5 Oct 2001 03:21:01 -0400 (EDT)
pgsql-bugs@postgresql.org wrote:

Show quoted text

JoE Olcsak (joe78@freemail.hu) reports a bug with a severity of 4
The lower the number the more severe it is.

Short Description
Index using problem

Long Description
I have a problem when using indexes under PostgreSQL 7.1.1 ...

The problem is :

I have a field of table ... This is an INT4 type field ... and I
created an index for this field like :
create index idx_of_field on cim(utod_cim_id);
... and I executed a simple query ...

explain select * from cim where utod_cim_id=0;

NOTICE: QUERY PLAN:

Seq Scan on cim (cost=0.00..6555.41 rows=253633 width=118)

EXPLAIN

The query optimizer does not use my index..

I'm executed vacuum analyze command for this table but this not helps me ...

Then I created another index for this field:

create index idx_of_field2 on cim(int4(utod_cim_id));

... and I executed another simple query :

explain select * from cim where int4(utod_cim_id)=0;
NOTICE: QUERY PLAN:

Index Scan using idx_of_field2 on cim (cost=0.00..6499.70 rows=2536 width=118)

EXPLAIN

What is wrong in the first case ?

Sample Code

No file was uploaded with this report

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #474: Index using problem

pgsql-bugs@postgresql.org writes:

explain select * from cim where utod_cim_id=0;
NOTICE: QUERY PLAN:
Seq Scan on cim (cost=0.00..6555.41 rows=253633 width=118)

The query optimizer does not use my index..

How many rows are there where utod_cim_id=0? The planner guesses that
there are a lot (253633, to be exact) and concludes that using the index
is not profitable for this query.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Wernitznig (#2)
Re: Bug #474: Index using problem

Andreas Wernitznig <andreas@insilico.com> writes:

If I don't disconnect from the database but use the first connection
again, the database still uses the (now) slower "seq scan" for
"primary/foreign key" checking. In this case the query optimizer
statistics are not updated for established connections.

Sure they are --- in my tests, anyway. What did you do *exactly*?

regards, tom lane

#5Andreas Wernitznig
andreas@insilico.com
In reply to: Tom Lane (#4)
Re: Bug #474: Index using problem

1. I created a small database consisting of two connected tables:

create table table1 (
index1 int4 not null,
textfield varchar(1000) not null,
constraint PK_table1 primary key (index1)
);

create table table2 (
index2 int4 not null,
index1 int4 not null,
textfield varchar(1000) not null,
constraint PK_table2 primary key (index2),
constraint FK_table1 foreign key (index1)
references table1 (index1)
on delete restrict on update restrict
);

2. Then I insert 100 rows ($n=1..100) in each of these tables:

insert into table1 VALUES ($n, '123456789');
insert into table2 VALUES ($n, $n, '123456789');

3. then I send a "vacuum analyze" and an "explain select * from table1 where index1 = 543;"
The output is:
NOTICE: QUERY PLAN:
Index Scan using pk_table1 on table1 (cost=0.00..2.01 rows=1 width=16)

4. Then I insert 4900 rows into each of these tables like in step 2.

----------------------------

-- Test A: --
Then I send a "vacuum analyze;"
and "delete from table2;",
and "delete from table1;"
and rerun steps 2 to 4 -> step 4 takes 39 seconds.

-- Test B: --
Then I send "delete from table2;",
and "delete from table1;",
and a "vacuum analyze;"
and rerun steps 2 to 4 -> step 4 takes 81 seconds.

Although the "explain" command tells me that an Index Scan is used, step 4 is much slower in Test B.
For both tests (steps 2-4) I use one connection to the database.
If I quit the connection after step 3 and establish a new connection for step 4 it takes 39 seconds in either cases.
-> Using one connection the optimizer for pk/fk-checking is not updated by a "vacuum analyze".

On Fri, 05 Oct 2001 09:52:20 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Andreas Wernitznig <andreas@insilico.com> writes:

If I don't disconnect from the database but use the first connection
again, the database still uses the (now) slower "seq scan" for
"primary/foreign key" checking. In this case the query optimizer
statistics are not updated for established connections.

Sure they are --- in my tests, anyway. What did you do *exactly*?

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Wernitznig (#5)
Re: Bug #474: Index using problem

Andreas Wernitznig <andreas@insilico.com> writes:

-> Using one connection the optimizer for pk/fk-checking is not
updated by a "vacuum analyze".

Oh, I misunderstood you the first time: I thought you were saying that
*other* backends couldn't see the results of the VACUUM.

The reason for this behavior is that the foreign key checker caches a
plan for each foreign-key-checking query the first time it needs to
use that query (within a given backend). There should be a mechanism
to flush those cached plans when circumstances change ... but currently
there isn't.

regards, tom lane

#7Andreas Wernitznig
andreas@insilico.com
In reply to: Tom Lane (#6)
Re: Bug #474: Index using problem

Why don't you skip the automatic index creation for primary keys and let the user decide to create an index,
that should be used in any case, regardless what the query planner recommends ?

On Fri, 05 Oct 2001 15:15:06 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Andreas Wernitznig <andreas@insilico.com> writes:

-> Using one connection the optimizer for pk/fk-checking is not
updated by a "vacuum analyze".

Oh, I misunderstood you the first time: I thought you were saying that
*other* backends couldn't see the results of the VACUUM.

The reason for this behavior is that the foreign key checker caches a
plan for each foreign-key-checking query the first time it needs to
use that query (within a given backend). There should be a mechanism
to flush those cached plans when circumstances change ... but currently
there isn't.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Wernitznig (#7)
Re: Bug #474: Index using problem

Andreas Wernitznig <andreas@insilico.com> writes:

Why don't you skip the automatic index creation for primary keys

And how then would we enforce uniqueness of the primary key?

and let the user decide to create an index, that should be used in any
case, regardless what the query planner recommends ?

What? AFAIR the planner does not even *know* whether an index is
associated with a primary key. It certainly does not give such an
index any special preference.

regards, tom lane

#9Andreas Wernitznig
andreas@insilico.com
In reply to: Tom Lane (#8)
Re: Bug #474: Index using problem

I don't know how the primary key checking and the index scan are related in postgresql (but I am keen to learn more about it).

From Sybase ASA (SQL Anywhere) I know, that these two functions (pk checking, index search) are decoupled.

(that means even a primary key without an index is possible. This makes only sense in tables with a few rows).

The pg-planner takes the current number of datasets to calculate a query plan to reach the best performance.
If the number of datasets changes (and the user/db-programmer usually knows if it changes) the planner sometimes make wrong decisions.
Then I have to execute a "vacuum analyze" or reconnect in case of foreign key checking.

I would like to tune postgresql to use an index in any case if it is available.

On Fri, 05 Oct 2001 18:01:08 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Andreas Wernitznig <andreas@insilico.com> writes:

Why don't you skip the automatic index creation for primary keys

And how then would we enforce uniqueness of the primary key?

and let the user decide to create an index, that should be used in any
case, regardless what the query planner recommends ?

What? AFAIR the planner does not even *know* whether an index is
associated with a primary key. It certainly does not give such an
index any special preference.

regards, tom lane