Bug #681: creating indices after data insertion

Started by PostgreSQL Bugs Listalmost 24 years ago3 messagesbugs
Jump to latest
#1PostgreSQL Bugs List
pgsql-bugs@postgresql.org

Markus Grill (mg@schwarz.de) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
creating indices after data insertion

Long Description
Hello,

in my opinion there is a big bug with indexes.
I used Postgresql 7.1.3 and 7.2.1

First I create two tables t_a and t_b with the fields t_a.id, t_a.bid and t_b.id where t_a.bid is a foreign key for t_b.id.
Then I insert many data records in the two tables.
After that I create one index i1 on t_a.id, one index i2 on t_a.bid and one index i3 on t_b.id.

When I select over just one of the two tables (EXPLAIN SELECT * FROM t_a WHERE t_a.id = 'XXX';) then the query plan (and the query) uses index scans.

But when I select over the two tables with a join (EXPLAIN SELECT * FROM t_a, t_b WHERE t_aid = 'XXX' AND t_a_bid = t_b_id;) then the query plan uses slow seq scans (That's not OK)!

We tried another thing:
First creating the tables,
then creating the indexes
and then inserting the data records:
In this case the query uses index scans (that's OK)!

We tried a third thing:
Creating the tables, creating the indexes, inserting the data records:
---> index scans (OK).
Then dropping the indexes and creating three new indexes:
---> seq scans (not OK).

Please do not say, I should create my indexes first, because this IS a bug.
That's a big problem when inserting many data records, because it's much faster to drop the index first, then inserting the data and then creating new indexes.

Please help! :-)

Sample Code
My real life examle:
The first example is the bug the second is the running one:

testrpr=# EXPLAIN SELECT * from inf_information, auct_sys_infogruppe where inf_alias = 'asdfasdf' and infgrp_id = auct_sys_infogruppe.inf_id;
NOTICE: QUERY PLAN:

Hash Join (cost=5.95..8.86 rows=1 width=1070)
-> Seq Scan on auct_sys_infogruppe (cost=0.00..2.61 rows=61 width=555)
-> Hash (cost=5.94..5.94 rows=1 width=515)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..5.94 rows=1 width=515)

NOTICE: QUERY PLAN:

Hash Join (cost=5.95..8.86 rows=1 width=1070)
-> Seq Scan on auct_sys_infogruppe (cost=0.00..2.61 rows=61 width=555)
-> Hash (cost=5.94..5.94 rows=1 width=515)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..5.94 rows=1 width=515)

EXPLAIN

------------------------------------------

testrpr=# EXPLAIN SELECT * from inf_information, auct_sys_infogruppe where inf_alias = 'asdfasdf' and infgrp_id = auct_sys_infogruppe.inf_id;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..9.66 rows=1 width=1070)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..4.82 rows=1 width=515)
-> Index Scan using xpkauct_sys_infogruppe on auct_sys_infogruppe (cost=0.00..4.82 rows=1 width=555)

NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..9.66 rows=1 width=1070)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..4.82 rows=1 width=515)
-> Index Scan using xpkauct_sys_infogruppe on auct_sys_infogruppe (cost=0.00..4.82 rows=1 width=555)

EXPLAIN

No file was uploaded with this report

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #681: creating indices after data insertion

Dmitry Tkach <dmitry@openratings.com> writes:

Did you vacuum it?

Even more to the point: vacuum analyze.

The described behavior sounds like the default assumptions in the
absence of any ANALYZE statistics ...

regards, tom lane

#3Dmitry Tkach
dmitry@openratings.com
In reply to: PostgreSQL Bugs List (#1)
Re: Bug #681: creating indices after data insertion

Did you vacuum it?

I hope, it helps...

Dima
pgsql-bugs@postgresql.org wrote:

Show quoted text

Markus Grill (mg@schwarz.de) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
creating indices after data insertion

Long Description
Hello,

in my opinion there is a big bug with indexes.
I used Postgresql 7.1.3 and 7.2.1

First I create two tables t_a and t_b with the fields t_a.id, t_a.bid and t_b.id where t_a.bid is a foreign key for t_b.id.
Then I insert many data records in the two tables.
After that I create one index i1 on t_a.id, one index i2 on t_a.bid and one index i3 on t_b.id.

When I select over just one of the two tables (EXPLAIN SELECT * FROM t_a WHERE t_a.id = 'XXX';) then the query plan (and the query) uses index scans.

But when I select over the two tables with a join (EXPLAIN SELECT * FROM t_a, t_b WHERE t_aid = 'XXX' AND t_a_bid = t_b_id;) then the query plan uses slow seq scans (That's not OK)!

We tried another thing:
First creating the tables,
then creating the indexes
and then inserting the data records:
In this case the query uses index scans (that's OK)!

We tried a third thing:
Creating the tables, creating the indexes, inserting the data records:
---> index scans (OK).
Then dropping the indexes and creating three new indexes:
---> seq scans (not OK).

Please do not say, I should create my indexes first, because this IS a bug.
That's a big problem when inserting many data records, because it's much faster to drop the index first, then inserting the data and then creating new indexes.

Please help! :-)

Sample Code
My real life examle:
The first example is the bug the second is the running one:

testrpr=# EXPLAIN SELECT * from inf_information, auct_sys_infogruppe where inf_alias = 'asdfasdf' and infgrp_id = auct_sys_infogruppe.inf_id;
NOTICE: QUERY PLAN:

Hash Join (cost=5.95..8.86 rows=1 width=1070)
-> Seq Scan on auct_sys_infogruppe (cost=0.00..2.61 rows=61 width=555)
-> Hash (cost=5.94..5.94 rows=1 width=515)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..5.94 rows=1 width=515)

NOTICE: QUERY PLAN:

Hash Join (cost=5.95..8.86 rows=1 width=1070)
-> Seq Scan on auct_sys_infogruppe (cost=0.00..2.61 rows=61 width=555)
-> Hash (cost=5.94..5.94 rows=1 width=515)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..5.94 rows=1 width=515)

EXPLAIN

------------------------------------------

testrpr=# EXPLAIN SELECT * from inf_information, auct_sys_infogruppe where inf_alias = 'asdfasdf' and infgrp_id = auct_sys_infogruppe.inf_id;
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..9.66 rows=1 width=1070)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..4.82 rows=1 width=515)
-> Index Scan using xpkauct_sys_infogruppe on auct_sys_infogruppe (cost=0.00..4.82 rows=1 width=555)

NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..9.66 rows=1 width=1070)
-> Index Scan using xak1inf_information on inf_information (cost=0.00..4.82 rows=1 width=515)
-> Index Scan using xpkauct_sys_infogruppe on auct_sys_infogruppe (cost=0.00..4.82 rows=1 width=555)

EXPLAIN

No file was uploaded with this report

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)