index scan leads to result that is different from sec scan after upgrading to 8.3.4

Started by Sergey Konoplevover 17 years ago14 messagesgeneral
Jump to latest
#1Sergey Konoplev
gray.ru@gmail.com

Hi all,

Well, we have migrated our server from 8.3.3 to 8.3.4. The server is
based on Red Hat and an instans it deals with insalled on RAMFS.

db_online=> select version();
version
----------------------------------------------------------------------------------------
PostgreSQL 8.3.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20071124 (Red Hat 4.1.2-42)
(1 row)

There is a table:
Table "public.person_online"
Column | Type |
Modifiers
----------------------------+--------------------------+---------------------------------
obj_id | bigint | not null
obj_status_did | smallint |
...
po_since | timestamp with time zone | not null default now()
po_geo_point | point | not null
Indexes:
"pk_person_online" PRIMARY KEY, btree (obj_id)
"i_person_online__geo_point" gist (box(po_geo_point,
po_geo_point)) WHERE obj_status_did = 1
"i_person_online__since" btree (po_since)
Triggers:
t_person_online_since_bu BEFORE UPDATE ON person_online FOR EACH
ROW EXECUTE PROCEDURE t_person_online_since()

Pay attention to i_person_online__geo_point index.

After migration we did initdb, installed btree_gist contrib (it was
instaled on 8.3.3 too) and created the table and the index. Later we
noticed strange behaviour of our application and that is what we have
managed to find:

db_online=> select obj_status_did, count(1) from person_online
where obj_status_did = 1 group by obj_status_did;
obj_status_did | count
----------------+-------
1 | 711
(1 row)

db_online=> select obj_status_did, count(1) from person_online
group by obj_status_did;
obj_status_did | count
----------------+-------
5 | 2
1 | 1930
(2 rows)

explain showed that first query do index scan and second one sec scan.

db_online=>

We did reindex i_person_online__geo_point and situation seemed being solved:

db_online=> select obj_status_did, count(1) from person_online
where obj_status_did = 1 group by obj_status_did;
obj_status_did | count
----------------+-------
1 | 2071
(1 row)
db_online=> select obj_status_did, count(1) from person_online
group by obj_status_did;
obj_status_did | count
----------------+-------
5 | 2
1 | 2071
(2 rows)

But after our application had done some inserts/updates/deletes the
queries showed different count of rows with obj_status_did = 1 again.
We tried to drop-create the index but it did not help either.

Then we created a twin-table and a twin-index and did a lot of inserts
and random deletes using different connections. Everything was
alright. We renamed original table to person_online_tmp and the
twin-table to person_online. Few minutes later we saw wrong behaviour
again.

Will somebody explain what has happened and how to solve the problem please?

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sergey Konoplev (#1)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

"Sergey Konoplev" <gray.ru@gmail.com> writes:

Will somebody explain what has happened and how to solve the problem please?

Apparently you've found a bug in either btree_gist or the core GIST
code. Can you put together a self-contained test case?

regards, tom lane

#3Sergey Konoplev
gray.ru@gmail.com
In reply to: Sergey Konoplev (#1)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Sorry, I forget to paste explains

explain showed that first query do index scan and second one sec scan.

db_online=> select obj_status_did, count(1) from person_online
where obj_status_did = 1 group by obj_status_did;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..36.37 rows=1 width=2) (actual
time=1.169..1.170 rows=1 loops=1)
-> Index Scan using i_person_online_test__geo_point on
person_online_test (cost=0.00..30.07 rows=1258 width=2) (actual
time=0.016..0.928 rows=258 loops=1)
Total runtime: 1.268 ms
(3 rows)

db_online=> select obj_status_did, count(1) from person_online
group by obj_status_did;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=302.87..302.88 rows=1 width=2) (actual
time=0.668..0.669 rows=1 loops=1)
-> Seq Scan on person_online_test (cost=0.00..296.58 rows=1258
width=2) (actual time=0.009..0.372 rows=258 loops=1)
Total runtime: 0.754 ms
(3 rows)

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sergey Konoplev (#3)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

"Sergey Konoplev" <gray.ru@gmail.com> writes:

db_online=> select obj_status_did, count(1) from person_online
where obj_status_did = 1 group by obj_status_did;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..36.37 rows=1 width=2) (actual
time=1.169..1.170 rows=1 loops=1)
-> Index Scan using i_person_online_test__geo_point on
person_online_test (cost=0.00..30.07 rows=1258 width=2) (actual
time=0.016..0.928 rows=258 loops=1)
Total runtime: 1.268 ms
(3 rows)

Hmm. So the problem seems to be statable as "a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation". Teodor, can you think of anything you
changed recently in that area?

regards, tom lane

#5Sergey Konoplev
gray.ru@gmail.com
In reply to: Tom Lane (#2)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

"Sergey Konoplev" <gray.ru@gmail.com> writes:

Will somebody explain what has happened and how to solve the problem please?

Apparently you've found a bug in either btree_gist or the core GIST
code. Can you put together a self-contained test case?

Ok, I will try.

One more thing I want to mention is that we have fixed (temporary) the
problem by recreating the index without "where" clause so it is not
used by the queries now.

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

#6Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#4)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Hmm. So the problem seems to be statable as "a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation". Teodor, can you think of anything you
changed recently in that area?

Only fixing possible duplicates during index scan. Will see.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#7Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#4)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Hmm. So the problem seems to be statable as "a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation". Teodor, can you think of anything you
changed recently in that area?

I still can't reproduce the bug, but found useless recheck condition with bitmap
check:

drop table if exists qq;
select 1 as st , 1::int4 as t into qq from generate_series(1,10000) as t;
create index qqidx on qq using gist (st) where t =1;
INSERT INTO qq (SELECT (4 * random())::int4, (4 * random())::int4 from
generate_series(1,10000));

# explain select t, count(1) from qq where t =1 group by t;
QUERY PLAN
-----------------------------------------------------------------------------
GroupAggregate (cost=19.62..633.49 rows=1 width=2)
-> Bitmap Heap Scan on qq (cost=19.62..630.28 rows=640 width=2)
Recheck Cond: (t = 1)
-> Bitmap Index Scan on qqidx (cost=0.00..19.46 rows=640 width=0)

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#7)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Teodor Sigaev <teodor@sigaev.ru> writes:

I still can't reproduce the bug, but found useless recheck condition with bitmap
check:

select 1 as st , 1::int4 as t into qq from generate_series(1,10000) as t;
create index qqidx on qq using gist (st) where t =1;
INSERT INTO qq (SELECT (4 * random())::int4, (4 * random())::int4 from
generate_series(1,10000));

# explain select t, count(1) from qq where t =1 group by t;
QUERY PLAN
-----------------------------------------------------------------------------
GroupAggregate (cost=19.62..633.49 rows=1 width=2)
-> Bitmap Heap Scan on qq (cost=19.62..630.28 rows=640 width=2)
Recheck Cond: (t = 1)
-> Bitmap Index Scan on qqidx (cost=0.00..19.46 rows=640 width=0)

No, it's not useless. If the bitmap goes lossy, we have to recheck the
index predicate too, because we'll be looking at all rows on the
indicated page(s) and there's no guarantee that they all match the
predicate.

regards, tom lane

#9Sergey Konoplev
gray.ru@gmail.com
In reply to: Tom Lane (#8)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Ok, I've done the test case (see attachment).

8.3.3 has passed it.
8.3.4 hasn't passed in ~99% times I run it.

Steps to reproduce:
1. install pg 8.3.4, do initdb, start pg
2. correct PSQL parameter in pg-8.3.4_index_update_test.sh
3. run pg-8.3.4_index_update_test.sh few times

And you will see something like this:

...

------------------------------------------
2nd time obtaining "seq-scan" count and plan...
------------------------------------------
SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
table1_flag | count
-------------+-------
1 | 100
(1 row)

EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
HashAggregate (cost=15.00..15.01 rows=1 width=2) (actual
time=0.140..0.140 rows=1 loops=1)
-> Seq Scan on table1 (cost=0.00..12.00 rows=600 width=2) (actual
time=0.004..0.059 rows=100 loops=1)
Total runtime: 0.172 ms
(3 rows)

------------------------------------------
2nd time obtaining "index-scan" count and plan...
------------------------------------------
SELECT count(*) FROM table1
WHERE table1_flag = 1;
count
-------
98
(1 row)

EXPLAIN ANALYZE SELECT count(*) FROM table1
WHERE table1_flag = 1;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451
rows=1 loops=1)
-> Index Scan using i_table1__table1_point on table1
(cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98
loops=1)
Total runtime: 0.477 ms
(3 rows)

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

Attachments:

pg-8.3.4_index_update_test.shapplication/x-sh; name=pg-8.3.4_index_update_test.shDownload
#10Sergey Konoplev
gray.ru@gmail.com
In reply to: Sergey Konoplev (#9)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Ok, I've done the test case (see attachment).

8.3.3 has passed it.
8.3.4 hasn't passed in ~99% times I run it.

Steps to reproduce:
1. install pg 8.3.4, do initdb, start pg
2. correct PSQL parameter in pg-8.3.4_index_update_test.sh
3. run pg-8.3.4_index_update_test.sh few times

And you will see something like this:

...

------------------------------------------
2nd time obtaining "seq-scan" count and plan...
------------------------------------------
SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
table1_flag | count
-------------+-------
1 | 100
(1 row)

EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
HashAggregate (cost=15.00..15.01 rows=1 width=2) (actual
time=0.140..0.140 rows=1 loops=1)
-> Seq Scan on table1 (cost=0.00..12.00 rows=600 width=2) (actual
time=0.004..0.059 rows=100 loops=1)
Total runtime: 0.172 ms
(3 rows)

------------------------------------------
2nd time obtaining "index-scan" count and plan...
------------------------------------------
SELECT count(*) FROM table1
WHERE table1_flag = 1;
count
-------
98
(1 row)

EXPLAIN ANALYZE SELECT count(*) FROM table1
WHERE table1_flag = 1;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451
rows=1 loops=1)
-> Index Scan using i_table1__table1_point on table1
(cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98
loops=1)
Total runtime: 0.477 ms
(3 rows)

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

Attachments:

pg-8.3.4_index_update_test.tar.gzapplication/x-gzip; name=pg-8.3.4_index_update_test.tar.gzDownload
#11Teodor Sigaev
teodor@sigaev.ru
In reply to: Sergey Konoplev (#10)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Thank you, I reproduce the bug, will fix.

Sergey Konoplev wrote:

Ok, I've done the test case (see attachment).

8.3.3 has passed it.
8.3.4 hasn't passed in ~99% times I run it.

Steps to reproduce:
1. install pg 8.3.4, do initdb, start pg
2. correct PSQL parameter in pg-8.3.4_index_update_test.sh
3. run pg-8.3.4_index_update_test.sh few times

And you will see something like this:

...

------------------------------------------
2nd time obtaining "seq-scan" count and plan...
------------------------------------------
SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
table1_flag | count
-------------+-------
1 | 100
(1 row)

EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
HashAggregate (cost=15.00..15.01 rows=1 width=2) (actual
time=0.140..0.140 rows=1 loops=1)
-> Seq Scan on table1 (cost=0.00..12.00 rows=600 width=2) (actual
time=0.004..0.059 rows=100 loops=1)
Total runtime: 0.172 ms
(3 rows)

------------------------------------------
2nd time obtaining "index-scan" count and plan...
------------------------------------------
SELECT count(*) FROM table1
WHERE table1_flag = 1;
count
-------
98
(1 row)

EXPLAIN ANALYZE SELECT count(*) FROM table1
WHERE table1_flag = 1;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451
rows=1 loops=1)
-> Index Scan using i_table1__table1_point on table1
(cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98
loops=1)
Total runtime: 0.477 ms
(3 rows)

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#12Teodor Sigaev
teodor@sigaev.ru
In reply to: Sergey Konoplev (#10)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

Fixed, patch attached.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

patchtext/plain; name=patchDownload+32-25
#13Laurent Wandrebeck
l.wandrebeck@gmail.com
In reply to: Teodor Sigaev (#12)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

20 hours to find the fix Teodor, Kudos !
Due to the importance of the fix, will we see very soon a 8.3.5 ?
Regards,
Laurent.

#14Teodor Sigaev
teodor@sigaev.ru
In reply to: Laurent Wandrebeck (#13)
Re: index scan leads to result that is different from sec scan after upgrading to 8.3.4

20 hours to find the fix Teodor, Kudos !

Nothing for the pride :(, my bug.

Due to the importance of the fix, will we see very soon a 8.3.5 ?

Don't known, see discussion. I think, that will make sense.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/