Use of index for 50% column restriction
As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index. I thought only <10% restrictions used indexes. To
setup the test:
DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, 0, 0 FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, 1, 1 FROM generate_series(10001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE c2 = 0;
The output is:
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using i_test_c2 on test (cost=0.29..349.29 rows=10000 width=12)
----------
Index Cond: (c2 = 0)
(2 rows)
\timing does show the optimizer is making the right decision to use the
index, and this behavior is the same back to at least 9.3. Setting
effective_cache_size = '8kB' does not change this behavior. What am I
missing? Is my 10% assumption wrong?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index. I thought only <10% restrictions used indexes.
There's no such hard-and-fast rule. The cost estimate break point depends
greatly on the index order correlation (which is 100% in your example),
as well as some other factors like the index size versus
effective_cache_size.
For randomly-ordered data I believe the cutover is actually well below 10%.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 8, 2016 at 01:28:54PM -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index. I thought only <10% restrictions used indexes.There's no such hard-and-fast rule. The cost estimate break point depends
greatly on the index order correlation (which is 100% in your example),
as well as some other factors like the index size versus
effective_cache_size.For randomly-ordered data I believe the cutover is actually well below 10%.
Ah, I had not considered the correlation order of the rows in the table.
This test returns the sequential scan I expected by using floor(random()
* 2):
DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE c2 = 0;
Thanks.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 8, 2016 at 05:07:34PM -0400, Bruce Momjian wrote:
For randomly-ordered data I believe the cutover is actually well below 10%.
Ah, I had not considered the correlation order of the rows in the table.
This test returns the sequential scan I expected by using floor(random()
* 2):DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE c2 = 0;Thanks.
Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:
DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(10001, 15000) AS a(c1);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(15001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
SELECT c2, COUNT(*) FROM test GROUP BY c2 ORDER BY 1;
c2 | count
----+-------
0 | 5020 25%
1 | 10006 50%
2 | 4974 25%
EXPLAIN SELECT * FROM TEST WHERE c2 = 1;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on test (cost=0.00..359.00 rows=10006 width=12)
Filter: (c2 = 1)
EXPLAIN SELECT * FROM TEST WHERE c2 = 0;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=99.19..270.94 rows=5020 width=12)
Recheck Cond: (c2 = 0)
-> Bitmap Index Scan on i_test_c2 (cost=0.00..97.94 rows=5020 width=0)
Index Cond: (c2 = 0)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6/8/16 4:36 PM, Bruce Momjian wrote:
Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:
AFAIK we do the bitmap heap scan in heap order, thereby eliminating the
effect of correlation?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 14, 2016 at 03:24:12PM -0500, Jim Nasby wrote:
On 6/8/16 4:36 PM, Bruce Momjian wrote:
Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:AFAIK we do the bitmap heap scan in heap order, thereby eliminating the
effect of correlation?
High correlation would still cause us to access fewer heap pages than
random data.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers