Use of index for 50% column restriction

Started by Bruce Momjianalmost 10 years ago6 messageshackers
Jump to latest
#1Bruce Momjian
bruce@momjian.us

As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index. I thought only <10% restrictions used indexes. To
setup the test:

DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, 0, 0 FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, 1, 1 FROM generate_series(10001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE c2 = 0;

The output is:

QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using i_test_c2 on test (cost=0.29..349.29 rows=10000 width=12)
----------
Index Cond: (c2 = 0)
(2 rows)

\timing does show the optimizer is making the right decision to use the
index, and this behavior is the same back to at least 9.3. Setting
effective_cache_size = '8kB' does not change this behavior. What am I
missing? Is my 10% assumption wrong?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Use of index for 50% column restriction

Bruce Momjian <bruce@momjian.us> writes:

As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index. I thought only <10% restrictions used indexes.

There's no such hard-and-fast rule. The cost estimate break point depends
greatly on the index order correlation (which is 100% in your example),
as well as some other factors like the index size versus
effective_cache_size.

For randomly-ordered data I believe the cutover is actually well below 10%.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Use of index for 50% column restriction

On Wed, Jun 8, 2016 at 01:28:54PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

As part of my research on the parsing/planning behavior of PREPARE, I
found a surprising behavior --- a WHERE clause that is 50% restrictive
is using an index. I thought only <10% restrictions used indexes.

There's no such hard-and-fast rule. The cost estimate break point depends
greatly on the index order correlation (which is 100% in your example),
as well as some other factors like the index size versus
effective_cache_size.

For randomly-ordered data I believe the cutover is actually well below 10%.

Ah, I had not considered the correlation order of the rows in the table.
This test returns the sequential scan I expected by using floor(random()
* 2):

DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE c2 = 0;

Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#3)
Re: Use of index for 50% column restriction

On Wed, Jun 8, 2016 at 05:07:34PM -0400, Bruce Momjian wrote:

For randomly-ordered data I believe the cutover is actually well below 10%.

Ah, I had not considered the correlation order of the rows in the table.
This test returns the sequential scan I expected by using floor(random()
* 2):

DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, floor(random() * 2), 0 FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, floor(random() * 2), 1 FROM generate_series(10001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;
EXPLAIN SELECT * FROM test WHERE c2 = 0;

Thanks.

Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:

DROP TABLE IF EXISTS test;
CREATE TABLE test (c1 INT, c2 INT, c3 INT);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(1, 10000) AS a(c1);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(10001, 15000) AS a(c1);
INSERT INTO test SELECT c1, abs(floor(random() * 4)-1), abs(floor(random() * 4)-1) FROM generate_series(15001, 20000) AS a(c1);
CREATE INDEX i_test_c2 ON test (c2);
ANALYZE test;

SELECT c2, COUNT(*) FROM test GROUP BY c2 ORDER BY 1;
c2 | count
----+-------
0 | 5020 25%
1 | 10006 50%
2 | 4974 25%

EXPLAIN SELECT * FROM TEST WHERE c2 = 1;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on test (cost=0.00..359.00 rows=10006 width=12)
Filter: (c2 = 1)

EXPLAIN SELECT * FROM TEST WHERE c2 = 0;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=99.19..270.94 rows=5020 width=12)
Recheck Cond: (c2 = 0)
-> Bitmap Index Scan on i_test_c2 (cost=0.00..97.94 rows=5020 width=0)
Index Cond: (c2 = 0)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#4)
Re: Use of index for 50% column restriction

On 6/8/16 4:36 PM, Bruce Momjian wrote:

Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:

AFAIK we do the bitmap heap scan in heap order, thereby eliminating the
effect of correlation?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#5)
Re: Use of index for 50% column restriction

On Tue, Jun 14, 2016 at 03:24:12PM -0500, Jim Nasby wrote:

On 6/8/16 4:36 PM, Bruce Momjian wrote:

Just a follow-up, but even with a randomized correlation order, it seems
25% restrictivity generates a Bitmap Index Scan:

AFAIK we do the bitmap heap scan in heap order, thereby eliminating the
effect of correlation?

High correlation would still cause us to access fewer heap pages than
random data.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers