Index usage for BYTEA column in OR/IN clause
Is it true that the planner currently doesn't use an index on a BYTEA
column in an OR or IN clause?
-- b is an indexed BYTEA column
explain select * from t where b='foo'; -- index scan
explain select * from t where b like 'f%'; -- index
explain select * from t where b='foo' or b='bar'; -- seq scan
explain select * from t where b='foo' or b like 'b%'; -- seq
explain select * from t where b like 'f%' or b like 'b%'; -- seq
explain select * from t where b in ('foo','bar'); -- seq
Currently I'm setting enable_seqscan to off for these...
--
dave
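For reference, the enable_seqscan workaround mentioned above can be scoped to a single query rather than left on for the whole session. A minimal sketch in Ruby, assuming only a connection object with an `exec` method (as in the ruby-postgres binding); the helper name `with_seqscan_disabled` is hypothetical and not part of any library:

```ruby
# Hypothetical helper: run a block with enable_seqscan turned off, then
# restore the session default even if the block raises.
# `conn` is assumed to be any object that responds to exec(sql).
def with_seqscan_disabled(conn)
  conn.exec("SET enable_seqscan = off")
  yield conn
ensure
  conn.exec("RESET enable_seqscan")
end
```

It would be called as `with_seqscan_disabled(conn) { |c| c.exec("select * from t where b in ('foo','bar')") }`. Note that turning enable_seqscan off does not forbid sequential scans; it only makes the planner treat them as very expensive, so it is a diagnostic aid more than a fix.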
On Sat, Mar 27, 2004 at 21:52:45 +0700,
David Garamond <lists@zara.6.isreserved.com> wrote:
Is it true that the planner currently doesn't use an index on a BYTEA
column in an OR or IN clause?
Without seeing the explain analyse output for these queries it is going
to be hard to say why sequential scans were used in some cases.
If the planner estimates it will be visiting a substantial fraction of
rows in a table (something like 5 or 10%) then it will use a sequential
scan because this will be faster.
Postgres doesn't use bitmaps to combine index scans for OR'd conditions,
so sequential scans are going to look even better when compared to
doing multiple index scans.
However, I would have expected the queries below to use index scans
on real tables where the b column was unique or nearly so. My guess
is that you tried this using toy tables and that for them a sequential
scan could easily be faster.
David Garamond <lists@zara.6.isreserved.com> writes:
Is it true that the planner currently doesn't use an index on a BYTEA
column in an OR or IN clause?
No. BYTEA is the same as any other datatype.
regards, tom lane
The table contains about 1 million records, and the actual versions of the
queries below all return < 10 rows, so an index should be used. Using an
index scan, the query ran in < 100 ms. Using a seq scan, 2-3 secs. So there is
no good reason why a seq scan should be used, especially in the case of
b='foo' or b='bar'.
David Garamond <lists@zara.6.isreserved.com> writes:
The table contains about 1 million records, and the actual versions of the
queries below all return < 10 rows, so an index should be used. Using an
index scan, the query ran in < 100 ms. Using a seq scan, 2-3 secs. So there is
no good reason why a seq scan should be used, especially in the case of
b='foo' or b='bar'.
[shrug...] We can't possibly diagnose a bad-plan-choice problem with
the amount of information you've provided. See
http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines
regards, tom lane
Okay, my bad. How about this:
1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):
#!/usr/bin/ruby -rpostgres
ROWS = 1_000_000
conn = PGconn.connect("/tmp", 5432, "", "", "...", "...", "...")
conn.exec("CREATE TABLE t (b BYTEA NOT NULL)")
conn.exec("CREATE INDEX i_t_b ON t(b)")
(1..ROWS).each { |i|
  # 1-40 random bytes, each written as an octal escape for bytea input;
  # join the pieces explicitly so the result is one string
  b = (1..rand(40)+1).collect{"\\\\"+rand(256).to_s(8).rjust(3,"0")}.join
  # insert in transactions of 1000 rows each
  if i % 1000 == 1; conn.exec("BEGIN"); end
  conn.exec("INSERT INTO t VALUES ('#{b}')")
  if i % 1000 == 0; conn.exec("COMMIT"); puts "#{i}/1000000..."; end
}
2. output of explain analyze:
=> explain analyze select * from t where b='aa';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32) (actual time=0.062..0.062 rows=0 loops=1)
Index Cond: (b = 'aa'::bytea)
Total runtime: 0.166 ms
(3 rows)
Time: 19.372 ms
=> explain analyze select * from t where b='aa' or b='ab';
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=6857.151..6857.151 rows=0 loops=1)
Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
Total runtime: 6857.345 ms
(3 rows)
Time: 6864.526 ms
=> explain analyze select * from t where b like 'aa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32) (actual time=0.682..15.763 rows=2 loops=1)
Index Cond: ((b >= 'aa'::bytea) AND (b < 'ab'::bytea))
Filter: (b ~~ 'aa%'::bytea)
Total runtime: 15.935 ms
(4 rows)
Time: 29.432 ms
=> explain analyze select * from t where b like 'aa%' or b like 'ab%';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=636.738..7239.460 rows=3 loops=1)
Filter: ((b ~~ 'aa%'::bytea) OR (b ~~ 'ab%'::bytea))
Total runtime: 7239.758 ms
(3 rows)
Time: 7251.326 ms
=> explain analyze select * from t where b in ('aa','ab');
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=7055.390..7055.390 rows=0 loops=1)
Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
Total runtime: 7055.574 ms
(3 rows)
Time: 7063.942 ms
--
dave
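When comparing plans like the ones above, it can help to put the planner's estimate next to what actually happened. A rough sketch in Ruby that pulls those numbers out of the top line of an EXPLAIN ANALYZE plan; the names `PLAN_LINE` and `parse_plan_line` are illustrative, and the regular expression only covers the line format shown in this thread:

```ruby
# Hypothetical parser for a plan node line such as:
#   Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual time=636.738..7239.460 rows=3 loops=1)
# \s+ is used between the parts so a line wrapped by a mail client still matches.
PLAN_LINE = /\(cost=(\d+\.\d+)\.\.(\d+\.\d+)\s+rows=(\d+)\s+width=\d+\)\s+\(actual\s+time=(\d+\.\d+)\.\.(\d+\.\d+)\s+rows=(\d+)\s+loops=\d+\)/

# Returns a hash of estimated vs. actual figures, or nil if the line
# is not a plan node line with ANALYZE output.
def parse_plan_line(line)
  m = PLAN_LINE.match(line)
  return nil unless m
  {
    est_total_cost: m[2].to_f,  # planner's estimated total cost (arbitrary units)
    est_rows:       m[3].to_i,  # planner's estimated row count
    actual_ms:      m[5].to_f,  # measured total time in milliseconds
    actual_rows:    m[6].to_i,  # rows actually returned
  }
end
```

Cost units are not milliseconds, so the two figures are not directly comparable, but listing them side by side makes it easy to spot a plan whose estimated cost is tiny while its measured runtime is measured in seconds.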
David Garamond <lists@zara.6.isreserved.com> writes:
1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):
This script is lacking a VACUUM or ANALYZE command, so the planner
doesn't know how large the table is. Note the ridiculously small
cost estimates in EXPLAIN ...
regards, tom lane
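Following that diagnosis, one way to amend the test is to refresh the planner statistics once the load has finished and then re-check the plans. A minimal sketch in Ruby, assuming a connection object with an `exec` method as in the script above; the helper name `refresh_stats_and_explain` is hypothetical, and no output is shown because the thread does not include the post-ANALYZE plans:

```ruby
# Hypothetical helper: gather statistics for `table`, then run
# EXPLAIN ANALYZE on each query and return the results in order.
# `table` is interpolated directly, so it must be a trusted identifier.
def refresh_stats_and_explain(conn, table, queries)
  conn.exec("ANALYZE #{table}")
  queries.map { |q| conn.exec("EXPLAIN ANALYZE #{q}") }
end
```

For the test table this would be called as `refresh_stats_and_explain(conn, "t", ["select * from t where b in ('aa','ab')"])`. ANALYZE only collects statistics; VACUUM ANALYZE would additionally reclaim dead space, which is not needed right after a fresh load.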