Index usage for BYTEA column in OR/IN clause

Started by David Garamondabout 22 years ago8 messagesgeneral

Jump to latest

David Garamond

lists@zara.6.isreserved.com

about 22 years ago

Is it true that the planner currently doesn't utilize index for BYTEA
column in OR or IN clause?

-- b is an indexed BYTEA column

explain select * from t where b='foo'; -- index scan
explain select * from t where b like 'f%'; -- index

explain select * from t where b='foo' or b='bar'; -- seq scan
explain select * from t where b='foo' or b like 'b%'; -- seq
explain select * from t where b like 'f%' or b like 'b%'; -- seq
explain select * from t where b in ('foo','bar'); -- seq

Currently I'm setting enable_seqscan to off for these...

--
dave

Bruno Wolff III

bruno@wolff.to

about 22 years ago

In reply to: David Garamond (#1)

Re: Index usage for BYTEA column in OR/IN clause

On Sat, Mar 27, 2004 at 21:52:45 +0700,
David Garamond <lists@zara.6.isreserved.com> wrote:

Is it true that the planner currently doesn't utilize index for BYTEA
column in OR or IN clause?

Without seeing the explain analyse output for these queries it is going
to be hard to say why sequential scans were used in some cases.

If the planner estimates it will be visiting a substantial fraction of
rows in a table (something like 5 or 10%) then it will use a sequential
scan because this will be faster.

Postgres doesn't use bit mapping to speed up searches on or'd conditions,
so that sequential scans are going to look even better when compared to
doing multiple index scans.

However, I would have expected the queries below to use index scans
on real tables where the b column was unique or nearly so. My guess
is that you tried this using toy tables and that for them a sequential
scan could easily be faster.

Show quoted text

-- b is an indexed BYTEA column

explain select * from t where b='foo'; -- index scan
explain select * from t where b like 'f%'; -- index

explain select * from t where b='foo' or b='bar'; -- seq scan
explain select * from t where b='foo' or b like 'b%'; -- seq
explain select * from t where b like 'f%' or b like 'b%'; -- seq
explain select * from t where b in ('foo','bar'); -- seq

Currently I'm setting enable_seqscan to off for these...

--
dave

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Tom Lane

tgl@sss.pgh.pa.us

about 22 years ago

In reply to: David Garamond (#1)

Re: Index usage for BYTEA column in OR/IN clause

David Garamond <lists@zara.6.isreserved.com> writes:

Is it true that the planner currently doesn't utilize index for BYTEA
column in OR or IN clause?

No. BYTEA is the same as any other datatype.

regards, tom lane

David Garamond

lists@zara.6.isreserved.com

about 22 years ago

In reply to: Bruno Wolff III (#2)

Re: Index usage for BYTEA column in OR/IN clause

The table contain +- 1 mil records, all of the actual version of the
queries below return < 10 rows, so an index should be used. Using an
index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
no good reason why a seq scan should be used, especially in a case of
b='foo' or b='bar'.

Bruno Wolff III wrote:

Without seeing the explain analyse output for these queries it is going
to be hard to say why sequential scans were used in some cases.

If the planner estimates it will be visiting a substantial fraction of
rows in a table (something like 5 or 10%) then it will use a sequential
scan because this will be faster.

Postgres doesn't use bit mapping to speed up searches on or'd conditions,
so that sequential scans are going to look even better when compared to
doing multiple index scans.

However, I would have expected the queries below to use index scans
on real tables where the b column was unique or nearly so. My guess
is that you tried this using toy tables and that for them a sequential
scan could easily be faster.

-- b is an indexed BYTEA column

explain select * from t where b='foo'; -- index scan
explain select * from t where b like 'f%'; -- index

explain select * from t where b='foo' or b='bar'; -- seq scan
explain select * from t where b='foo' or b like 'b%'; -- seq
explain select * from t where b like 'f%' or b like 'b%'; -- seq
explain select * from t where b in ('foo','bar'); -- seq

Currently I'm setting enable_seqscan to off for these...

--
dave

Tom Lane

tgl@sss.pgh.pa.us

about 22 years ago

In reply to: David Garamond (#4)

Re: Index usage for BYTEA column in OR/IN clause

David Garamond <lists@zara.6.isreserved.com> writes:

The table contain +- 1 mil records, all of the actual version of the
queries below return < 10 rows, so an index should be used. Using an
index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
no good reason why a seq scan should be used, especially in a case of
b='foo' or b='bar'.

[shrug...] We can't possibly diagnose a bad-plan-choice problem with
the amount of information you've provided. See
http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

regards, tom lane

David Garamond

lists@zara.6.isreserved.com

about 22 years ago

In reply to: Tom Lane (#5)

Re: Index usage for BYTEA column in OR/IN clause

Tom Lane wrote:

David Garamond <lists@zara.6.isreserved.com> writes:

The table contain +- 1 mil records, all of the actual version of the
queries below return < 10 rows, so an index should be used. Using an
index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
no good reason why a seq scan should be used, especially in a case of
b='foo' or b='bar'.

[shrug...] We can't possibly diagnose a bad-plan-choice problem with
the amount of information you've provided. See
http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

Okay, my bad. How about this:

1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):

#!/usr/bin/ruby -rpostgres
ROWS = 1_000_000
conn = PGconn.connect("/tmp", 5432, "", "", "...", "...", "...")
conn.exec("CREATE TABLE t (b BYTEA NOT NULL)")
conn.exec("CREATE INDEX i_t_b ON t(b)")
(1..ROWS).each { |i|
b = (1..rand(40)+1).collect{"\\\\"+rand(256).to_s(8).rjust(3,"0")}
if i % 1000 == 1; conn.exec("BEGIN"); end
conn.exec("INSERT INTO t VALUES ('#{b}')")
if i % 1000 == 0; conn.exec("COMMIT"); puts "#{i}/1000000..."; end
}

2. output of explain analyze:

=> explain analyze select * from t where b='aa';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32)
(actual time=0.062..0.062 rows=0 loops=1)
Index Cond: (b = 'aa'::bytea)
Total runtime: 0.166 ms
(3 rows)

Time: 19.372 ms

=> explain analyze select * from t where b='aa' or b='ab';
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual
time=6857.151..6857.151 rows=0 loops=1)
Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
Total runtime: 6857.345 ms
(3 rows)

Time: 6864.526 ms

=> explain analyze select * from t where b like 'aa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32)
(actual time=0.682..15.763 rows=2 loops=1)
Index Cond: ((b >= 'aa'::bytea) AND (b < 'ab'::bytea))
Filter: (b ~~ 'aa%'::bytea)
Total runtime: 15.935 ms
(4 rows)

Time: 29.432 ms

=> explain analyze select * from t where b like 'aa%' or b like 'ab%';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual
time=636.738..7239.460 rows=3 loops=1)
Filter: ((b ~~ 'aa%'::bytea) OR (b ~~ 'ab%'::bytea))
Total runtime: 7239.758 ms
(3 rows)

Time: 7251.326 ms

=> explain analyze select * from t where b in ('aa','ab');
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual
time=7055.390..7055.390 rows=0 loops=1)
Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
Total runtime: 7055.574 ms
(3 rows)

Time: 7063.942 ms

--
dave

Tom Lane

tgl@sss.pgh.pa.us

about 22 years ago

In reply to: David Garamond (#6)

Re: Index usage for BYTEA column in OR/IN clause

David Garamond <lists@zara.6.isreserved.com> writes:

1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):

This script is lacking a VACUUM or ANALYZE command, so the planner
doesn't know how large the table is. Note the ridiculously small
cost estimates in EXPLAIN ...

regards, tom lane

David Garamond

lists@zara.6.isreserved.com

about 22 years ago

In reply to: Tom Lane (#7)

Re: Index usage for BYTEA column in OR/IN clause

Tom Lane wrote:

This script is lacking a VACUUM or ANALYZE command, so the planner
doesn't know how large the table is. Note the ridiculously small
cost estimates in EXPLAIN ...

I see, I never knew about having to VACUUM/ANALYZE first. Thanks.

--
dave