GIN INdex is not used with && operator for a text array index

Started by balasubramanian c rover 2 years ago2 messagesgeneral

crbs.siebel@gmail.com

over 2 years ago

HI Team

Sorry for the spam.

We have Postgres DB where the list of addresses are stored and for a given
complete address
trigram of addresses are stored in a column which is a text array.

after looking at the list of operators that are available for gin index I
decided to use array_ops operator.
select amop.amopopr::regoperator, amop.amopstrategy, opc.opcname from
pg_opclass opc, pg_opfamily opf, pg_am am, pg_amop amop where opf.oid =
opc.opcfamily and am.oid = opf.opfmethod and amop.amopfamily =
opc.opcfamily and am.amname = 'gin' and amop.amoplefttype = opc.opcintype;
amopopr | amopstrategy | opcname
-------------------------+--------------+----------------
&&(anyarray,anyarray) | 1 | array_ops
@>(anyarray,anyarray) | 2 | array_ops
<@(anyarray,anyarray) | 3 | array_ops
=(anyarray,anyarray) | 4 | array_ops

CREATE INDEX pentgram_idx ON address18 USING GIN(pentgram array_ops);

when Operator '@>' is used the index is used and the execution time is 60ms.

db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan
kangra riyali 144 hp 176058', complete_address) from address18 where
pentgram @> show_trgm('jattan kangra riyali 144 hp 176058');

Bitmap Heap Scan on public.address18 (cost=261.25..262.52 rows=1 width=4)
(actual time=58.992..58.994 rows=1 loops=1)
Output: similarity('*****'::text, complete_address)
Recheck Cond: (address18.pentgram @> '{******}'::text[])
Heap Blocks: exact=1
Buffers: shared hit=1483
-> Bitmap Index Scan on pentgram_idx (cost=0.00..261.25 rows=1
width=0) (actual time=58.960..58.960 rows=1 loops=1)
Index Cond: (address18.pentgram @> '{***}'::text[])
Buffers: shared hit=1482
Query Identifier: -126591413296272164
Planning:
Buffers: shared hit=1
Planning Time: 0.679 ms
Execution Time: 60.373 ms
(13 rows)

when Operator '&&' is used the index is used and the execution time is 60ms.
It is performing sequential scan which is not expected.

db=# EXPLAIN (costs, buffers, verbose, analyze) select similarity('jattan
kangra riyali 144 hp 176058', complete_address) from address18 where
pentgram && show_trgm('jattan kangra riyali 144 hp 176058');

Seq Scan on public.address18 (cost=0.00..77215.11 rows=247741 width=4)
(actual time=0.063..1880.467 rows=247741 loops=1)
Output: similarity('****'::text, complete_address)
Filter: (address18.pentgram && '{"****}'::text[])
Buffers: shared hit=3592 read=69907
I/O Timings: shared/local read=267.274
Query Identifier: 2367846469053211383
Planning:
Buffers: shared hit=1
Planning Time: 0.386 ms
Execution Time: 1886.125 ms
(10 rows)

When i disable the sequential scan the execution time is increased
significantly.

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.address18 (cost=2266.24..79481.36 rows=247741
width=4) (actual time=186.402..3285.090 rows=247741 loops=1)
Output: similarity('****'::text, complete_address)
Recheck Cond: (address18.pentgram && '{***********}'::text[])
Heap Blocks: exact=39632
Buffers: shared hit=1361 read=39155 written=6260
I/O Timings: shared/local read=981.127 write=544.491
-> Bitmap Index Scan on pentgram_idx (cost=0.00..2204.31 rows=247741
width=0) (actual time=182.462..182.463 rows=247741 loops=1)
Index Cond: (address18.pentgram && '{*****}'::text[])
Buffers: shared hit=884
Query Identifier: 2367846469053211383
Planning:
Buffers: shared hit=1
Planning Time: 6.707 ms
Execution Time: 3292.339 ms
(14 rows)

Not expecting this behavior currently.

Few parameters in my configuration
postgresql version is 15
OS is RHEL 8.8
effective_cache_size | 131072
huge_pages | off
maintenance_work_mem | 327680
max_parallel_maintenance_workers | 1

max_parallel_workers | 2
max_parallel_workers_per_gather | 1
shared_buffers
| 32768

Thanks
C.R.Bala

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: balasubramanian c r (#1)

Re: GIN INdex is not used with && operator for a text array index

balasubramanian c r <crbs.siebel@gmail.com> writes:

when Operator '@>' is used the index is used and the execution time is 60ms.

Yeah ... note that it's predicted to return just one row, and that
guess is correct:

Bitmap Heap Scan on public.address18 (cost=261.25..262.52 rows=1 width=4)
(actual time=58.992..58.994 rows=1 loops=1)

when Operator '&&' is used the index is used and the execution time is 60ms.
It is performing sequential scan which is not expected.

I do not see why you expect that. This case retrieves many more rows
than the other one, both in the planner's estimate and in reality:

Seq Scan on public.address18 (cost=0.00..77215.11 rows=247741 width=4)
(actual time=0.063..1880.467 rows=247741 loops=1)

An indexscan is not necessarily better than a seqscan for such cases...

When i disable the sequential scan the execution time is increased
significantly.

... so, indeed, the planner was correct to use a seqscan.

Not expecting this behavior currently.

Your expectation is faulty.

regards, tom lane