Surprised by index choice for count(*)

Started by Rob Sargentalmost 8 years ago4 messagesgeneral
Jump to latest
#1Rob Sargent
robjsargent@gmail.com

Should I be?  I would have thought the pk would have been chosen v.
function index?

explain analyse select count(*) from bc.segment s;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=4428009.24..4428009.25 rows=1 width=8)
(actual time=14786.395..14786.395 rows=1 loops=1)
   ->  Gather  (cost=4428009.03..4428009.24 rows=2 width=8) (actual
time=14786.358..14786.386 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=4427009.03..4427009.04 rows=1
width=8) (actual time=14782.167..14782.167 rows=1 loops=3)
               ->  Parallel Index Only Scan using fpv on segment s 
(cost=0.57..4210177.14 rows=86732753 width=0) (actual
time=0.061..11352.855 rows=69386204 loops=3)
                     Heap Fetches: 1780
 Planning time: 0.221 ms
 Execution time: 14815.939 ms
(9 rows)

\d bc.segment
              Table "bc.segment"
     Column     |  Type   |     Modifiers
----------------+---------+--------------------
 id             | uuid    | not null
 chrom          | integer | not null
 markerset_id   | uuid    | not null
 probandset_id  | uuid    | not null
 startbase      | integer | not null
 endbase        | integer | not null
 firstmarker    | integer | not null
 lastmarker     | integer | not null
 events_less    | bigint  | not null default 0
 events_equal   | bigint  | not null default 0
 events_greater | bigint  | not null default 0
Indexes:
    "segment_pkey" PRIMARY KEY, btree (id)
    "useg" UNIQUE, btree (probandset_id, chrom, startbase, endbase)
    "fpv" btree (pv(events_less, events_equal, events_greater, 0))
    "segment_markerset_id_probandset_id_idx" btree (markerset_id,
probandset_id)

create or replace function public.pv(l bigint, e bigint, g bigint, o int)
returns float
as
$$
select 1.0*(g+e+o)/(l+e+g+o)::float;
$$
language sql
;

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Rob Sargent (#1)
Re: Surprised by index choice for count(*)

On Tue, May 1, 2018 at 8:46 AM, Rob Sargent <robjsargent@gmail.com> wrote:

Should I be? I would have thought the pk would have been chosen v.
function index?
Indexes:
"segment_pkey" PRIMARY KEY, btree (id)
"useg" UNIQUE, btree (probandset_id, chrom, startbase, endbase)
"fpv" btree (pv(events_less, events_equal, events_greater, 0))
"segment_markerset_id_probandset_id_idx" btree (markerset_id,
probandset_id)

​Haven't looked at the code but among those 4 indexes the one that was
chosen is the one that comes first alphabetically.​ Ideally it would avoid
multi-column indexes since they are larger; and I believe that a float is
smaller than a uuid so not only alphabetically but the fpv index wins on
size too.

That the index is functional is immaterial here.

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rob Sargent (#1)
Re: Surprised by index choice for count(*)

Rob Sargent <robjsargent@gmail.com> writes:

Should I be?  I would have thought the pk would have been chosen v.
function index?

If I'm reading this correctly, the PK index contains uuids while
the fpv index contains float4s, meaning the latter is probably half
the size. So scanning it is a lot cheaper, at least according to
the planner's cost model.

regards, tom lane

#4Rob Sargent
robjsargent@gmail.com
In reply to: Tom Lane (#3)
Re: Surprised by index choice for count(*)

Thank you both. Simple, as expected. And I’m easily surprised.
Version 10 (perhaps obviously) for those scoring at home.

Show quoted text

On May 1, 2018, at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Rob Sargent <robjsargent@gmail.com> writes:

Should I be? I would have thought the pk would have been chosen v.
function index?

If I'm reading this correctly, the PK index contains uuids while
the fpv index contains float4s, meaning the latter is probably half
the size. So scanning it is a lot cheaper, at least according to
the planner's cost model.

regards, tom lane