Slow index scan - Pgsql 9.2
Hi guys,
I've got the following Query:
WITH query_p AS (
    SELECT CAST(6667176 AS BIGINT) AS client_id
),
clients AS (
    SELECT
        client.id,
        client.job_share_mode
    FROM customers AS client
    WHERE client.clientid = (SELECT qp.client_id FROM query_p AS qp)
      AND NOT client.is_demo
      AND NOT client.deleted
)
SELECT
    qp.client_id,
    (SELECT COUNT(0)
     FROM customers AS c
     WHERE c.clientid = qp.client_id
       AND NOT c.deleted) AS client_count
FROM query_p AS qp;
*Explain Analyze:*
CTE Scan on "query_p" "qp"  (cost=0.01..1060.57 rows=1 width=8) (actual time=4065.244..4065.246 rows=1 loops=1)
  CTE query_p
    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
  SubPlan 2
    ->  Aggregate  (cost=1060.53..1060.54 rows=1 width=0) (actual time=4065.229..4065.229 rows=1 loops=1)
          ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
                Index Cond: ("clientid" = "qp"."client_id")
                Filter: (NOT "deleted")
                Rows Removed by Filter: 1068
Total runtime: 4075.753 ms
Why is a search for "client_id" so slow?
*Table customers:*
Table "public.customers"
   Column   |            Type             |                      Modifiers
------------+-----------------------------+-----------------------------------------------------
 id         | bigint                      | not null default nextval('customers_seq'::regclass)
 clientid   | bigint                      | not null default 0
 name_first | character varying(80)       | default ''::character varying
 name_last  | character varying(80)       | default ''::character varying
 company    | character varying(255)      | default ''::character varying
*Index clientid_customers:*
CREATE INDEX clientid_customers ON customers (clientid);
Thanks!
Patrick
Why is a search for "client_id" so slow?
I would think because of the NOT "deleted" clause. Which is interesting,
because that's a column which you conveniently didn't include in the
table definition you posted.
On Mon, Jan 9, 2017 at 6:06 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
*Explain Analyze:*
-> Index Scan using "clientid_customers" on "customers" "c" (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
      Index Cond: ("clientid" = "qp"."client_id")
      Filter: (NOT "deleted")
      Rows Removed by Filter: 1068
Total runtime: 4075.753 ms
Why is a search for "client_id" so slow?
3,581 individual pokes into the heap to confirm tuple visibility and
apply the deleted filter - that could indeed take a while.
David J.
On 10 January 2017 at 14:06, Patrick B <patrickbakerbr@gmail.com> wrote:
-> Index Scan using "clientid_customers" on "customers" "c" (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
      Index Cond: ("clientid" = "qp"."client_id")
      Filter: (NOT "deleted")
      Rows Removed by Filter: 1068
Total runtime: 4075.753 ms
Why is a search for "client_id" so slow?
EXPLAIN (ANALYZE, BUFFERS) might reveal something.
Perhaps the 2513 matched rows, plus the 1068 rows removed by the filter,
were spread all over the table, perhaps each on its own heap page, and
all those pages had to be read from disk. The BUFFERS option might
help show if this is the case.
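A minimal way to try that, reusing the literal client_id from the original post (a sketch; run it twice to separate cold-cache from warm-cache timings):

```sql
-- EXPLAIN (ANALYZE, BUFFERS) adds a "Buffers: shared hit=... read=..."
-- line under each node: "hit" pages came from shared_buffers, while
-- "read" pages had to come from the OS page cache or disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(0)
FROM customers AS c
WHERE c.clientid = 6667176
  AND NOT c.deleted;
```

A large "read" count under the index scan on the first run, dropping to mostly "hit" on the second, would confirm the cold-cache theory.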
Does it execute as slowly when you run it for a 2nd time?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
3,581 individual pokes into the heap to confirm tuple visibility and
apply the deleted filter - that could indeed take a while.
David J.
I see.. The deleted column is:
deleted boolean
Should I create an index for that? How could I improve this query?
Does it execute as slowly when you run it for a 2nd time?
No, it doesn't. I think it's because of cache?
I would think because of the NOT "deleted" clause. Which is interesting,
because that's a column which you conveniently didn't include in the
definition below.
My mistake.
Would an Index be sufficient to solve the problem?
Patrick
On Mon, Jan 9, 2017 at 8:05 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
3,581 individual pokes into the heap to confirm tuple visibility and
apply the deleted filter - that could indeed take a while.
David J.
I see.. The deleted column is:
deleted boolean
Should I create an index for that? How could I improve this query?
Does it execute as slowly when you run it for a 2nd time?
No, it doesn't. I think it's because of cache?
Correct - your first execution swallows disk I/O.
I would think because of the NOT "deleted" clause. Which is interesting,
because that's a column which you conveniently didn't include in the
definition below.
My mistake.
Would an Index be sufficient to solve the problem?
A lot would depend on the selectivity of "deleted"... you are tossing less
than 1/3rd of the rows here so probably not that helpful - though a partial
index over deleted = false would benefit this specific query.
If this query dominates your non-PK usage of this table you could consider
clustering the table on client_id - that would reduce the I/O hit at the
expense of increased maintenance.
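A sketch of the clustering approach (note that CLUSTER takes an exclusive lock and rewrites the table, and the physical ordering decays as rows change, so this is recurring maintenance, not a one-off):

```sql
-- Rewrite the heap in clientid_customers order so all rows for one
-- client land on adjacent pages, cutting the random I/O per lookup.
CLUSTER customers USING clientid_customers;
-- Refresh planner statistics after the rewrite.
ANALYZE customers;
```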
Really, though, the query you are showing is poorly written - so I am
assuming that it is standing in for something else. If that's so you have
provided zero context for meaningful opinions to be rendered.
Queries against the supposedly large customer table, on the
high-cardinality client_id field, which result in many matching rows are
going to be a physical problem if the data doesn't remain in cache.
CLUSTER can help a bit in that situation. Otherwise your solutions are
more likely to be physical and not logical (model/SQL).
David J.
On 01/10/2017 04:05 AM, Patrick B wrote:
3,581 individual pokes into the heap to confirm tuple visibility
and apply the deleted filter - that could indeed take a while.
David J.
I see.. The deleted column is:
deleted boolean
Should I create an index for that? How could I improve this query?
Does it execute as slowly when you run it for a 2nd time?
No, it doesn't. I think it's because of cache?
I would think because of the NOT "deleted" clause. Which is
interesting, because that's a column which you conveniently didn't
include in the definition below.
My mistake.
Would an Index be sufficient to solve the problem?
Not a separate index - the query probably would not benefit from two
separate indexes. But you can amend the existing index, to allow
index-only scans, i.e. creating an index like this:
CREATE INDEX ON customers (clientid, is_demo, deleted, id, job, job_share_mode)
This will make the index larger, but it should allow index-only scans.
The other thing you could try is a partial index, i.e.
CREATE INDEX ON customers (clientid) WHERE NOT is_demo AND NOT deleted;
You can also combine those approaches, but you'll have to include all
columns into the index, even those in the index predicate:
CREATE INDEX ON customers (clientid, is_demo, deleted, id, job, job_share_mode)
WHERE NOT is_demo AND NOT deleted;
I'd bet all of those will outperform the current plan.
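To see which variant actually wins, one can create an index and check the plan; a sketch with a made-up index name (on 9.2 an index-only scan also needs a reasonably fresh visibility map, hence the VACUUM):

```sql
-- Hypothetical name; matches the partial-index suggestion above.
CREATE INDEX customers_clientid_live_idx
    ON customers (clientid)
    WHERE NOT is_demo AND NOT deleted;

-- Keep the visibility map fresh so index-only scans are possible.
VACUUM ANALYZE customers;

-- An "Index Only Scan using customers_clientid_live_idx" node here
-- means the per-row heap lookups that dominated the original plan
-- are gone.
EXPLAIN ANALYZE
SELECT COUNT(0)
FROM customers
WHERE clientid = 6667176
  AND NOT is_demo
  AND NOT deleted;
```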
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-11 4:05 GMT+13:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
The other thing you could try is a partial index, i.e.
CREATE INDEX ON customers (clientid) WHERE NOT is_demo AND NOT deleted;
Thanks for the reply!
I decided to create a partial index for that query, as it is part of a much
bigger one and it is run all the time.
Patrick