Slow index scan - Pgsql 9.2
Hi guys,
I've got the following Query:
WITH query_p AS (
    SELECT CAST(6667176 AS BIGINT) AS client_id
),
clients AS (
    SELECT
        client.id,
        client.job_share_mode
    FROM customers AS client
    WHERE client.clientid = (SELECT qp.client_id FROM query_p AS qp)
      AND NOT client.is_demo
      AND NOT client.deleted
)
SELECT
    qp.client_id,
    (SELECT COUNT(0)
     FROM customers AS c
     WHERE c.clientid = qp.client_id
       AND NOT c.deleted) AS client_count
FROM query_p AS qp;
*Explain Analyze:*
CTE Scan on "query_p" "qp"  (cost=0.01..1060.57 rows=1 width=8) (actual time=4065.244..4065.246 rows=1 loops=1)
  CTE query_p
    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
  SubPlan 2
    ->  Aggregate  (cost=1060.53..1060.54 rows=1 width=0) (actual time=4065.229..4065.229 rows=1 loops=1)
          ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
                Index Cond: ("clientid" = "qp"."client_id")
                Filter: (NOT "deleted")
                Rows Removed by Filter: 1068
Total runtime: 4075.753 ms
Why is a search for "client_id" so slow?
*Table customers:*
Table "public.customers"
   Column   |            Type             |                      Modifiers
------------+-----------------------------+-----------------------------------------------------
 id         | bigint                      | not null default nextval('customers_seq'::regclass)
 clientid   | bigint                      | not null default 0
 name_first | character varying(80)       | default ''::character varying
 name_last  | character varying(80)       | default ''::character varying
 company    | character varying(255)      | default ''::character varying
*Index clientid_customers:*
CREATE INDEX clientid_customers ON customers (clientid);
Thanks!
Patrick
Why is a search for "client_id" so slow?
I would think because of the NOT "deleted" clause. Which is interesting,
because that's a column which you conveniently didn't include in the
table definition you posted.
On Mon, Jan 9, 2017 at 6:06 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
*Explain Analyze:*
-> Index Scan using "clientid_customers" on "customers" "c" (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
      Index Cond: ("clientid" = "qp"."client_id")
      Filter: (NOT "deleted")
      Rows Removed by Filter: 1068
Total runtime: 4075.753 ms
Why is a search for "client_id" so slow?
3,581 individual pokes into the heap to confirm tuple visibility and
apply the deleted filter - that could indeed take a while.
David J.
On 10 January 2017 at 14:06, Patrick B <patrickbakerbr@gmail.com> wrote:
-> Index Scan using "clientid_customers" on "customers" "c" (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
      Index Cond: ("clientid" = "qp"."client_id")
      Filter: (NOT "deleted")
      Rows Removed by Filter: 1068
Total runtime: 4075.753 ms
Why is a search for "client_id" so slow?
EXPLAIN (ANALYZE, BUFFERS) might reveal something.
Perhaps the 2513 matched rows, plus the 1068 rows removed by the filter,
were spread all over the table, perhaps each on its own heap page, and
all those pages had to be read from disk. The BUFFERS option might
help show if this is the case.
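A minimal way to try that, reusing the literal client_id from the original post (a sketch; run it twice to separate cold-cache from warm-cache timings):

```sql
-- EXPLAIN (ANALYZE, BUFFERS) adds a "Buffers: shared hit=... read=..."
-- line under each node: "hit" pages came from shared_buffers, while
-- "read" pages had to come from the OS page cache or disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(0)
FROM customers AS c
WHERE c.clientid = 6667176
  AND NOT c.deleted;
```

A large "read" count under the index scan on the first run, dropping to mostly "hit" on the second, would confirm the cold-cache theory.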
Does it execute as slowly when you run it for a 2nd time?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
3,581 individual pokes into the heap to confirm tuple visibility and
apply the deleted filter - that could indeed take a while.
David J.
I see.. The deleted column is:
deleted boolean
Should I create an index for that? How could I improve this query?
Does it execute as slowly when you run it for a 2nd time?
No, it doesn't. I think it's because of cache?
I would think because of the NOT "deleted" clause. Which is interesting,
because that's a column which you conveniently didn't include in the
definition below.
My mistake.
Would an Index be sufficient to solve the problem?
Patrick
On Mon, Jan 9, 2017 at 8:05 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
3,581 individual pokes into the heap to confirm tuple visibility and
apply the deleted filter - that could indeed take a while.
David J.
I see.. The deleted column is:
deleted boolean
Should I create an index for that? How could I improve this query?
Does it execute as slowly when you run it for a 2nd time?
No, it doesn't. I think it's because of cache?
Correct - your first execution swallows disk I/O.
I would think because of the NOT "deleted" clause. Which is interesting,
because that's a column which you conveniently didn't include in the
definition below.
My mistake.
Would an Index be sufficient to solve the problem?
A lot would depend on the selectivity of "deleted"... you are tossing less
than 1/3rd of the rows here so probably not that helpful - though a partial
index over deleted = false would benefit this specific query.
If this query dominates your non-PK usage of this table you could consider
clustering the table on client_id - that would reduce the I/O hit at the
expense of increased maintenance.
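A sketch of the clustering approach (note that CLUSTER takes an exclusive lock and rewrites the table, and the physical ordering decays as rows change, so this is recurring maintenance, not a one-off):

```sql
-- Rewrite the heap in clientid_customers order so all rows for one
-- client land on adjacent pages, cutting the random I/O per lookup.
CLUSTER customers USING clientid_customers;
-- Refresh planner statistics after the rewrite.
ANALYZE customers;
```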
Really, though, the query you are showing is poorly written - so I am
assuming that it is standing in for something else. If that's so you have
provided zero context for meaningful opinions to be rendered.
Queries against the supposedly large customer table, on the
high-cardinality client_id field, which result in many matching rows are
going to be a physical problem if the data doesn't remain in cache.
CLUSTER can help a bit in that situation. Otherwise your solutions are
more likely to be physical and not logical (model/SQL).
David J.
On 01/10/2017 04:05 AM, Patrick B wrote:
3,581 individual pokes into the heap to confirm tuple visibility
and apply the deleted filter - that could indeed take a while.
David J.
I see.. The deleted column is:
deleted boolean
Should I create an index for that? How could I improve this query?
Does it execute as slowly when you run it for a 2nd time?
No, it doesn't. I think it's because of cache?
I would think because of the NOT "deleted" clause. Which is
interesting, because that's a column which you conveniently didn't
include in the definition below.
My mistake.
Would an Index be sufficient to solve the problem?
Not a separate index - the query probably would not benefit from two
separate indexes. But you can amend the existing index, to allow
index-only scans, i.e. creating an index like this:
CREATE INDEX ON customers (clientid, is_demo, deleted, id, job, job_share_mode)
This will make the index larger, but it should allow index-only scans.
The other thing you could try is a partial index, i.e.
CREATE INDEX ON customers (clientid) WHERE NOT is_demo AND NOT deleted;
You can also combine those approaches, but you'll have to include all
columns into the index, even those in the index predicate:
CREATE INDEX ON customers (clientid, is_demo, deleted, id, job, job_share_mode)
WHERE NOT is_demo AND NOT deleted;
I'd bet all of those will outperform the current plan.
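To see which variant actually wins, one can create an index and check the plan; a sketch with a made-up index name (on 9.2 an index-only scan also needs a reasonably fresh visibility map, hence the VACUUM):

```sql
-- Hypothetical name; matches the partial-index suggestion above.
CREATE INDEX customers_clientid_live_idx
    ON customers (clientid)
    WHERE NOT is_demo AND NOT deleted;

-- Keep the visibility map fresh so index-only scans are possible.
VACUUM ANALYZE customers;

-- An "Index Only Scan using customers_clientid_live_idx" node here
-- means the per-row heap lookups that dominated the original plan
-- are gone.
EXPLAIN ANALYZE
SELECT COUNT(0)
FROM customers
WHERE clientid = 6667176
  AND NOT is_demo
  AND NOT deleted;
```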
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-11 4:05 GMT+13:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
The other thing you could try is a partial index, i.e.
CREATE INDEX ON customers (clientid) WHERE NOT is_demo AND NOT deleted;
Thanks for the reply!
I decided to create a partial index for that query, as it is part of a much
bigger one and it is run all the time.
Patrick