Non-deterministic 100% CPU hang on postgres 9.3
Hi all,
I'm experiencing an issue where certain queries appear to
non-deterministically "hang", with a CPU pinned at 100%.
I say "hang", where really I've given up after ~12 hours execution. The
exact same query can then be terminated and run in <90 seconds, with none
of the underlying data changing.
I can completely reset the DB (drop tables/recreate) and re-run and
sometimes certain queries will appear to hang forever, sometimes they will
execute in minutes.
I've tried official debian (amd64) packages of postgres 9.3.2 - 9.3.4 (same
issues).
The workload is ostensibly analytics - As a disclaimer I absolutely
understand that these queries may not be non-optimized (I didn't build
them/have not reviewed fully). The part I'm trying to work out is why they
sometimes finish easily in minutes and other times hang.
I've straced the process when it's pinned and there appears to be no system
calls (ie: no IO) just a pure CPU loop.
If anyone could explain this behaviour, or even how I might go about
diagnosing, that would be wonderful.
Query/analyze details follow.
Thanks!
Query:
----
UPDATE ad_events e
set ad_user_id = x.ad_user_id, ad_session_id = x.ad_session_id
FROM
(SELECT t.ad_event_id, COALESCE (s.ad_user_id, u.merged_id,u.ad_user_id)
ad_user_id,
case when (name = 'Clickthrough from Email' or properties->('mp_lib')
= 'web' ) then s.ad_session_id
else null
end as ad_session_id,
t.timestamp, name , properties
from ad_events_mv t
INNER JOIN ad_users u ON (t.user_id = u.orig_distinct_id)
LEFT OUTER JOIN ad_sessions s ON (t.timestamp between s.session_start
and s.session_end and (s.ad_user_id = u.ad_user_id or s.ad_user_id =
u.merged_id))
where t.processed = false
order by s.ad_user_id, s.ad_session_id, timestamp) x
WHERE
x.ad_event_id = e.ad_event_id;
----
EXPLAIN ANALYZE follows: QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on ad_events e (cost=39730372.92..39765372.92 rows=1000000
width=237) (actual time=130140.450..130140.450 rows=0 loops=1)
-> Hash Join (cost=39730372.92..39765372.92 rows=1000000 width=237)
(actual time=54243.877..66848.448 rows=2000000 loops=1)
Hash Cond: (x.ad_event_id = e.ad_event_id)
-> Subquery Scan on x (cost=39654433.45..39666933.45
rows=1000000 width=144) (actual time=52682.740..57668.998 rows=2000000
loops=1)
-> Sort (cost=39654433.45..39656933.45 rows=1000000
width=108) (actual time=52682.693..55003.467 rows=2000000 loops=1)
Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
Sort Method: external merge Disk: 1078104kB
-> Nested Loop Left Join (cost=12054.20..39554775.61
rows=1000000 width=108) (actual time=204.872..43742.396 rows=2000000
loops=1)
Join Filter: ((t."timestamp" >= s.session_start)
AND (t."timestamp" <= s.session_end))
Rows Removed by Join Filter: 18792613
-> Hash Join (cost=12046.09..203878.09
rows=1000000 width=92) (actual time=204.402..4293.175 rows=2000000 loops=1)
Hash Cond: (t.user_id = u.orig_distinct_id)
-> Seq Scan on ad_events_mv t
(cost=0.00..173082.00 rows=1000000 width=112) (actual time=0.022..1813.272
rows=2000000 loops=1)
Filter: (NOT processed)
-> Hash (cost=7932.15..7932.15
rows=329115 width=67) (actual time=204.166..204.166 rows=329115 loops=1)
Buckets: 65536 Batches: 1 Memory
Usage: 30590kB
-> Seq Scan on ad_users u
(cost=0.00..7932.15 rows=329115 width=67) (actual time=0.009..85.231
rows=329115 loops=1)
-> Bitmap Heap Scan on ad_sessions s
(cost=8.11..39.22 rows=8 width=32) (actual time=0.007..0.015 rows=10
loops=2000000)
Recheck Cond: ((ad_user_id = u.ad_user_id)
OR (ad_user_id = u.merged_id))
-> BitmapOr (cost=8.11..8.11 rows=8
width=0) (actual time=0.005..0.005 rows=0 loops=2000000)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0) (actual
time=0.003..0.003 rows=8 loops=2000000)
Index Cond: (ad_user_id =
u.ad_user_id)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0) (actual
time=0.001..0.001 rows=3 loops=2000000)
:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on ad_events e (cost=39730372.92..39765372.92 rows=1000000
width=237) (actual time=130140.450..130140.450 rows=0 loops=1)
-> Hash Join (cost=39730372.92..39765372.92 rows=1000000 width=237)
(actual time=54243.877..66848.448 rows=2000000 loops=1)
Hash Cond: (x.ad_event_id = e.ad_event_id)
-> Subquery Scan on x (cost=39654433.45..39666933.45
rows=1000000 width=144) (actual time=52682.740..57668.998 rows=2000000
loops=1)
-> Sort (cost=39654433.45..39656933.45 rows=1000000
width=108) (actual time=52682.693..55003.467 rows=2000000 loops=1)
Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
Sort Method: external merge Disk: 1078104kB
-> Nested Loop Left Join (cost=12054.20..39554775.61
rows=1000000 width=108) (actual time=204.872..43742.396 rows=2000000
loops=1)
Join Filter: ((t."timestamp" >= s.session_start)
AND (t."timestamp" <= s.session_end))
Rows Removed by Join Filter: 18792613
-> Hash Join (cost=12046.09..203878.09
rows=1000000 width=92) (actual time=204.402..4293.175 rows=2000000 loops=1)
Hash Cond: (t.user_id = u.orig_distinct_id)
-> Seq Scan on ad_events_mv t
(cost=0.00..173082.00 rows=1000000 width=112) (actual time=0.022..1813.272
rows=2000000 loops=1)
Filter: (NOT processed)
-> Hash (cost=7932.15..7932.15
rows=329115 width=67) (actual time=204.166..204.166 rows=329115 loops=1)
Buckets: 65536 Batches: 1 Memory
Usage: 30590kB
-> Seq Scan on ad_users u
(cost=0.00..7932.15 rows=329115 width=67) (actual time=0.009..85.231
rows=329115 loops=1)
-> Bitmap Heap Scan on ad_sessions s
(cost=8.11..39.22 rows=8 width=32) (actual time=0.007..0.015 rows=10
loops=2000000)
Recheck Cond: ((ad_user_id = u.ad_user_id)
OR (ad_user_id = u.merged_id))
-> BitmapOr (cost=8.11..8.11 rows=8
width=0) (actual time=0.005..0.005 rows=0 loops=2000000)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0) (actual
time=0.003..0.003 rows=8 loops=2000000)
Index Cond: (ad_user_id =
u.ad_user_id)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0) (actual
time=0.001..0.001 rows=3 loops=2000000)
Index Cond: (ad_user_id =
u.merged_id)
-> Hash (cost=50938.65..50938.65 rows=2000065 width=101) (actual
time=1560.037..1560.037 rows=2000000 loops=1)
Buckets: 262144 Batches: 1 Memory Usage: 253047kB
-> Seq Scan on ad_events e (cost=0.00..50938.65
rows=2000065 width=101) (actual time=0.014..633.555 rows=2000000 loops=1)
Fenn Bailey <fenn.bailey@gmail.com> writes:
I'm experiencing an issue where certain queries appear to
non-deterministically "hang", with a CPU pinned at 100%.
I say "hang", where really I've given up after ~12 hours execution. The
exact same query can then be terminated and run in <90 seconds, with none
of the underlying data changing.
But does the plan change?
If anyone could explain this behaviour, or even how I might go about
diagnosing, that would be wonderful.
"perf" or "oprofile" or local equivalent would help identify where the
looping is happening.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Some more information on this:
I say "hang", where really I've given up after ~12 hours execution. The
exact same query can then be terminated and run in <90 seconds, with none
of the underlying data changing.But does the plan change?
As it turns out it does. Some further examination shows that the plan
changes, at least partially due to the fact that I now see this query
attempts to modify the data that it's querying (ie: if any updates have
occurred, the subsequent query is not the same).
That said, all queries were performed in a transaction, so when they're
aborted, no modification should be made to the underlying data.
This makes it even more odd that on a subsequent run of the same query
(after the rollback) it will complete in ~140 seconds (vs the several hours
it was taking on the first run).
Is it possible that the initial run is changing the statistics available on
the table, hence providing a different query plan (even on the same data)?
Query plan on first/second run follows -
First run (failure):
---------------------------
Update on ad_events e (cost=1426714795.64..1426749795.64 rows=1000000
width=237)
-> Hash Join (cost=1426714795.64..1426749795.64 rows=1000000 width=237)
Hash Cond: (x.ad_event_id = e.ad_event_id)
-> Subquery Scan on x (cost=1426638856.33..1426651356.33
rows=1000000 width=144)
-> Sort (cost=1426638856.33..1426641356.33 rows=1000000
width=108)
Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
-> Nested Loop Left Join
(cost=12046.09..1426539198.49 rows=1000000 width=108)
Join Filter: ((t."timestamp" >= s.session_start)
AND (t."timestamp" <= s.session_end) AND ((s.ad_user_id = u.ad_user_id) OR
(s.ad_user_id = u.merged_id)))
-> Hash Join (cost=12046.09..203878.09
rows=1000000 width=92)
Hash Cond: (t.user_id = u.orig_distinct_id)
-> Seq Scan on ad_events_mv t
(cost=0.00..173082.00 rows=1000000 width=112)
Filter: (NOT processed)
-> Hash (cost=7932.15..7932.15
rows=329115 width=67)
-> Seq Scan on ad_users u
(cost=0.00..7932.15 rows=329115 width=67)
-> Materialize (cost=0.00..5478.88 rows=63392
width=32)
-> Seq Scan on ad_sessions s
(cost=0.00..5161.92 rows=63392 width=32)
-> Hash (cost=50938.58..50938.58 rows=2000058 width=101)
-> Seq Scan on ad_events e (cost=0.00..50938.58
rows=2000058 width=101)
Second run (completed in ~148 seconds):
------------------------------
Update on ad_events e (cost=39730372.76..39765372.76 rows=1000000
width=237)
-> Hash Join (cost=39730372.76..39765372.76 rows=1000000 width=237)
Hash Cond: (x.ad_event_id = e.ad_event_id)
-> Subquery Scan on x (cost=39654433.45..39666933.45
rows=1000000 width=144)
-> Sort (cost=39654433.45..39656933.45 rows=1000000
width=108)
Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
-> Nested Loop Left Join (cost=12054.20..39554775.61
rows=1000000 width=108)
Join Filter: ((t."timestamp" >= s.session_start)
AND (t."timestamp" <= s.session_end))
-> Hash Join (cost=12046.09..203878.09
rows=1000000 width=92)
Hash Cond: (t.user_id = u.orig_distinct_id)
-> Seq Scan on ad_events_mv t
(cost=0.00..173082.00 rows=1000000 width=112)
Filter: (NOT processed)
-> Hash (cost=7932.15..7932.15
rows=329115 width=67)
-> Seq Scan on ad_users u
(cost=0.00..7932.15 rows=329115 width=67)
-> Bitmap Heap Scan on ad_sessions s
(cost=8.11..39.22 rows=8 width=32)
Recheck Cond: ((ad_user_id = u.ad_user_id)
OR (ad_user_id = u.merged_id))
-> BitmapOr (cost=8.11..8.11 rows=8
width=0)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0)
Index Cond: (ad_user_id =
u.ad_user_id)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0)
Index Cond: (ad_user_id =
u.merged_id)
-> Hash (cost=50938.58..50938.58 rows=2000058 width=101)
-> Seq Scan on ad_events e (cost=0.00..50938.58
rows=2000058 width=101)
---------------
From a quick look, the plans diverge at:
-> Seq Scan on ad_sessions s (cost=0.00..5161.92 rows=63392 width=32)
vs
-> Bitmap Heap Scan on ad_sessions s (cost=8.11..39.22 rows=8 width=32)
I will continue to look further into this in an effort to "fix" the queries
but it would be great if someone could give me insight as to what could be
causing the inconsistency in plan.
Cheers!
Hi all,
I was wondering if anyone had any further insight as to why the query plan
would have changed so dramatically between queries, even when isolated by a
transaction?
Is it possible for database statistics to change after a query has been run
(independent of transactions)?
Thanks in advance,
Fenn.
On Wed, Apr 16, 2014 at 2:25 PM, Fenn Bailey <fenn.bailey@gmail.com> wrote:
Show quoted text
Some more information on this:
I say "hang", where really I've given up after ~12 hours execution. The
exact same query can then be terminated and run in <90 seconds, withnone
of the underlying data changing.
But does the plan change?
As it turns out it does. Some further examination shows that the plan
changes, at least partially due to the fact that I now see this query
attempts to modify the data that it's querying (ie: if any updates have
occurred, the subsequent query is not the same).That said, all queries were performed in a transaction, so when they're
aborted, no modification should be made to the underlying data.This makes it even more odd that on a subsequent run of the same query
(after the rollback) it will complete in ~140 seconds (vs the several hours
it was taking on the first run).Is it possible that the initial run is changing the statistics available
on the table, hence providing a different query plan (even on the same
data)?Query plan on first/second run follows -
First run (failure):
---------------------------
Update on ad_events e (cost=1426714795.64..1426749795.64 rows=1000000
width=237)
-> Hash Join (cost=1426714795.64..1426749795.64 rows=1000000
width=237)
Hash Cond: (x.ad_event_id = e.ad_event_id)
-> Subquery Scan on x (cost=1426638856.33..1426651356.33
rows=1000000 width=144)
-> Sort (cost=1426638856.33..1426641356.33 rows=1000000
width=108)
Sort Key: s.ad_user_id, s.ad_session_id,
t."timestamp"
-> Nested Loop Left Join
(cost=12046.09..1426539198.49 rows=1000000 width=108)
Join Filter: ((t."timestamp" >=
s.session_start) AND (t."timestamp" <= s.session_end) AND ((s.ad_user_id =
u.ad_user_id) OR (s.ad_user_id = u.merged_id)))
-> Hash Join (cost=12046.09..203878.09
rows=1000000 width=92)
Hash Cond: (t.user_id =
u.orig_distinct_id)
-> Seq Scan on ad_events_mv t
(cost=0.00..173082.00 rows=1000000 width=112)
Filter: (NOT processed)
-> Hash (cost=7932.15..7932.15
rows=329115 width=67)
-> Seq Scan on ad_users u
(cost=0.00..7932.15 rows=329115 width=67)
-> Materialize (cost=0.00..5478.88 rows=63392
width=32)
-> Seq Scan on ad_sessions s
(cost=0.00..5161.92 rows=63392 width=32)
-> Hash (cost=50938.58..50938.58 rows=2000058 width=101)
-> Seq Scan on ad_events e (cost=0.00..50938.58
rows=2000058 width=101)Second run (completed in ~148 seconds):
------------------------------
Update on ad_events e (cost=39730372.76..39765372.76 rows=1000000
width=237)
-> Hash Join (cost=39730372.76..39765372.76 rows=1000000 width=237)
Hash Cond: (x.ad_event_id = e.ad_event_id)
-> Subquery Scan on x (cost=39654433.45..39666933.45
rows=1000000 width=144)
-> Sort (cost=39654433.45..39656933.45 rows=1000000
width=108)
Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
-> Nested Loop Left Join
(cost=12054.20..39554775.61 rows=1000000 width=108)
Join Filter: ((t."timestamp" >=
s.session_start) AND (t."timestamp" <= s.session_end))
-> Hash Join (cost=12046.09..203878.09
rows=1000000 width=92)
Hash Cond: (t.user_id =
u.orig_distinct_id)
-> Seq Scan on ad_events_mv t
(cost=0.00..173082.00 rows=1000000 width=112)
Filter: (NOT processed)
-> Hash (cost=7932.15..7932.15
rows=329115 width=67)
-> Seq Scan on ad_users u
(cost=0.00..7932.15 rows=329115 width=67)
-> Bitmap Heap Scan on ad_sessions s
(cost=8.11..39.22 rows=8 width=32)
Recheck Cond: ((ad_user_id =
u.ad_user_id) OR (ad_user_id = u.merged_id))
-> BitmapOr (cost=8.11..8.11 rows=8
width=0)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0)
Index Cond: (ad_user_id =
u.ad_user_id)
-> Bitmap Index Scan on
ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0)
Index Cond: (ad_user_id =
u.merged_id)
-> Hash (cost=50938.58..50938.58 rows=2000058 width=101)
-> Seq Scan on ad_events e (cost=0.00..50938.58
rows=2000058 width=101)
---------------From a quick look, the plans diverge at:
-> Seq Scan on ad_sessions s (cost=0.00..5161.92 rows=63392 width=32)
vs
-> Bitmap Heap Scan on ad_sessions s (cost=8.11..39.22 rows=8 width=32)I will continue to look further into this in an effort to "fix" the
queries but it would be great if someone could give me insight as to what
could be causing the inconsistency in plan.Cheers!