Unpredictable shark slowdown after migrating to 8.4

Started by Sergey Konoplevover 16 years ago11 messageshackers

gray.ru@gmail.com

over 16 years ago

Hello community,

Second time after migration 8.3.7 --> 8.4.1 I was caught by this
problem. Migration was 8 days ago.
(note, I never seen such situation on 8.3)

Environment:

PostgreSQL 8.4.1 + patch
http://archives.postgresql.org/pgsql-committers/2009-10/msg00056.php
CentOS release 5.4 (Final)
SunFire X4600 M2; 3U; 8xOpteron 8380 2.5GHz; 96GB; 12x146GB 15K RPM
DB size ~90G TPS~1000

Symptoms:

In short period of time total query execution time increases incredibly.

Time Sum duration (ms)
17:17 12811
17:18 8870
17:19 33744
17:20 9991
17:21 13392
17:22 163928
17:23 1151709
17:24 12112797 <-- here it cuts due to connection threshold
17:25 12305390
17:26 12970853
17:27 12957648

LA changes rather insignificantly from ~6 to ~8.
CPU user time increases from ~350 to ~600 ), system from ~50 to ~250.
Neither additional significant disc activity nor iowait.
Another thing I noticed is autoanalyze finish on tables that few
minutes later were used in the first and mostly canceled by timeout
queries.

First time I assigned the blame to multiply locks on one of the
mentioned above tables. There was heavy delete. I started delete rows
manually by small batches and found a record that hung my delete for a
long time (many times greater then stmt timeout) even when I tried to
delete it by PK. Situation was saved by disabling some functional that
uses this table until next day when I got rid of the aggressive
deletes.

Second time I didn't find any reason that explains the situation.

Was this situation mentioned before and is there a solution or
workaround? (I didn't find any) If not please give me a glue where to
dig or what information should I provide?

Thank you.

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Sergey Konoplev (#1)

Re: Unpredictable shark slowdown after migrating to 8.4

On Wed, Nov 11, 2009 at 12:50 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:

Was this situation mentioned before and is there a solution or
workaround? (I didn't find any) If not please give me a glue where to
dig or what information should I provide?

I think you should use log_min_duration_statement or auto_explain to
figure out which queries are giving you grief. I don't believe that
8.4 is in general slower than 8.3, so there must be something about
how you are using it that is making it slower for you. But without
more information it's hard to speculate what it might be. It's also
not entirely clear that your 8.4 setup is really the same as your 8.3
setup. You might have different configuration, differences in your
tables or table data, differences in workload, etc. Without
controlling for all those factors it's hard to draw any conclusions.

Also, I don't believe this is an appropriate topic for pgsql-hackers.
If you have EXPLAIN ANALYZE results for the affected queries, try
pgsql-performance.

...Robert

Sergey Konoplev

gray.ru@gmail.com

over 16 years ago

In reply to: Robert Haas (#2)

Re: Unpredictable shark slowdown after migrating to 8.4

On Thu, Nov 12, 2009 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 11, 2009 at 12:50 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:

Was this situation mentioned before and is there a solution or
workaround? (I didn't find any) If not please give me a glue where to
dig or what information should I provide?

I think you should use log_min_duration_statement or auto_explain to
figure out which queries are giving you grief. I don't believe that
8.4 is in general slower than 8.3, so there must be something about
how you are using it that is making it slower for you. But without
more information it's hard to speculate what it might be. It's also
not entirely clear that your 8.4 setup is really the same as your 8.3
setup. You might have different configuration, differences in your
tables or table data, differences in workload, etc. Without
controlling for all those factors it's hard to draw any conclusions.

Well I turned on log_min_duration_statement, set up auto_explain,
pg_stat_statements, pgrowlocks, pg_buffercache, wrote some monitoring
queries and started to wait when the situation repeats.

Today it happens! Situation was absolutely the same as I described in
my previous letter. One more thing I noticed about CPU user-time this
time is that after connections count gets close to pgbouncer threshold
it decreased from ~800 to ~10 very fast.

Log monitoring shows me that query plans were the same as usual (thanx
auto_explain).

I reset pg_stat_statements and few minutes later did select from it. I
noticed that slowest queries was on tables with high number of updates
(but isn't it usual?).

I tried to get locks with this queries

SELECT
t.tablename,
(SELECT count(*) FROM pgrowlocks(schemaname || '.' || t.tablename)) AS locks
FROM pg_tables t
WHERE schemaname = 'public'
ORDER BY 2 DESC
LIMIT 10;

SELECT * FROM pgrowlocks('public.person_online');

but nothing was returned.

Here is portions of vmstat and iostat results http://pastie.org/701326

This time situation was saved by PG restart to. Obviously all I
provided tells almost nothing and I'm very confused with it. So please
tell me what else could I do to get more info?

Also, I don't believe this is an appropriate topic for pgsql-hackers.
If you have EXPLAIN ANALYZE results for the affected queries, try
pgsql-performance.

I do have but this results are good and the same as when nothing has
happened when everything is allright.

--
Regards,
Sergey Konoplev

Alvaro Herrera

alvherre@2ndquadrant.com

over 16 years ago

In reply to: Sergey Konoplev (#3)

Re: Unpredictable shark slowdown after migrating to 8.4

Sergey Konoplev escribiï¿½:

I tried to get locks with this queries

Did you try pg_locks?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Robert Haas

robertmhaas@gmail.com

over 16 years ago

In reply to: Sergey Konoplev (#3)

Re: Unpredictable shark slowdown after migrating to 8.4

On Mon, Nov 16, 2009 at 1:53 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:

On Thu, Nov 12, 2009 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 11, 2009 at 12:50 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:

Was this situation mentioned before and is there a solution or
workaround? (I didn't find any) If not please give me a glue where to
dig or what information should I provide?

I think you should use log_min_duration_statement or auto_explain to
figure out which queries are giving you grief. I don't believe that
8.4 is in general slower than 8.3, so there must be something about
how you are using it that is making it slower for you. But without
more information it's hard to speculate what it might be. It's also
not entirely clear that your 8.4 setup is really the same as your 8.3
setup. You might have different configuration, differences in your
tables or table data, differences in workload, etc. Without
controlling for all those factors it's hard to draw any conclusions.

Well I turned on log_min_duration_statement, set up auto_explain,
pg_stat_statements, pgrowlocks, pg_buffercache, wrote some monitoring
queries and started to wait when the situation repeats.

Today it happens! Situation was absolutely the same as I described in
my previous letter. One more thing I noticed about CPU user-time this
time is that after connections count gets close to pgbouncer threshold
it decreased from ~800 to ~10 very fast.

Log monitoring shows me that query plans were the same as usual (thanx
auto_explain).

I reset pg_stat_statements and few minutes later did select from it. I
noticed that slowest queries was on tables with high number of updates
(but isn't it usual?).

I tried to get locks with this queries

SELECT
t.tablename,
(SELECT count(*) FROM pgrowlocks(schemaname || '.' || t.tablename)) AS locks
FROM pg_tables t
WHERE schemaname = 'public'
ORDER BY 2 DESC
LIMIT 10;

SELECT * FROM pgrowlocks('public.person_online');

but nothing was returned.

Here is portions of vmstat and iostat results http://pastie.org/701326

This time situation was saved by PG restart to. Obviously all I
provided tells almost nothing and I'm very confused with it. So please
tell me what else could I do to get more info?

Also, I don't believe this is an appropriate topic for pgsql-hackers.
If you have EXPLAIN ANALYZE results for the affected queries, try
pgsql-performance.

I do have but this results are good and the same as when nothing has
happened when everything is allright.

Can you show us the non-commented settings from your postgresql.conf?

Can you show us what the vmstat output looks like when everything is
running normally? It looks like the blocks out are pretty high, but I
don't know how that compares to normal for you.

...Robert

Andres Freund

andres@anarazel.de

over 16 years ago

In reply to: Sergey Konoplev (#1)

Re: Unpredictable shark slowdown after migrating to 8.4

On Wednesday 11 November 2009 18:50:46 Sergey Konoplev wrote:

Hello community,

Second time after migration 8.3.7 --> 8.4.1 I was caught by this
problem. Migration was 8 days ago.
(note, I never seen such situation on 8.3)

Is 8.4 configured similarly to 8.3?

Andres

Sergey Konoplev

gray.ru@gmail.com

over 16 years ago

In reply to: Alvaro Herrera (#4)

Re: Unpredictable shark slowdown after migrating to 8.4

On Mon, Nov 16, 2009 at 9:56 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Sergey Konoplev escribió:

I tried to get locks with this queries

Did you try pg_locks?

I tried monitor locks with pgrowlocks. Isn't it better way? If it
isn't what points should I pay attention with pg_lock?

I've just write the query

SELECT pid, count(1) AS locks, current_query
FROM pg_locks AS l LEFT JOIN pg_stat_activity AS a ON pid = procpid
WHERE l.mode IN ('RowExclusiveLock', 'ShareUpdateExclusiveLock',
'ExclusiveLock')
GROUP BY 1,3 ORDER BY 2 DESC LIMIT 10;

would it be what we need?

--
Regards,
Sergey Konoplev

Sergey Konoplev

gray.ru@gmail.com

over 16 years ago

In reply to: Robert Haas (#5)

Re: Unpredictable shark slowdown after migrating to 8.4

Can you show us the non-commented settings from your postgresql.conf?

Working postgresql.conf http://pastie.org/702748

Can you show us what the vmstat output looks like when everything is
running normally? It looks like the blocks out are pretty high, but I
don't know how that compares to normal for you.

Here it is http://pastie.org/702742

--
Regards,
Sergey Konoplev

Sergey Konoplev

gray.ru@gmail.com

over 16 years ago

In reply to: Andres Freund (#6)

Re: Unpredictable shark slowdown after migrating to 8.4

On Mon, Nov 16, 2009 at 10:17 PM, Andres Freund <andres@anarazel.de> wrote:

On Wednesday 11 November 2009 18:50:46 Sergey Konoplev wrote:

Hello community,

Second time after migration 8.3.7 --> 8.4.1 I was caught by this
problem. Migration was 8 days ago.
(note, I never seen such situation on 8.3)

Is 8.4 configured similarly to 8.3?

It is.

8.3 conf - http://pastie.org/702752
8.4 conf - http://pastie.org/702748

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/

#10

Alvaro Herrera

alvherre@2ndquadrant.com

over 16 years ago

In reply to: Sergey Konoplev (#7)

Re: Unpredictable shark slowdown after migrating to 8.4

Sergey Konoplev escribiï¿½:

On Mon, Nov 16, 2009 at 9:56 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Sergey Konoplev escribiï¿½:

I tried to get locks with this queries

Did you try pg_locks?

I tried monitor locks with pgrowlocks. Isn't it better way? If it
isn't what points should I pay attention with pg_lock?

pgrowlocks shows row locks only. pg_locks shows all regular locks, i.e.
locks on tables, pages, transactions, etc. You should be concerned with
pgrowlocks only if you see a transaction waiting for locktype=transaction.

I've just write the query

SELECT pid, count(1) AS locks, current_query
FROM pg_locks AS l LEFT JOIN pg_stat_activity AS a ON pid = procpid
WHERE l.mode IN ('RowExclusiveLock', 'ShareUpdateExclusiveLock',
'ExclusiveLock')
GROUP BY 1,3 ORDER BY 2 DESC LIMIT 10;

Why only those modes? I'd search for locks with granted=false, then see
all the other locks held by the process that's holding the conflicting
lock with granted=true (i.e. the one you're waiting on).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11

Sergey Konoplev

gray.ru@gmail.com

over 16 years ago

In reply to: Alvaro Herrera (#10)

Re: Unpredictable shark slowdown after migrating to 8.4

Thank you for the hints.

Why only those modes? I'd search for locks with granted=false, then see
all the other locks held by the process that's holding the conflicting
lock with granted=true (i.e. the one you're waiting on).

Something like this?

SELECT
granted,
pid,
virtualxid,
transactionid,
virtualtransaction,
count(1) AS locks,
current_query
FROM
pg_locks AS l
LEFT JOIN pg_stat_activity AS a ON
pid = procpid
GROUP BY 1, 2, 3, 4, 5, 7
ORDER BY 1, 6 DESC;

And two more queries to do extended analysis of its results after restarting PG:

SELECT
pg_stat_activity.datname,
pg_class.relname,
pg_locks.transactionid,
pg_locks.mode,
pg_locks.granted,
pg_stat_activity.usename,
pg_stat_activity.current_query,
pg_stat_activity.query_start,
age(now(),pg_stat_activity.query_start) AS "age",
pg_stat_activity.procpid
FROM
pg_stat_activity,
pg_locks
LEFT OUTER JOIN pg_class ON
pg_locks.relation = pg_class.oid
WHERE
pg_locks.pid = pg_stat_activity.procpid
ORDER BY
query_start;

SELECT * FROM pg_locks;

Are there another things I should do when the problem rise up again?

--
Regards,
Sergey Konoplev