Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)
Hello Postgres users,
I have a Linux website with phpBB serving a small Facebook game:
# uname -a
Linux XXXXX 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS release 5.5 (Final)
# rpm -qa | grep -i postgres
postgresql-devel-8.1.21-1.el5_5.1
postgresql-8.1.21-1.el5_5.1
postgresql-server-8.1.21-1.el5_5.1
postgresql-devel-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-docs-8.1.21-1.el5_5.1
It works generally ok and with little load since June.
(My game has less than 100 users and I have AMD/Quad+4GB)
On Friday I've installed a cronjob (full source code at the bottom):
3 3 * * * psql -a -f
$HOME/bin/clean-phpbb-forum.sql
and on Monday I've found out, that the site is overloaded:
top - 10:16:56 up 3 days, 23:56, 1 user, load average: 20.55, 21.38, 22.92
Tasks: 157 total, 24 running, 133 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.9%us, 1.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4019028k total, 3073968k used, 945060k free, 50604k buffers
Swap: 2104496k total, 132k used, 2104364k free, 2316196k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28974 postgres 16 0 122m 14m 10m R 73.3 0.4 20:56.06 postmaster
28727 postgres 16 0 121m 13m 10m R 54.7 0.3 37:58.58 postmaster
28714 postgres 16 0 122m 14m 10m R 50.4 0.4 38:38.98 postmaster
29412 postgres 16 0 121m 13m 10m R 46.5 0.4 8:03.96 postmaster
28542 postgres 16 0 122m 14m 10m R 46.2 0.4 58:49.38 postmaster
28482 postgres 16 0 122m 14m 10m R 45.8 0.4 61:53.37 postmaster
28468 postgres 16 0 122m 14m 10m R 44.2 0.4 62:46.17 postmaster
29017 postgres 16 0 122m 14m 10m R 43.9 0.4 19:17.06 postmaster
28929 postgres 15 0 122m 14m 10m R 42.2 0.4 22:01.43 postmaster
28500 postgres 16 0 122m 14m 10m R 41.3 0.4 59:40.23 postmaster
28460 postgres 16 0 122m 14m 10m R 40.6 0.4 64:17.16 postmaster
28894 postgres 16 0 122m 14m 10m R 38.6 0.4 23:35.53 postmaster
28489 postgres 16 0 122m 14m 10m R 36.0 0.4 60:32.59 postmaster
28719 postgres 15 0 121m 13m 10m R 25.2 0.3 38:10.33 postmaster
29496 postgres 16 0 121m 13m 10m R 22.9 0.4 4:20.32 postmaster
28556 postgres 15 0 122m 14m 10m R 17.7 0.4 57:32.62 postmaster
28735 postgres 15 0 122m 14m 10m R 15.7 0.4 36:09.45 postmaster
29602 postgres 15 0 119m 11m 9680 S 8.2 0.3 0:00.25 postmaster
28457 postgres 17 0 122m 14m 9m R 3.6 0.4 64:34.38 postmaster
26092 apache 15 0 238m 16m 3740 S 0.3 0.4 0:03.38 httpd
29596 afarber 15 0 12744 1116 800 R 0.3 0.0 0:00.09 top
1 root 15 0 10352 700 592 S 0.0 0.0 0:01.69 init
I understand, that I probably supply not enough info,
but how would you approach debugging this problem?
I've run my cronjob numerous times from CLI - it works ok
and takes only few seconds. I've installed it as an every-3 mins
cronjob on my development VM - it works ok there too.
My biggest problem is, that I don't see any information
from Postgres at the production site - why did it have to start
so many postmaster's (whatever those processes do).
The only log file I've found has been /var/lib/pgsql/pgstartup.log
and there is nothing suspicious there. (Also nothing related to Postgres
in messages, mcelog (empty), audit.log, access_log, error_log).
Please give me some hints
Alex
# cat bin/clean-phpbb-forum.sql
start transaction;
delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';
create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;
insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
-- select count(topic_id) as "old topics:" from old_topics;
insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
-- select count(post_id) as "old posts:" from old_posts;
delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);
update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';
update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';
update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;
update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;
commit
Dear Farber,
Is there anything that stops you from upgrading to a more respectable
release number?
Eg 9.0 or at least latest 8.4.X
Regds
Rajesh Kumar Mallah.
Alexander Farber skrev 2010-10-04 11.20:
Hello Postgres users,
I have a Linux website with phpBB serving a small Facebook game:
# uname -a
Linux XXXXX 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS release 5.5 (Final)
# rpm -qa | grep -i postgres
postgresql-devel-8.1.21-1.el5_5.1
postgresql-8.1.21-1.el5_5.1
postgresql-server-8.1.21-1.el5_5.1
postgresql-devel-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-docs-8.1.21-1.el5_5.1It works generally ok and with little load since June.
(My game has less than 100 users and I have AMD/Quad+4GB)On Friday I've installed a cronjob (full source code at the bottom):
3 3 * * * psql -a -f
$HOME/bin/clean-phpbb-forum.sqland on Monday I've found out, that the site is overloaded:
top - 10:16:56 up 3 days, 23:56, 1 user, load average: 20.55, 21.38, 22.92
Tasks: 157 total, 24 running, 133 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.9%us, 1.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4019028k total, 3073968k used, 945060k free, 50604k buffers
Swap: 2104496k total, 132k used, 2104364k free, 2316196k cachedPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28974 postgres 16 0 122m 14m 10m R 73.3 0.4 20:56.06 postmaster
28727 postgres 16 0 121m 13m 10m R 54.7 0.3 37:58.58 postmaster
28714 postgres 16 0 122m 14m 10m R 50.4 0.4 38:38.98 postmaster
29412 postgres 16 0 121m 13m 10m R 46.5 0.4 8:03.96 postmaster
28542 postgres 16 0 122m 14m 10m R 46.2 0.4 58:49.38 postmaster
28482 postgres 16 0 122m 14m 10m R 45.8 0.4 61:53.37 postmaster
28468 postgres 16 0 122m 14m 10m R 44.2 0.4 62:46.17 postmaster
29017 postgres 16 0 122m 14m 10m R 43.9 0.4 19:17.06 postmaster
28929 postgres 15 0 122m 14m 10m R 42.2 0.4 22:01.43 postmaster
28500 postgres 16 0 122m 14m 10m R 41.3 0.4 59:40.23 postmaster
28460 postgres 16 0 122m 14m 10m R 40.6 0.4 64:17.16 postmaster
28894 postgres 16 0 122m 14m 10m R 38.6 0.4 23:35.53 postmaster
28489 postgres 16 0 122m 14m 10m R 36.0 0.4 60:32.59 postmaster
28719 postgres 15 0 121m 13m 10m R 25.2 0.3 38:10.33 postmaster
29496 postgres 16 0 121m 13m 10m R 22.9 0.4 4:20.32 postmaster
28556 postgres 15 0 122m 14m 10m R 17.7 0.4 57:32.62 postmaster
28735 postgres 15 0 122m 14m 10m R 15.7 0.4 36:09.45 postmaster
29602 postgres 15 0 119m 11m 9680 S 8.2 0.3 0:00.25 postmaster
28457 postgres 17 0 122m 14m 9m R 3.6 0.4 64:34.38 postmaster
26092 apache 15 0 238m 16m 3740 S 0.3 0.4 0:03.38 httpd
29596 afarber 15 0 12744 1116 800 R 0.3 0.0 0:00.09 top
1 root 15 0 10352 700 592 S 0.0 0.0 0:01.69 initI understand, that I probably supply not enough info,
but how would you approach debugging this problem?
I would connect to the db with psql and query the pg_stat_activity and
the pg_locks views.
I've run my cronjob numerous times from CLI - it works ok
and takes only few seconds. I've installed it as an every-3 mins
cronjob on my development VM - it works ok there too.My biggest problem is, that I don't see any information
from Postgres at the production site - why did it have to start
so many postmaster's (whatever those processes do).
Those are backends running queries.
The only log file I've found has been /var/lib/pgsql/pgstartup.log
and there is nothing suspicious there. (Also nothing related to Postgres
in messages, mcelog (empty), audit.log, access_log, error_log).
You might want to increase logging. Take a look at
http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html
for details.
Regards
/roppert
Show quoted text
Please give me some hints
Alex# cat bin/clean-phpbb-forum.sql
start transaction;delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';-- select count(topic_id) as "old topics:" from old_topics;
insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;-- select count(post_id) as "old posts:" from old_posts;
delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;commit
Hello again,
ouch, I have these postmaster's spinning up again -
even though I've disabled the cronjob and rebooted:
top - 11:42:31 up 1:22, 2 users, load average: 9.15, 7.59, 5.03
Tasks: 135 total, 10 running, 125 sleeping, 0 stopped, 0 zombie
Cpu0 : 31.7%us, 0.5%sy, 0.0%ni, 67.4%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 32.9%us, 0.7%sy, 0.0%ni, 65.8%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu2 : 58.8%us, 1.3%sy, 0.0%ni, 38.2%id, 1.3%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu3 : 21.7%us, 0.5%sy, 0.0%ni, 77.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4019028k total, 1512432k used, 2506596k free, 21828k buffers
Swap: 2104496k total, 0k used, 2104496k free, 1152844k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3208 postgres 17 0 122m 14m 9m R 143.9 0.4 30:22.79 postmaster
3193 postgres 18 0 122m 14m 9m R 117.7 0.4 34:04.08 postmaster
3263 postgres 18 0 120m 13m 9m R 109.4 0.3 11:06.24 postmaster
3315 postgres 17 0 121m 13m 9.9m R 90.7 0.4 1:59.89 postmaster
3275 postgres 18 0 121m 13m 9m R 83.2 0.4 6:44.74 postmaster
3243 postgres 18 0 122m 14m 9m R 69.0 0.4 20:04.06 postmaster
3233 postgres 17 0 122m 14m 9m R 47.2 0.4 25:01.18 postmaster
3256 postgres 18 0 122m 14m 9m R 35.2 0.4 14:48.54 postmaster
On Mon, Oct 4, 2010 at 11:34 AM, Robert Gravsjö <robert@blogg.se> wrote:
I would connect to the db with psql and query the pg_stat_activity and the
pg_locks views.
pref=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start | backend_start | client_addr |
client_port
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
16384 | pref | 2681 | 16385 | pref | <command string not
enabled> | | 2010-10-04 10:22:53.051483+02 | |
-1
16384 | pref | 3193 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:17:40.081829+02 | |
-1
16384 | pref | 3208 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:20:18.763179+02 | |
-1
16384 | pref | 3233 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:22:50.818701+02 | |
-1
16384 | pref | 3243 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:25:34.110573+02 | |
-1
16384 | pref | 3256 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:28:33.355689+02 | |
-1
16384 | pref | 3263 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:32:02.116546+02 | |
-1
16384 | pref | 3275 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:35:31.611361+02 | |
-1
16384 | pref | 3315 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:40:00.119433+02 | |
-1
16384 | pref | 3355 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:42:51.827796+02 | |
-1
16384 | pref | 3356 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:43:04.285152+02 | |
-1
(11 rows)
pref=> select * from pg_locks;
locktype | database | relation | page | tuple | transactionid |
classid | objid | objsubid | transaction | pid | mode |
granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------------+---------
relation | 16384 | 17755 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
transactionid | | | | | 5677313 |
| | | 5677313 | 3243 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 10342 | | | |
| | | 5677823 | 3355 | AccessShareLock | t
transactionid | | | | | 5677234 |
| | | 5677234 | 3233 | ExclusiveLock | t
relation | 16384 | 17716 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
transactionid | | | | | 5677167 |
| | | 5677167 | 3208 | ExclusiveLock | t
relation | 16384 | 17716 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
transactionid | | | | | 5677070 |
| | | 5677070 | 3193 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
transactionid | | | | | 5677823 |
| | | 5677823 | 3355 | ExclusiveLock | t
transactionid | | | | | 5677454 |
| | | 5677454 | 3263 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
transactionid | | | | | 5677742 |
| | | 5677742 | 3315 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
transactionid | | | | | 5677395 |
| | | 5677395 | 3256 | ExclusiveLock | t
relation | 16384 | 17721 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
transactionid | | | | | 5677546 |
| | | 5677546 | 3275 | ExclusiveLock | t
relation | 16384 | 17313 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
(50 rows)
Regards
Alex
I wish I could see those hanging queries, what SQL do they try to execute:
pref=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start | backend_start | client_addr |
client_port
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
16384 | pref | 2681 | 16385 | pref | <command string not
enabled> | | 2010-10-04 10:22:53.051483+02 | |
-1
.........
I have usually just one Postgres connection from my game
(a perl script running as daemon). And then I have several
connections from phpBB. But when I restart httpd,
the spinning postmaster's don't disappear:
top - 11:48:11 up 1:28, 1 user, load average: 9.85, 8.68, 6.25
Tasks: 126 total, 12 running, 114 sleeping, 0 stopped, 0 zombie
Cpu0 : 36.1%us, 0.5%sy, 0.0%ni, 63.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 37.1%us, 0.8%sy, 0.0%ni, 61.6%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu2 : 61.4%us, 1.3%sy, 0.0%ni, 35.7%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu3 : 26.6%us, 0.6%sy, 0.0%ni, 72.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4019028k total, 1428256k used, 2590772k free, 22324k buffers
Swap: 2104496k total, 0k used, 2104496k free, 1199036k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3263 postgres 16 0 122m 14m 9m R 116.0 0.4 16:16.83 postmaster
3208 postgres 16 0 122m 14m 9m R 112.0 0.4 35:16.08 postmaster
3275 postgres 16 0 121m 13m 9m R 110.6 0.4 11:34.32 postmaster
3315 postgres 16 0 121m 13m 9m R 100.5 0.4 7:16.93 postmaster
3193 postgres 16 0 122m 14m 9m R 96.4 0.4 38:44.78 postmaster
3233 postgres 16 0 122m 14m 9m R 68.8 0.4 29:28.90 postmaster
3243 postgres 16 0 122m 14m 9m R 53.3 0.4 25:13.96 postmaster
3256 postgres 16 0 122m 14m 9m R 34.4 0.4 19:23.93 postmaster
On 4/10/2010 5:20 PM, Alexander Farber wrote:
# rpm -qa | grep -i postgres
postgresql-8.1.21-1.el5_5.1
Mmm, elderly.
On Friday I've installed a cronjob (full source code at the bottom):
3 3 * * * psql -a -f
$HOME/bin/clean-phpbb-forum.sql
Do you rate-control it or provide any sort of interlocking so that you
don't land up with multiple copies running at once?
See "man 2 flock" for a convenient way to do that reliably from the shell.
My biggest problem is, that I don't see any information
from Postgres at the production site - why did it have to start
so many postmaster's (whatever those processes do).
http://wiki.postgresql.org/wiki/FAQ#Why_does_PostgreSQL_have_so_many_processes.2C_even_when_idle.3F
As for what they're doing: see pg_stat_activity.
--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
Alexander Farber skrev 2010-10-04 11.48:
I wish I could see those hanging queries, what SQL do they try to execute:
To see the query you need to enable stats_command_string in your
postgresql.conf. See:
http://www.postgresql.org/docs/8.1/interactive/runtime-config-statistics.html
You might also benefit from chapter 24:
http://www.postgresql.org/docs/8.1/interactive/monitoring.html
Regards,
roppert
Show quoted text
pref=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start | backend_start | client_addr |
client_port
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
16384 | pref | 2681 | 16385 | pref |<command string not
enabled> | | 2010-10-04 10:22:53.051483+02 | |
-1
.........I have usually just one Postgres connection from my game
(a perl script running as daemon). And then I have several
connections from phpBB. But when I restart httpd,
the spinning postmaster's don't disappear:top - 11:48:11 up 1:28, 1 user, load average: 9.85, 8.68, 6.25
Tasks: 126 total, 12 running, 114 sleeping, 0 stopped, 0 zombie
Cpu0 : 36.1%us, 0.5%sy, 0.0%ni, 63.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 37.1%us, 0.8%sy, 0.0%ni, 61.6%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu2 : 61.4%us, 1.3%sy, 0.0%ni, 35.7%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu3 : 26.6%us, 0.6%sy, 0.0%ni, 72.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4019028k total, 1428256k used, 2590772k free, 22324k buffers
Swap: 2104496k total, 0k used, 2104496k free, 1199036k cachedPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3263 postgres 16 0 122m 14m 9m R 116.0 0.4 16:16.83 postmaster
3208 postgres 16 0 122m 14m 9m R 112.0 0.4 35:16.08 postmaster
3275 postgres 16 0 121m 13m 9m R 110.6 0.4 11:34.32 postmaster
3315 postgres 16 0 121m 13m 9m R 100.5 0.4 7:16.93 postmaster
3193 postgres 16 0 122m 14m 9m R 96.4 0.4 38:44.78 postmaster
3233 postgres 16 0 122m 14m 9m R 68.8 0.4 29:28.90 postmaster
3243 postgres 16 0 122m 14m 9m R 53.3 0.4 25:13.96 postmaster
3256 postgres 16 0 122m 14m 9m R 34.4 0.4 19:23.93 postmaster
Thank you, I've enabled logging,
but couldn't see anything suspicious there.
Also the 60%-120% CPU-eating postmaster
processes are now suddenly gone.
I suspect this is some bug in the
postgresql-server-8.1.21-1.el5_5.1 delivered
with CenOS 5.5/64 bit, because I don't
have many simultaneous users ( < 100)
and not that many queries.
I'll try to upgrade to 8.4.4 as described at
http://www.pgrpms.org/ later. I'm a bit confused
though that they do not support "yum update" anymore
(I think that worked earlier with PGDG....)
Regards
Alex
I've upgraded my CentOS 5.5 VM to:
postgresql-libs-8.4.4-1PGDG.rhel5
postgresql-devel-8.4.4-1PGDG.rhel5
postgresql-8.4.4-1PGDG.rhel5
postgresql-contrib-8.4.4-1PGDG.rhel5
postgresql-docs-8.4.4-1PGDG.rhel5
postgresql-server-8.4.4-1PGDG.rhel5
postgresql-plperl-8.4.4-1PGDG.rhel5
and then have loaded my backup and
then run my cleanup script (source below)
and it would stuck:
pref=> \i bin/clean-phpbb-forum.sql
START TRANSACTION
DELETE 6
CREATE TABLE
CREATE TABLE
INSERT 0 26699
INSERT 0 129903
[... and here nothing happens for minutes....]
And in the
/var/lib/pgsql/data/pg_log/postgresql-Mon.log
see (probably because of the many inserts?)
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
LOG: statement: start transaction;
LOG: statement: delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and
user_lastvisit=0 and age(to_timestamp(user_regdate))>interval '3
days';
LOG: statement: create temp table old_topics (topic_id integer) on
commit delete rows;
LOG: statement: create temp table old_posts (post_id integer) on
commit delete rows;
LOG: statement: insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
LOG: statement: insert into old_posts select p.post_id from
phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
LOG: statement: delete from phpbb_posts where post_id in (select
post_id from old_posts);
LOG: checkpoints are occurring too frequently (29 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (29 seconds apart)
Then after some minutes waiting,
I've pressed ctrl-Z on the pgsql and then "fg"
and this has waken PostgreSQL up for some reason:
[1]: + Stopped psql afarber@centos:~> fg psql DELETE 129903 DELETE 26699 UPDATE 1 UPDATE 1 UPDATE 1 UPDATE 1 COMMIT
afarber@centos:~> fg
psql
DELETE 129903
DELETE 26699
UPDATE 1
UPDATE 1
UPDATE 1
UPDATE 1
COMMIT
Could it be that my script is somehow making
PostgreSQL hang? (or wait for some input?):
cat ~/bin/clean-phpbb-forum.sql
start transaction;
delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';
create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;
insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
-- select count(topic_id) as "old topics:" from old_topics;
insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
-- select count(post_id) as "old posts:" from old_posts;
delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);
update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';
update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';
update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;
update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;
commit
Thank you for any insights
Alex
PS: I haven't modified postgresql.conf
or other conf files besides logging:
# diff postgresql.conf{.OLD,}
353c353
< #log_statement = 'none' # none, ddl, mod, all
---
log_statement = 'all' # none, ddl, mod, all
# grep checkpoint postgresql.conf
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables
#log_checkpoints = off
Specifically look into setting.....
log_min_duration_statement (integer)
in the postgresql.conf file.
-----Original Message-----
From: Robert Gravsjö [mailto:tekniksupport@blogg.se]
Sent: Monday, October 04, 2010 7:22 AM
To: Alexander Farber
Cc: pgsql-general@postgresql.org
Subject: Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)
Alexander Farber skrev 2010-10-04 11.48:
I wish I could see those hanging queries, what SQL do they try to execute:
To see the query you need to enable stats_command_string in your
postgresql.conf. See:
http://www.postgresql.org/docs/8.1/interactive/runtime-config-statistics.html
You might also benefit from chapter 24:
http://www.postgresql.org/docs/8.1/interactive/monitoring.html
Regards,
roppert
Show quoted text
pref=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start | backend_start | client_addr |
client_port
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
16384 | pref | 2681 | 16385 | pref |<command string not
enabled> | | 2010-10-04 10:22:53.051483+02 | |
-1
.........I have usually just one Postgres connection from my game
(a perl script running as daemon). And then I have several
connections from phpBB. But when I restart httpd,
the spinning postmaster's don't disappear:top - 11:48:11 up 1:28, 1 user, load average: 9.85, 8.68, 6.25
Tasks: 126 total, 12 running, 114 sleeping, 0 stopped, 0 zombie
Cpu0 : 36.1%us, 0.5%sy, 0.0%ni, 63.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 37.1%us, 0.8%sy, 0.0%ni, 61.6%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu2 : 61.4%us, 1.3%sy, 0.0%ni, 35.7%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu3 : 26.6%us, 0.6%sy, 0.0%ni, 72.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4019028k total, 1428256k used, 2590772k free, 22324k buffers
Swap: 2104496k total, 0k used, 2104496k free, 1199036k cachedPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3263 postgres 16 0 122m 14m 9m R 116.0 0.4 16:16.83 postmaster
3208 postgres 16 0 122m 14m 9m R 112.0 0.4 35:16.08 postmaster
3275 postgres 16 0 121m 13m 9m R 110.6 0.4 11:34.32 postmaster
3315 postgres 16 0 121m 13m 9m R 100.5 0.4 7:16.93 postmaster
3193 postgres 16 0 122m 14m 9m R 96.4 0.4 38:44.78 postmaster
3233 postgres 16 0 122m 14m 9m R 68.8 0.4 29:28.90 postmaster
3243 postgres 16 0 122m 14m 9m R 53.3 0.4 25:13.96 postmaster
3256 postgres 16 0 122m 14m 9m R 34.4 0.4 19:23.93 postmaster