Are indexes blown?
Hi,
I have an index on the user_id field in the query below:
myuser=# delete from clients where user_id like '64.22.91.%';
DELETE 22
Time: 220324.975 ms
Is there any reason why it's taking 220 seconds to run this simple
query? There are about 3 million rows in this table.
How can I debug this? How can I check if the index is bloated or
blown? From the VACUUM ANALYZE output, nothing like this is apparent.
Thanks.
On Fri, Feb 15, 2008 at 3:55 PM, Phoenix Kiula <phoenix.kiula@gmail.com>
wrote:
Hi,
I have an index on the user_id field in the query below:
myuser=# delete from clients where user_id like '64.22.91.%';
DELETE 22
Time: 220324.975 msIs there any reason why it's taking 220 seconds to run this simple
query? There are about 3 million rows in this table.
Use the *'pgstattuple'* contrib module -->
http://www.postgresql.org/docs/current/static/pgstattuple.html
*pgstatindex* function from the contrib module should be able to help you
there.
--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
Phoenix Kiula wrote:
Hi,
I have an index on the user_id field in the query below:
myuser=# delete from clients where user_id like '64.22.91.%';
DELETE 22
Time: 220324.975 msIs there any reason why it's taking 220 seconds to run this simple
query? There are about 3 million rows in this table.
First guess is that it's not using the index. What does
EXPLAIN ANALYSE SELECT user_id FROM clients WHERE user_id LIKE '...'
show?
Check the list archives for locale and like and text_pattern_ops too -
that's a good place to check.
--
Richard Huxton
Archonet Ltd
On 15/02/2008, Shoaib Mir <shoaibmir@gmail.com> wrote:
On Fri, Feb 15, 2008 at 3:55 PM, Phoenix Kiula <phoenix.kiula@gmail.com>
wrote:Hi,
I have an index on the user_id field in the query below:
myuser=# delete from clients where user_id like '64.22.91.%';
DELETE 22
Time: 220324.975 msIs there any reason why it's taking 220 seconds to run this simple
query? There are about 3 million rows in this table.Use the 'pgstattuple' contrib module -->
http://www.postgresql.org/docs/current/static/pgstattuple.htmlpgstatindex function from the contrib module should be able to help you
there.
How should I install a "contrib" without bringing down my database, or
stopping it, or doing ANYTHING to it? It's in production. I can't
touch it. Will it be installed on the side and then I simply start
using it?
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
First guess is that it's not using the index. What does
EXPLAIN ANALYSE SELECT user_id FROM clients WHERE user_id LIKE '...'
show?Check the list archives for locale and like and text_pattern_ops too -
that's a good place to check.
There is nothing to do with locale. The same database has been working
just fine for 2 years. Why should this be an issue now?
When I ran the EXPLAIN SELECT, the database was hanging. Or taking too
much time (waiting for 5 minutes), or whatever. I cancelled it.
That's the problem. It works, then it doesn't. Then it works again. I
am guessing it could be the load, but there's nothing new in terms of
load that should be causing this!
Phoenix Kiula wrote:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
First guess is that it's not using the index. What does
EXPLAIN ANALYSE SELECT user_id FROM clients WHERE user_id LIKE '...'
show?Check the list archives for locale and like and text_pattern_ops too -
that's a good place to check.There is nothing to do with locale. The same database has been working
just fine for 2 years. Why should this be an issue now?
No reason, but you hadn't said this was a change in behaviour, just that
it seemed slow.
When I ran the EXPLAIN SELECT, the database was hanging. Or taking too
much time (waiting for 5 minutes), or whatever. I cancelled it.That's the problem. It works, then it doesn't. Then it works again. I
am guessing it could be the load, but there's nothing new in terms of
load that should be causing this!
Ah, more new information! This does seem to point to the load,
particularly if it's exactly the same query each time. So what do
top/vmstat etc show for these "go-slow" periods?
--
Richard Huxton
Archonet Ltd
On Fri, Feb 15, 2008 at 5:18 PM, Phoenix Kiula <phoenix.kiula@gmail.com>
wrote:
How should I install a "contrib" without bringing down my database, or
stopping it, or doing ANYTHING to it? It's in production. I can't
touch it. Will it be installed on the side and then I simply start
using it?
You do not need to restart the database server for that purpose as all you
need is the pgstattuple.so file copied to <PG-HOME>/lib folder.
Do the following (in case you have installed server from source):
- Go to the <PostgreSQL-source>/contrib/pgstattuple folder
- run make and make install (this will copy pgstattuple.so file to the lib
folder of your PostgreSQL installation)
- Now from psql execute the pgstattuple.sql file for that specific database
which can be found in <PG-HOME>/share/contrib folder
- Once the sql file is executed now you can use the pgstattuple function
--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
Actually my host has just told me that I have a number of "hung
semaphores" in my server. And he is relating them to postgresql. I am
not surprised, because this is the only utility that has issues. All
the rest is working (apache, mysql, exim, etc). Any thoughts on where
I should start looking for hung semaphores?
On 15/02/2008, Shoaib Mir <shoaibmir@gmail.com> wrote:
You do not need to restart the database server for that purpose as all you
need is the pgstattuple.so file copied to <PG-HOME>/lib folder.Do the following (in case you have installed server from source):
- Go to the <PostgreSQL-source>/contrib/pgstattuple folder
- run make and make install (this will copy pgstattuple.so file to the lib
folder of your PostgreSQL installation)
- Now from psql execute the pgstattuple.sql file for that specific database
which can be found in <PG-HOME>/share/contrib folder
- Once the sql file is executed now you can use the pgstattuple function
Thanks. But I had installed from rpm. Can I just download that .so
file and put in the lib folder for pgsql and then start using it?
On Fri, Feb 15, 2008 at 6:13 PM, Phoenix Kiula <phoenix.kiula@gmail.com>
wrote:
Thanks. But I had installed from rpm. Can I just download that .so
file and put in the lib folder for pgsql and then start using it?
Well I would say download the source for the same version you have, copy it
to your desktop machine, build it and then build the .so file for contrib
module using 'make' and 'make install'.... once that is done copy the .so
from lib folder of PG to your production PG box's lib folder.
--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
Phoenix Kiula wrote:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
Ah, more new information! This does seem to point to the load,
particularly if it's exactly the same query each time. So what do
top/vmstat etc show for these "go-slow" periods?In included top and vmstat info in my other post yesterday, but here
it is again:
Ah, you had a post yesterday!
(goes away, searches for previous post)
http://archives.postgresql.org/pgsql-general/2008-02/msg00689.php
PG quitting sporadically!!
Right, OK. Firstly, stop worrying about index usage and/or bloat. You
have unexplained process crashes to deal with first. There's no point in
looking at indexes until you figure out what is killing your processes.
Secondly, a single line from vmstat isn't useful, you want to compare
what is happening when things are fine with when they aren't. Leave
vmstat 10 logging to a file so you can catch it.
Thirdly, have you upgraded to the latest 8.2 (8.2.6) yet?
I see you've reduced work_mem, that's good.
Oh, you might as well lower max_connections from 150 too, there's no way
you can support that many concurrent queries anyway.
The fact that you're seeing various strange socket-related problems is
odd. As is the fact that logging doesn't seem to work for you.
Are you sure the two sets of vmstat/top figures are from when PG was
crashing/running queries slow? Everything seems idle to me in those figures.
--
Richard Huxton
Archonet Ltd
Import Notes
Reply to msg id not found: e373d31e0802150512x21c967a3wdbe357874311ede0@mail.gmail.com
In article <bf54be870802150517q2dce6219kd5633ffb99e49d8b@mail.gmail.com>,
"Shoaib Mir" <shoaibmir@gmail.com> writes:
On Fri, Feb 15, 2008 at 6:13 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
Thanks. But I had installed from rpm. Can I just download that .so
file and put in the lib folder for pgsql and then start using it?
Well I would say download the source for the same version you have, copy it to
your desktop machine, build it and then build the .so file for contrib module
using 'make' and 'make install'.... once that is done copy the .so from lib
folder of PG to your production PG box's lib folder.
But you have to ensure that you build PostgreSQL on your desktop
machine in exactly the same way as the RPM got built
(integer_datetimes etc).
On 2/15/08, Harald Fuchs <hari.fuchs@googlemail.com> wrote:
But you have to ensure that you build PostgreSQL on your desktop
machine in exactly the same way as the RPM got built
(integer_datetimes etc).
It'd probably be much easier to just install the -contrib RPM. :)
--
-Doug
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
Phoenix Kiula wrote:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
Ah, more new information! This does seem to point to the load,
particularly if it's exactly the same query each time. So what do
top/vmstat etc show for these "go-slow" periods?In included top and vmstat info in my other post yesterday, but here
it is again:Ah, you had a post yesterday!
(goes away, searches for previous post)
http://archives.postgresql.org/pgsql-general/2008-02/msg00689.php
PG quitting sporadically!!Right, OK. Firstly, stop worrying about index usage and/or bloat. You
have unexplained process crashes to deal with first. There's no point in
looking at indexes until you figure out what is killing your processes.Secondly, a single line from vmstat isn't useful, you want to compare
what is happening when things are fine with when they aren't. Leave
vmstat 10 logging to a file so you can catch it.Thirdly, have you upgraded to the latest 8.2 (8.2.6) yet?
I see you've reduced work_mem, that's good.
Oh, you might as well lower max_connections from 150 too, there's no way
you can support that many concurrent queries anyway.The fact that you're seeing various strange socket-related problems is
odd. As is the fact that logging doesn't seem to work for you.Are you sure the two sets of vmstat/top figures are from when PG was
crashing/running queries slow? Everything seems idle to me in those figures.
No. They are the vmstat figures from when I was replying to your
email. What will vmstat tell me and how should I set it up to do
"vmstat 10 logging"?
Btw, postgresql logging is working. But here're the kind of things I
have in there:
LOG: test message did not get through on socket for statistics collector
LOG: disabling statistics collector for lack of working socket
LOG: database system was shut down at 2008-02-15 06:12:10 CST
LOG: checkpoint record is at 8/E785304C
LOG: redo record is at 8/E785304C; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 0/296892698; next OID: 97929
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system is ready
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection
Now I don't know what is wrong or even where I should look. Postgresql
is often taking quite a bit of memory and CPU resources.
I've reduced work_mem to 10MB and Max_connections to 100. (Anyway, the
old values were working just fine until recently!)
The biggest problem: when I su into postgres user and do a psql to get
into the PG console in my SSH, it takes a whole lot of time to come
up! It used to come up in a jiffy earlier!!! It now shows me this
error:
~ >
psql: could not connect to server: Connection timed out
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
Then, five minutes later, I can connect again! In less than a second!
What gives?
Finally, very simple queries like this one:
select url, disable_in_statistics, id, user_known from links where
alias = '1yqw7' and status = 'Y' limit 1
Which used to be server in "5 ms" (0.005 seconds) are now taking
upwards of 200 seconds! Your suggestion to "Explain Analyze" --
=# explain analyze select url, disable_in_statistics, id, user_known
from links where alias = '1yqw7' and status = 'Y' limit 1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..8.74 rows=1 width=113) (actual time=9.639..9.643
rows=1 loops=1)
-> Index Scan using links2_alias_key on links (cost=0.00..8.74
rows=1 width=113) (actual time=9.630..9.630 rows=1 loops=1)
Index Cond: ((alias)::text = '1yqw7'::text)
Filter: (status = 'Y'::bpchar)
Total runtime: 16.425 ms
(5 rows)
Now this is only when I have connected to the psql console, of course.
Still, these queries are intermittently very slow!
On 2/15/08, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection
This means your client processes are dying or getting killed (possibly
due to memory shortages?). Are these running on the same machine as
Postgres? Are there any logs you can look at to see what might be
going wrong? If this is Linux, are there any OOM-killer messages in
the syslogs?
-Doug
On Fri, Feb 15, 2008 at 8:36 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
No. They are the vmstat figures from when I was replying to your
email. What will vmstat tell me and how should I set it up to do
"vmstat 10 logging"?
Something like
vmstat 10 > vmstat.log
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connectionNow I don't know what is wrong or even where I should look. Postgresql
is often taking quite a bit of memory and CPU resources.I've reduced work_mem to 10MB and Max_connections to 100. (Anyway, the
old values were working just fine until recently!)The biggest problem: when I su into postgres user and do a psql to get
into the PG console in my SSH, it takes a whole lot of time to come
up! It used to come up in a jiffy earlier!!! It now shows me this
error:
How many pgsql processes are there when this happens? Try something like
ps axu|grep postgres
to see. use
ps axu|grep postgres|wc -l
to get a rough count. I'm guessing that your web service layer is
keeping old connections open. could be something as ugly as php's
pg_pconnect or a buggy jdbc driver, etc...
Phoenix Kiula wrote:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
Are you sure the two sets of vmstat/top figures are from when PG was
crashing/running queries slow? Everything seems idle to me in those figures.No. They are the vmstat figures from when I was replying to your
email. What will vmstat tell me and how should I set it up to do
"vmstat 10 logging"?
I'd write a small script and call it e.g. "trackusage.sh" and save it in
/tmp/
#!/bin/sh
while (/bin/true)
do
date >> /tmp/vmstat_figures.txt
vmstat 10 60 >> /tmp/vmstat_figures.txt
done
Then, set the execute flag on it and do something like:
nohup /tmp/trackusage.sh &
That should run even when you disconnect (don't forget to kill it once
this is fixed). It will log a timestamp every 10 minutes and vmstat
activity between.
[snip logging fragment]
Now I don't know what is wrong or even where I should look. Postgresql
is often taking quite a bit of memory and CPU resources.
Just checking - this is a real machine and not a virtual one, isn't it?
I've reduced work_mem to 10MB and Max_connections to 100. (Anyway, the
old values were working just fine until recently!)The biggest problem: when I su into postgres user and do a psql to get
into the PG console in my SSH, it takes a whole lot of time to come
up! It used to come up in a jiffy earlier!!! It now shows me this
error:~ >
psql: could not connect to server: Connection timed out
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?Then, five minutes later, I can connect again! In less than a second!
What gives?
Hopefully vmstat will show us.
Finally, very simple queries like this one:
select url, disable_in_statistics, id, user_known from links where
alias = '1yqw7' and status = 'Y' limit 1Which used to be server in "5 ms" (0.005 seconds) are now taking
upwards of 200 seconds!
Same symptom. I'd have guessed the machine is running out of memory and
swapping, but the vmstat/top stuff all look fine.
Your suggestion to "Explain Analyze" --
=# explain analyze select url, disable_in_statistics, id, user_known
from links where alias = '1yqw7' and status = 'Y' limit 1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..8.74 rows=1 width=113) (actual time=9.639..9.643
rows=1 loops=1)
-> Index Scan using links2_alias_key on links (cost=0.00..8.74
rows=1 width=113) (actual time=9.630..9.630 rows=1 loops=1)
Index Cond: ((alias)::text = '1yqw7'::text)
Filter: (status = 'Y'::bpchar)
Total runtime: 16.425 ms
Fine - it's nothing to do with the planner, indexes or anything else.
This is system-related, and vmstat should point us in the right direction.
--
Richard Huxton
Archonet Ltd
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
Phoenix Kiula wrote:
On 15/02/2008, Richard Huxton <dev@archonet.com> wrote:
Are you sure the two sets of vmstat/top figures are from when PG was
crashing/running queries slow? Everything seems idle to me in those figures.No. They are the vmstat figures from when I was replying to your
email. What will vmstat tell me and how should I set it up to do
"vmstat 10 logging"?I'd write a small script and call it e.g. "trackusage.sh" and save it in
/tmp/#!/bin/sh
while (/bin/true)
do
date >> /tmp/vmstat_figures.txt
vmstat 10 60 >> /tmp/vmstat_figures.txt
doneThen, set the execute flag on it and do something like:
nohup /tmp/trackusage.sh &
Thanks Richard!
The script you suggested doesn't work:
tmp > ./trackusage.sh
-bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission denied
Anyway, I did the vmstat command. I was running it while the system
was ok, then not ok, then ok...and so on. So I hope these numbers have
captured what the issue is:
tmp > vmstat 10 60
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 3380 331140 114344 2992304 0 0 31 34 20 73 2 1 93 3
0 0 3380 414412 114352 2992296 0 0 0 100 1105 286 1 1 96 2
0 0 3380 430356 114380 2992268 0 0 0 133 1103 280 1 1 95 3
0 0 3380 418988 114392 2992256 0 0 2 120 1098 277 1 2 93 4
0 0 3380 347996 114408 2992240 0 0 0 85 1081 134 1 0 97 2
0 0 3380 293236 114444 2992204 0 0 0 40 1076 138 0 0 97 2
0 0 3380 252860 114456 2992192 0 0 0 80 1086 141 0 0 97 2
0 0 3380 141340 114480 2992168 0 0 2 54 1078 145 1 0 97 2
0 0 3380 119940 114504 2992144 0 0 0 31 1079 143 1 1 97 1
0 0 3380 104252 114524 2992124 0 0 0 64 1087 182 1 1 96 2
0 0 3380 90556 114560 2992088 0 0 0 59 1087 144 1 0 97 2
0 0 3380 132476 115088 2995460 0 0 52 174 1130 447 2 1 92 4
0 1 3380 280628 115124 2995684 0 0 31 220 1144 479 4 2 91 4
0 0 3380 361340 115152 2995656 0 0 0 147 1135 338 2 1 94 3
0 0 3380 382028 115180 2995628 0 0 2 113 1109 253 1 1 96 2
0 0 3380 369740 115220 2995588 0 0 3 200 1107 260 1 1 93 4
0 0 3380 323140 115248 2995560 0 0 0 60 1097 153 1 0 97 2
0 0 3380 280260 115272 2995536 0 0 0 77 1087 133 1 0 98 1
0 0 3380 200580 115296 2995512 0 0 2 65 1089 140 1 0 97 2
0 0 3380 81916 115392 2995676 0 0 17 82 1089 188 2 1 94 2
0 0 3380 16980 98072 2974256 0 0 48 122 1102 190 2 1 95 3
1 0 3380 21588 73160 2954708 0 0 86 274 1128 276 2 2 88 8
0 0 3380 52692 57860 2932048 0 0 1 128 1106 211 2 1 95 3
0 0 3380 184748 57960 2931948 0 0 6 219 1128 451 2 1 92 5
0 0 3380 342996 58016 2931892 0 0 0 140 1122 465 2 1 94 3
0 0 3380 452020 58068 2932100 0 0 1 122 1114 268 1 1 95 2
0 0 3380 478044 58132 2932036 0 0 0 106 1099 294 1 1 95 3
0 0 3380 447540 58224 2931944 0 0 1 238 1098 319 2 2 91 5
0 0 3380 392524 58284 2931884 0 0 0 71 1078 134 0 1 97 2
0 0 3380 299684 58340 2931828 0 0 1 88 1079 150 1 0 97 2
0 0 3380 231652 58388 2931780 0 0 0 40 1076 135 1 1 97 1
0 0 3380 139012 58432 2931736 0 0 0 42 1076 145 1 0 97 2
0 0 3380 117884 58472 2931696 0 0 1 67 1092 151 1 0 96 2
0 0 3380 129460 58528 2931640 0 0 0 59 1097 190 1 1 96 2
0 0 3380 179892 58584 2931584 0 0 0 42 1100 158 1 1 97 2
0 0 3380 272900 58648 2931520 0 0 0 111 1114 308 1 1 95 3
0 0 3380 399100 58704 2931724 0 0 0 132 1128 352 1 1 95 2
0 0 3380 484556 58748 2931680 0 0 0 76 1111 269 1 1 96 2
0 0 3380 501180 58804 2931884 0 0 0 93 1103 249 1 1 96 2
0 0 3380 492636 58864 2931824 0 0 0 138 1094 259 1 1 95 3
1 1 3380 428380 58912 2932036 0 0 0 44 1088 142 1 0 98 1
0 0 3380 362340 58996 2931952 0 0 1 45 1085 138 1 0 97 2
0 0 3380 292708 59072 2931876 0 0 0 71 1082 138 1 1 97 2
0 0 3380 179292 59172 2931776 0 0 0 65 1089 149 1 0 97 2
0 0 3380 127292 59236 2931712 0 0 0 38 1090 149 1 0 97 1
0 0 3380 101940 59304 2931904 0 0 22 72 1097 186 1 1 96 2
0 0 3380 134068 59340 2931868 0 0 0 74 1100 148 1 0 97 1
0 1 3380 257908 59400 2932068 0 0 0 112 1114 424 2 1 95 3
0 1 3380 399484 59460 2932008 0 0 0 96 1127 336 1 1 96 2
1 0 3380 480548 59524 2932464 0 0 1 100 1118 286 1 1 96 2
1 0 3380 497092 59580 2932408 0 0 0 120 1110 282 1 1 96 3
0 1 3380 481684 59652 2932336 0 0 0 174 1099 310 2 2 92 4
0 0 3380 416772 59692 2932296 0 0 0 49 1085 136 1 0 97 2
0 0 3380 372108 59740 2932248 0 0 1 68 1089 144 1 0 97 2
1 1 3380 307676 59808 2932180 0 0 1 75 1083 140 1 0 97 2
0 0 3380 232620 59904 2932084 0 0 4 78 1077 154 1 0 97 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 3380 169996 59948 2932300 0 0 0 41 1092 157 1 1 96 2
0 0 3380 171660 59996 2932252 0 0 0 135 1105 190 1 1 96 3
2 0 3380 166140 60052 2932456 0 0 0 47 1101 158 1 0 97 2
4 0 3380 269860 60104 2932404 0 0 0 103 1114 326 1 2 94 2
Please advise??
On Sat, 16 Feb 2008, Phoenix Kiula wrote:
The script you suggested doesn't work:
tmp > ./trackusage.sh
-bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission denied
Try changing the first line to
#!/bin/bash
Anyway, I did the vmstat command. I was running it while the system
was ok, then not ok, then ok...and so on. So I hope these numbers have
captured what the issue is:tmp > vmstat 10 60
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 3380 323140 115248 2995560 0 0 0 60 1097 153 1 0 97 2
0 0 3380 280260 115272 2995536 0 0 0 77 1087 133 1 0 98 1
0 0 3380 200580 115296 2995512 0 0 2 65 1089 140 1 0 97 2
0 0 3380 81916 115392 2995676 0 0 17 82 1089 188 2 1 94 2
0 0 3380 16980 98072 2974256 0 0 48 122 1102 190 2 1 95 3
1 0 3380 21588 73160 2954708 0 0 86 274 1128 276 2 2 88 8
0 0 3380 52692 57860 2932048 0 0 1 128 1106 211 2 1 95 3
0 0 3380 184748 57960 2931948 0 0 6 219 1128 451 2 1 92 5
0 0 3380 342996 58016 2931892 0 0 0 140 1122 465 2 1 94 3
Looks like the worst spot was in the middle here. Something gobbled up
over 300MB of memory in 40 seconds, enough to force the OS to blow away
almost half its disk buffers just to keep working memory free. Not so bad
that it went to swap or invoked the OOM killer but enough to push the I/O
block out (bo) up. I would guess the other ugly spots were the later
portions where the bo spiked >100.
But without knowing more about what the processing using this memory and
generating the output I/O are doing it's hard to say why. That's why I
suggested you watch top with the command lines turned on for a bit, to see
what process(es) are jumping around during the bad periods.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On 16/02/2008, Greg Smith <gsmith@gregsmith.com> wrote:
On Sat, 16 Feb 2008, Phoenix Kiula wrote:
The script you suggested doesn't work:
tmp > ./trackusage.sh
-bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission deniedTry changing the first line to
#!/bin/bash
Thanks Greg. Same problem with that too. I guess my tmp folder is
secured and doesn't allow for executables? I put it in another folder
and it's working.
...snip....
Looks like the worst spot was in the middle here. Something gobbled up
over 300MB of memory in 40 seconds, enough to force the OS to blow away
almost half its disk buffers just to keep working memory free. Not so bad
that it went to swap or invoked the OOM killer but enough to push the I/O
block out (bo) up. I would guess the other ugly spots were the later
portions where the bo spiked >100.But without knowing more about what the processing using this memory and
generating the output I/O are doing it's hard to say why. That's why I
suggested you watch top with the command lines turned on for a bit, to see
what process(es) are jumping around during the bad periods.
Happy to do that, but top keeps changing before I can copy text from
it. I think most of the connections seem to be "httpd" which is Apache
2.2.6. I checked the netstat commands and the server is not under DDOS
or anything.
My hosting provider tells me that the Postgresql server is taking up a
lot of memory but I've been running the same db with the same config
for over 2 years. Yes we have been growing but what happened in the
last 3 days to warrant a sudden spike in memory consumption??!!
Anyway, I want to go back to them with some hard data that postgresql
is NOT the one that is causing my server to load. The indexes are all
in place (and I've REINDEXed my big tables anyway) so the performance
of pg itself is not an issue.
I just don't know where to get this hard data. The top output shows
httpd on top, and sometimes postmaster, but I don't know how to
repeatedly capture it. Any suggestions?