Database server restarting
Hello Everybody,
We are using PostgreSQL 7.2.2. Our system runs 24 hours a day, with a
preventive reboot once a day. Sometimes I get this error, and after it
the system hangs. Can anybody help with this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection
Regards
Shoaib
On Mon, 5 May 2003, shoaib wrote:
Hello Everybody,
We are using PostgreSQL 7.2.2. Our system runs 24 hours a day, with a
preventive reboot once a day.
Odd concept. What is this reboot preventing?
Sometimes I get this error, and after it the system hangs. Can anybody help
with this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
It looks like your preventative daily reboot is not preventing the problems it
is causing. It is possible that the postmaster is not being shut down properly
because, for example, a client is still connected and the shutdown script
isn't forcing a fast shutdown. See the pg_ctl manpage for information on the
switches.
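A minimal sketch of that shutdown order, assuming the daily reboot is driven by a script you control that runs as root and uses su to call pg_ctl as the postgres OS user (the account name, the data directory path and the function names are all placeholders). It stops the postmaster in fast mode, which rolls back open transactions and disconnects clients instead of waiting for them, and only reboots with shutdown -r now if that stop succeeded. The -D, -m and -w switches are standard pg_ctl options, but check the manpage for your version.

```python
import shlex
import subprocess

# Placeholder assumptions: adjust both to your installation.
PGDATA = "/var/lib/pgsql/data"
PGUSER = "postgres"


def pg_stop_command(datadir, mode="fast", os_user=PGUSER):
    """Build the command that stops the postmaster via pg_ctl, run as os_user.

    smart:     wait for every client to disconnect (can stall a reboot)
    fast:      roll back open transactions and disconnect clients
    immediate: abort without a clean shutdown; recovery runs on next start
    """
    if mode not in ("smart", "fast", "immediate"):
        raise ValueError("unknown shutdown mode: %r" % mode)
    # -w makes pg_ctl wait until the shutdown has actually finished.
    pg_ctl = "pg_ctl stop -D %s -m %s -w" % (shlex.quote(datadir), mode)
    return ["su", "-", os_user, "-c", pg_ctl]


def clean_reboot(datadir=PGDATA):
    """Stop PostgreSQL first; reboot only if pg_ctl reported success."""
    if subprocess.run(pg_stop_command(datadir)).returncode != 0:
        raise RuntimeError("pg_ctl stop failed; not rebooting")
    subprocess.run(["shutdown", "-r", "now"], check=True)
```

The design choice is to refuse to reboot when the stop fails, since a reboot that kills a still-running postmaster is one way to end up with the "not properly shut down" recovery messages on the next start.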
As for the messages themselves, there's no real error in there apart from the
'EOF on client connection' lines; the rest are just the normal startup
messages after an unclean shutdown. If you're worried, I would look at solving
whatever problem the answer to the daily reboot question turns up.
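To separate unclean restarts from ordinary client disconnects without reading the log by eye, a small scan like the one below may help. It is a sketch that assumes the log lines look like the 7.2 DEBUG output quoted in this thread (the marker strings are copied from it); summarize_log is a hypothetical helper name.

```python
import re

# Marker strings copied from the 7.2 log output quoted in this thread.
INTERRUPTED = re.compile(r"database system was interrupted at (.+)$")
UNCLEAN = "database system was not properly shut down"
CLIENT_EOF = "unexpected EOF on client connection"


def summarize_log(lines):
    """Count client EOFs, unclean starts, and collect interruption timestamps."""
    summary = {"client_eof": 0, "interrupted_at": [], "unclean_starts": 0}
    for line in lines:
        line = line.rstrip("\n")
        if CLIENT_EOF in line:
            summary["client_eof"] += 1
        match = INTERRUPTED.search(line)
        if match:
            summary["interrupted_at"].append(match.group(1).strip())
        if UNCLEAN in line:
            summary["unclean_starts"] += 1
    return summary
```

For example, opening the postmaster's log file (wherever its output is redirected on your system) and passing the file object to summarize_log gives the number of client EOFs and the time of each interrupted run.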
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection
Regards
Shoaib
--
Nigel J. Andrews
Thanks a lot for your prompt reply.
We reboot the server to clean up the system buffers. Before rebooting, I
shut down the database server. Can you give any further clue as to why it
suddenly restarted at 4:17 AM? Our preventive maintenance runs at 1 AM.
Another process, which reads data from some flat files and loads it into
the database, ended at 4:13 AM on the same day.
Your help is really appreciated.
Regards
Shoaib
-----Original Message-----
From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
Sent: Monday, May 05, 2003 7:08 PM
To: shoaib
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
On Mon, 5 May 2003, shoaib wrote:
Thanks a lot for your prompt reply.
We reboot the server to clean up the system buffers. Before rebooting, I
shut down the database server. Can you give any further clue as to why it
suddenly restarted at 4:17 AM? Our preventive maintenance runs at 1 AM.
Another process, which reads data from some flat files and loads it into
the database, ended at 4:13 AM on the same day.
Hmmm... I assumed the 4:17 was from the scheduled reboot. It's a more difficult
issue if that was the postmaster exiting by itself. Did the data loading
process end normally? It's a good few minutes, but in the scheme of things 4
minutes for the postmaster to be restarted automatically maybe isn't such a
long time.
I'm still drawn to this daily reboot process, though. You do it to clean up the
system buffers. Why? Is there perhaps some instability in the system when it
uses lots of memory? What is the hardware/OS? Have you run hardware
diagnostics? If it's an Intel/PC-type machine, there is a program called
memtest86 that is good at checking memory. Be warned, though: if you need that
24-hour uptime, running memtest86 properly is going to cost you a good few
hours.
--
Nigel J. Andrews
Modern OSes shouldn't need rebooting unless something else is wrong. What's the quality of your hardware? Were any applications compiled on bad hardware?
Sigh, is it a Windows environment?
Our server reboots at 1 AM, and the job I mentioned starts at 4 AM; it
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Could it be the cause of this problem?
After this happened, my server just hung.
Last night I faced the same problem again on another server, and it was
again after a database-intensive process.
The system has 1 GB RAM, a 1 GHz processor and RAID 1, and runs Red Hat
Linux 7.3.
We are about to install 70 such servers.
Please help.
regards
Shoaib
-----Original Message-----
From: Dennis Gearon [mailto:gearond@cvc.net]
Sent: Monday, May 05, 2003 11:13 PM
To: shoaib
Cc: 'Nigel J. Andrews'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
On Tue, May 06, 2003 at 01:41:37PM +0800, shoaib wrote:
Our server reboots at 1 AM, and the job I mentioned starts at 4 AM; it
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Could it be the cause of this problem?
After this happened, my server just hung.
When you say hang, do you mean the entire server stops responding, i.e. you
can't log in any more, no web requests, etc.?
If so, it's got nothing to do with Postgres, as a user program simply can't
hang the machine like that (unless you run out of memory, in which case it's
just really slow rather than hung).
Last night I faced the same problem again on another server, and it was
again after a database-intensive process.
The system has 1 GB RAM, a 1 GHz processor and RAID 1, and runs Red Hat
Linux 7.3.
We are about to install 70 such servers.
When you say reboot, are you doing the proper shutdown sequence
(shutdown -r now), or are you just pulling the plug?
Please explain what "hangs" means. Also, rebooting every day seems to be a
massive waste of time. UNIX machines don't need that kind of maintenance.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington
When I say "hangs", I mean I am not even able to log in at the server
console.
No SSH, no login from remote machines.
Regards
Shoaib
-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Tuesday, May 06, 2003 2:15 PM
To: shoaib
Cc: gearond@cvc.net; 'Nigel J. Andrews'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
On Tuesday 06 May 2003 11:11, shoaib wrote:
Our server reboots at 1 AM, and the job I mentioned starts at 4 AM; it
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Could it be the cause of this problem?
After this happened, my server just hung.
Last night I faced the same problem again on another server, and it was
again after a database-intensive process.
The system has 1 GB RAM, a 1 GHz processor and RAID 1, and runs Red Hat
Linux 7.3.
We are about to install 70 such servers.
I am sure something is not quite right here. You should not need a server
restart. I would like to see your database configuration options, data access
patterns, and the min/max/avg load on each server.
10K records isn't much, certainly not for that kind of hardware.
I am still bothered by the fact that you reboot your server daily. I can't
find a good reason for it in the description above.
Shridhar
On Tue, May 06, 2003 at 02:28:57PM +0800, shoaib wrote:
When I say "hangs", I mean I am not even able to log in at the server
console.
No SSH, no login from remote machines.
Well, that's not PostgreSQL's fault; it can't hang a machine like that. You
should look elsewhere for the exact cause. I'm assuming here that consoles
that are already logged in don't respond either? Maybe leave top running to
capture the list of processes just before it dies? Are any cron jobs running
around the time it dies?
What other processes run at about that time?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
There are some cron jobs running at the same time...
One server connects to our application server over SSH, and one cron job
reads the DB and writes some data into flat files. But by the time this
problem happens, these jobs are not writing any data. Last night, when the
server went down, the other server was trying to connect over SSH and was
probably running some cron job, and a heavy DB process was running. I
cannot run top because I cannot log in to the server, even from the
console.
Regards
Shoaib
-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Tuesday, May 06, 2003 2:40 PM
To: shoaib
Cc: gearond@cvc.net; 'Nigel J. Andrews'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
On Tuesday 06 May 2003 12:16, shoaib wrote:
There are some cron jobs running at the same time...
One server connects to our application server over SSH, and one cron job
reads the DB and writes some data into flat files. But by the time this
problem happens, these jobs are not writing any data. Last night, when the
server went down, the other server was trying to connect over SSH and was
probably running some cron job, and a heavy DB process was running. I
cannot run top because I cannot log in to the server, even from the
console.
How long did you wait? If the server is doing heavy disk processing, it
could take up to 10 minutes under the worst conditions. Just don't give up
after a minute or so.
Shridhar
On Tue, 6 May 2003, shoaib wrote:
There are some cron jobs running at the same time...
One server connects to our application server over SSH, and one cron job
reads the DB and writes some data into flat files. But by the time this
problem happens, these jobs are not writing any data. Last night, when the
server went down, the other server was trying to connect over SSH and was
probably running some cron job, and a heavy DB process was running. I
cannot run top because I cannot log in to the server, even from the
console.
Do you mean you have no login privileges on the machine, or that you only
try to log in once you see a problem? If the former, I can't see any way you
can make progress with this. If the latter, forget that; it won't help, since
by then you can't get at the running processes. What you should do is log in
_now_, run 'top' and leave it running. It may be that when the problem occurs
the session running top will stop and so show the information from that
time. However, it may also be that it doesn't stop, and when you come into
the office n hours later you find it merrily ticking away showing you the
current information. Therefore, investigate ways to log the information if
you aren't sitting there when the problem occurs.
Also take a look at procinfo; it may be helpful as well.
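One way to log the information while nobody is watching is a small loop that appends a timestamped snapshot of the load average and free memory to a file and forces each line to disk, so the last entries written before the machine locks up are still there after the reset. This is a sketch, not a replacement for top or procinfo: it assumes a Linux /proc filesystem and reads only /proc/loadavg and /proc/meminfo, and the output path and interval are arbitrary placeholders.

```python
import os
import time

LOG_PATH = "/var/tmp/sysnap.log"  # placeholder: any local file readable after a reboot
INTERVAL = 10                     # seconds between snapshots (arbitrary choice)


def parse_meminfo(text):
    """Return {field: value} for /proc/meminfo lines like 'MemFree:  20480 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def snapshot():
    """One log line: time, 1/5/15-minute load averages, free memory and swap in kB."""
    with open("/proc/loadavg") as f:
        load = " ".join(f.read().split()[:3])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return "%s load=%s memfree=%s swapfree=%s" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), load,
        mem.get("MemFree"), mem.get("SwapFree"))


def run_forever():
    """Append a snapshot every INTERVAL seconds, forcing each line to disk."""
    while True:
        with open(LOG_PATH, "a") as out:
            out.write(snapshot() + "\n")
            out.flush()
            os.fsync(out.fileno())  # the line is on disk before we sleep
        time.sleep(INTERVAL)
```

Calling run_forever() from the session you leave logged in (or under nohup) before the problem window means that, after the next hang, the tail of the file shows the last state that was written.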
One thing that might be a problem is the number of open file descriptors; you
could be running into the system limit. That sort of thing can sometimes make
a system unstable.
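To check the file-descriptor theory, Linux reports system-wide file handle usage in /proc/sys/fs/file-nr (the limit itself is /proc/sys/fs/file-max). A minimal sketch, assuming the documented three-field layout of allocated handles, allocated but unused handles, and the maximum; logging this next to the other snapshots would show whether usage climbs towards the limit before the hang. Per-process limits (ulimit -n) are a separate matter not covered here.

```python
FILE_NR = "/proc/sys/fs/file-nr"


def parse_file_nr(text):
    """Parse file-nr: allocated handles, allocated-but-unused handles, maximum.

    Returns (handles in use, system-wide maximum).
    """
    allocated, unused, maximum = (int(v) for v in text.split()[:3])
    return allocated - unused, maximum


def fd_usage(path=FILE_NR):
    """Return (in use, maximum, percent used) for system-wide file handles."""
    with open(path) as f:
        in_use, maximum = parse_file_nr(f.read())
    return in_use, maximum, 100.0 * in_use / maximum
```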
I'd still be interested to know whether the hardware has been tested
properly. Are there any known problems with RH 7.3's kernel and your
particular hardware, such as the RAID device?
One interesting thing you say, though: the same thing happens on a second
server. To me that suggests either something like a kernel/hardware problem,
such as the RAID, or a bug in your own software. Perhaps an endless loop?
Perhaps endlessly trying to obtain a file descriptor? A process with heavy CPU
usage shouldn't bring the machine down, but it can make it look very
unresponsive.
--
Nigel J. Andrews
When I log in at a console, I can see the prompt, but after I type the
login name the system just doesn't respond; it never gets to the password
prompt.
Regards
Shoaib
-----Original Message-----
From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
Sent: Tuesday, May 06, 2003 3:44 PM
To: shoaib
Cc: 'Martijn van Oosterhout'; gearond@cvc.net;
pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
On Tue, 6 May 2003, shoaib wrote:
When I log in at a console, I can see the prompt, but after I type the
login name the system just doesn't respond; it never gets to the password
prompt.
You may have to wait a long time, which isn't very good because a) by the time
the system has enough resources to proceed with your login, it's no longer in
the state it was in at the time of the problem (obviously), and b) the login
process may well time out the login attempt before it even gets to the stage
of asking for the password.
You really do need to be logged in before the problem occurs. Indeed, have more
than one session running: some running system monitoring utilities like top
and procinfo, and one you can type into without stopping those utilities.
If you can get the system to again you may also find it useful to run your
cron jobs by hand to verify them individually, and then try to replicate the
early morning conditions at whatever time you can test things. If you're
having to wait overnight every time just to look at a new piece of the
puzzle, you're locked into that timetable for generating and testing a
solution.
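Running the cron jobs by hand can be made a little more systematic with a wrapper that runs each job on its own and records its exit status and run time, so a misbehaving job can be singled out before trying to reproduce the early morning conditions. A sketch only: the job paths are hypothetical placeholders for whatever is in your crontab.

```python
import subprocess
import time

# Hypothetical job list: substitute the real commands from your crontab.
JOBS = [
    ["/usr/local/bin/load_flat_files.sh"],
    ["/usr/local/bin/export_to_flat_files.sh"],
]


def run_jobs(jobs):
    """Run each job by itself, in order; record exit code and wall-clock time."""
    results = []
    for argv in jobs:
        start = time.monotonic()
        try:
            returncode = subprocess.run(argv).returncode
        except OSError as exc:  # e.g. the script path does not exist
            print("could not start %s: %s" % (argv[0], exc))
            returncode = None
        elapsed = time.monotonic() - start
        results.append((argv[0], returncode, elapsed))
    return results
```

Calling run_jobs(JOBS) and printing each (name, exit code, seconds) tuple gives a per-job baseline; jobs can then be combined to approach the overnight load step by step.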
--
Nigel Andrews
Hello,
Thanks for your kind help.
But is there any particular reason for the database to behave like this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection
Is there any particular reason for this?
Regards
Shoaib
-----Original Message-----
From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
Sent: Tuesday, May 06, 2003 4:55 PM
To: shoaib
Cc: 'Martijn van Oosterhout'; gearond@cvc.net;
pgsql-general@postgresql.org
Subject: RE: [GENERAL] Database server restarting
On Tue, 6 May 2003, shoaib wrote:
Hello,
Thanks for your kind help.
But is there any particular reason for the database to behave like this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection
Is there any particular reason for this?
Well, there are probably lots of potential causes but consider something like
this:
process A starts up
process A uses N MB of memory
process A loops
process A uses N+1 MB of memory
...
process B starts up and connects to DB
memory available is 1MB
process A loops
process A uses N+1 MB of memory
process B wants 10KB more memory
process B dies for want of memory allocation checks
DB notes the unexpected EOF on the connection from B
process A loops
process A wants N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
...
system can't start any other process for lack of memory resources
You've got high system load, inability for processes to claim more memory and
errors about programs exiting at unexpected times.
--
Nigel Andrews
On Tuesday 06 May 2003 14:25, Nigel J. Andrews wrote:
On Tue, 6 May 2003, shoaib wrote:
When I log in at a console, I can see the prompt, but after I type the login
name the system just doesn't respond; it never gets to the password prompt.
You may have to wait a long time, which isn't very good because a) by the time
the system has enough resources to proceed with your login it's not in the
same state it was in at the problem time (obviously) and b) the login process
may well time out the login attempt before it even gets to the stage of asking
for the password.
I have two suggestions for the OP, if he is interested in experimenting with
alternatives, assuming the problem is a heavy DB process.
1) Try FreeBSD 4.8 and PostgreSQL from ports. I have a gut feeling that BSD
would be more responsive under heavy disk load than Linux. No concrete
evidence, just a gut feeling.
2) Try the latest kernel. I suggest you get 2.4.20 from kernel.org and apply
patches from http://members.optusnet.com.au/ckolivas/kernel/. Just get the
base patch that includes the O(1) scheduler, preemption and low-latency
patches. That should be good enough.
Basically, with either of these the unresponsiveness you are facing should be
gone and you should be able to debug the problem.
HTH
Shridhar
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
But is there any particular reason for the database to behave like this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
You've got high system load, inability for processes to claim more memory and
errors about programs exiting at unexpected times.
What strikes me about the above trace is that we see "database system
was interrupted" without any prior failure. That says to me that
something killed the postmaster itself --- if a database child process
died, the postmaster would have logged the fact.
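Tom's reasoning can be applied mechanically to a longer postmaster log. A hedged Python sketch: it flags each "database system was interrupted" line that was not preceded, since the last "database system is ready", by a report of a dying child process. The two marker strings are taken from the log quoted in this thread; the child-failure phrases are assumptions, not verified 7.2 wording, so adjust them to whatever your server actually logs.

```python
# Marker strings taken from the log quoted in this thread.
INTERRUPTED = "database system was interrupted"
READY = "database system is ready"

# ASSUMPTION: phrases the postmaster might use when it reports a dead child.
# They are illustrative only; check the real wording in your own logs.
CHILD_FAILURE_HINTS = (
    "terminated by signal",
    "exited with",
    "terminating any other active server processes",
)


def unexplained_interruptions(lines, hints=CHILD_FAILURE_HINTS):
    """Return indexes of 'interrupted' lines that have no child-failure
    report since the previous 'database system is ready' line."""
    suspicious = []
    child_failed = False
    for i, line in enumerate(lines):
        if READY in line:
            child_failed = False          # a new run of the server began
        elif any(hint in line for hint in hints):
            child_failed = True           # postmaster saw a child die
        elif INTERRUPTED in line:
            if not child_failed:
                suspicious.append(i)      # nothing explains the restart
            child_failed = False
    return suspicious
```

Run over the trace in this thread, the sketch would flag the 04:17:19 interruption, which matches Tom's observation that nothing was logged before it.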
That leaves me with two questions: what killed the postmaster, and what
restarted it?
If Nigel's guess is right that the system is under heavy memory
pressure, and this is a Linux box, then the kernel itself might have
kill -9'd the postmaster to try to get out of a memory shortage.
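If the kernel did kill the postmaster, the kernel log should say so; on Red Hat systems kernel messages usually end up in /var/log/messages. A hedged Python sketch that lists the processes the log reports as killed. The message wording varies between kernel versions, so the pattern below is an assumption that matches only the common "Killed process <pid> (<command>)" part. Because backends of this era typically share the postmaster's command name, the sketch compares PIDs rather than names; the postmaster's PID is normally the first line of postmaster.pid in the data directory.

```python
import re

# ASSUMPTION: kernel wording differs by version (some 2.4 kernels print
# "Out of Memory: Killed process 1234 (postmaster)."), so match only the
# "killed process <pid> (<command>)" part, case-insensitively.
_KILLED = re.compile(r"killed process (\d+) \(([^)]*)\)", re.IGNORECASE)


def oom_kills(log_lines):
    """Return [(pid, command), ...] for every kill reported in the log."""
    kills = []
    for line in log_lines:
        match = _KILLED.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def was_killed(log_lines, pid):
    """True if the log reports that the process with this pid was killed."""
    return any(killed_pid == pid for killed_pid, _ in oom_kills(log_lines))
```

The loose pattern can also match kills that were not caused by memory pressure, so a hit is a lead to follow up, not proof.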
I can't think of very many other theories (though I do recall at
least one self-inflicted problem, from someone whose "maintenance
script" kill -9'd the postmaster for random reasons...)
I'd also like to know whether the system is configured to auto-restart
the postmaster, and if so how, and does it do any mucking about (like
removing lockfiles) while it's doing so?
regards, tom lane
Look in the archives for threads about disk and memory testing:
memtest86 and some other programs.
shoaib wrote:
Our server reboots at 1 AM, and the job I mentioned starts at 4 AM and ended
at 4:13 AM. This process is database-intensive: around 10,000 records are
updated or inserted. Can it be the cause of this problem? After this happened
my server just hangs. Last night I faced the same problem again on another
server, and it was after yet another DB-intensive process.
The system has 1 GB of RAM, a 1 GHz processor, RAID 1 and Red Hat Linux 7.3.
We are about to install 70 such servers. Please help.
regards
Shoaib
-----Original Message-----
From: Dennis Gearon [mailto:gearond@cvc.net]
Sent: Monday, May 05, 2003 11:13 PM
To: shoaib
Cc: 'Nigel J. Andrews'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
Modern OSes shouldn't need rebooting, unless something else is wrong.
What's the quality of your hardware? Any applications compiled on bad
hardware? Sigh, is it a Windows environment?
shoaib wrote:
Thanks a lot for your prompt reply.
We are rebooting the server to clean up the system's buffers. Before
rebooting I shut down the database server. Can you provide any further
clue why it suddenly restarted at 4:17 AM? Our preventive maintenance
runs at 1 AM.
Another process, which reads data from some flat files and updates the
database with it, ended at 4:13 AM on the same day.
Your help is really appreciated.
Regards
Shoaib
-----Original Message-----
From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
Sent: Monday, May 05, 2003 7:08 PM
To: shoaib
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting
On Mon, 5 May 2003, shoaib wrote:
Hello Everybody,
We are using PostgreSQL 7.2.2. Our system runs 24 hours a day, with a
preventive reboot once a day.
Odd concept. What is this reboot preventing?
Sometimes I get this error, and after it the system hangs. Can anybody
help with this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown
FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery
in progress
It looks like your preventative daily reboot is not preventing the problems it
is causing. It is possible that the postmaster is not being shut down properly
because, for example, there is a client still connected and the shutdown script
isn't forcing a fast shutdown. See the pg_ctl man page for information on the
switches.
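Nigel's suggestion can be sketched as an explicit pre-reboot step. This hedged Python sketch only wraps the documented `pg_ctl stop -D <datadir> -m <mode>` invocation; the function names are hypothetical, and it assumes it runs as the database owner with pg_ctl on the PATH.

```python
import subprocess

# Shutdown modes accepted by "pg_ctl stop -m".
VALID_MODES = ("smart", "fast", "immediate")


def pg_ctl_stop_cmd(datadir, mode="fast"):
    """Build the argv for 'pg_ctl stop' with the given shutdown mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown shutdown mode: %r" % mode)
    return ["pg_ctl", "stop", "-D", datadir, "-m", mode]


def stop_before_reboot(datadir):
    """Try a fast shutdown; return True only if pg_ctl reported success."""
    result = subprocess.run(pg_ctl_stop_cmd(datadir, "fast"))
    return result.returncode == 0
```

A reboot script could call, for example, `stop_before_reboot("/var/lib/pgsql/data")` (an example path) and reboot only when it returns True. `fast` disconnects clients that are still attached, which a `smart` shutdown would wait for. An `immediate` shutdown also exists, but it behaves like a crash and forces recovery on the next start, which produces the same "not properly shut down" messages seen in this thread.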
As for worrying about the messages, there's no real error message in there,
aside from the 'EOF on client connection', just the normal start-up messages
after a bad shutdown. If you're worried, I would look at solving whatever
problem the answer to the daily reboot question reveals.
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection
Regards
Shoaib
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster