Database server restarting

Started by shoaib almost 23 years ago · 24 messages · general
#1shoaib
shoaibm@vmoksha.com

Hello Everybody,

We are using PostgreSQL 7.2.2. Our system runs 24 hours a day, with a
preventive reboot once a day. Sometimes I get this error, and after
it the system hangs. Can anybody help with this?

DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

Regards

Shoaib

#2Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: shoaib (#1)
Re: Database server restarting

On Mon, 5 May 2003, shoaib wrote:

Hello Everybody,

We are using PostgreSQL 7.2.2. Our system runs 24 hours a day, with a
preventive reboot once a day.

Odd concept. What is this reboot preventing?

Sometimes I get this error, and after
it the system hangs. Can anybody help with this?

DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress

It looks like your preventative daily reboot is not preventing the problems it
is causing. It is possible that the postmaster is not being shut down properly
because, for example, there is a client still connected and the shutdown script
isn't forcing a fast shutdown. See the pg_ctl manpage for information on the
switches.
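
For readers following along, a fast shutdown ahead of the reboot might be
scripted roughly as below. This is only a sketch: the helper names are made
up for illustration, the data directory is whatever your installation uses,
and it assumes pg_ctl is on the PATH of the user that owns the database. The
"-m fast" mode disconnects clients instead of waiting for them to leave.

```python
import subprocess


def fast_stop_command(data_dir):
    """Build the pg_ctl invocation for a fast shutdown of one cluster."""
    # -m fast rolls back open transactions and disconnects clients,
    # instead of waiting for every client to leave ("smart" mode).
    return ["pg_ctl", "stop", "-D", data_dir, "-m", "fast"]


def stop_postmaster(data_dir):
    """Run the fast shutdown; return True if pg_ctl exited with status 0."""
    result = subprocess.run(fast_stop_command(data_dir))
    return result.returncode == 0
```

A reboot script would call stop_postmaster() with the real data directory
and only go on to "shutdown -r now" once it returns True.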

As for worrying about the messages, there's no real error message in there,
aside from the 'EOF on client connection', just the normal messages on start up
from a bad shutdown. If you're worried, I would look at solving whatever the
answer to the daily reboot question shows is the problem.

DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

Regards

Shoaib

--
Nigel J. Andrews

#3shoaib
shoaibm@vmoksha.com
In reply to: Nigel J. Andrews (#2)
Re: Database server restarting

Thanks a lot for your prompt reply.
We are rebooting the server to clean up the buffers of the
system. Before rebooting I will shut down the database server. Can you
provide any further clue as to why it suddenly restarted at 4:17 AM? Our
preventive maintenance runs at 1 AM.
Another process, which reads data from some flat files and writes it
to the database, ended at 4:13 AM on the same day.
Your help is really appreciated.

Regards
Shoaib

#4Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: shoaib (#3)
Re: Database server restarting

On Mon, 5 May 2003, shoaib wrote:

Thanks a lot for your prompt reply.
We are rebooting the server to clean up the buffers of the
system. Before rebooting I will shut down the database server. Can you
provide any further clue as to why it suddenly restarted at 4:17 AM? Our
preventive maintenance runs at 1 AM.
Another process, which reads data from some flat files and writes it
to the database, ended at 4:13 AM on the same day.

Hmmm...I assumed the 4:17 was from the scheduled reboot. It's a more difficult
issue if that was from the postmaster exiting by itself. Did the data loading
process end normally? It's a good few minutes, but in the scheme of things 4
minutes for the postmaster to be restarted automatically maybe isn't such a
long time.
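
To line restarts up against the cron schedule, one could pull the reported
times out of the server log. A minimal sketch, assuming the log contains
lines shaped like the excerpt quoted in this thread; the function name is
invented for illustration.

```python
import re

# Matches the startup message shown in the log excerpt from this thread.
INTERRUPTED = re.compile(r"database system was interrupted at (.+)$")


def interrupted_times(log_lines):
    """Return the timestamps reported in 'was interrupted at' lines."""
    times = []
    for line in log_lines:
        match = INTERRUPTED.search(line.rstrip("\n"))
        if match:
            times.append(match.group(1).strip())
    return times


sample = [
    "DEBUG: pq_recvbuf: unexpected EOF on client connection",
    "DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT",
    "DEBUG: database system is ready",
]
print(interrupted_times(sample))
```

Comparing those times with the end times of the nightly jobs shows whether
a restart follows a particular job or the scheduled reboot.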

I'm still drawn to this daily reboot process though. You do it to clean up the
system buffers. Why? Is there perhaps some instability in the system if the
system uses lots of memory? What is the hardware/os? Have you run hardware
diagnostics? If it's Intel/PC-like, there is a program called memtest86 which is
good at checking the memory. Be warned, though: if you need that 24-hour uptime,
running memtest86 properly is going to cost you a good few hours.

--
Nigel J. Andrews

#5Dennis Gearon
gearond@cvc.net
In reply to: shoaib (#3)
Re: Database server restarting

Modern OSes shouldn't need rebooting, unless something else is wrong. What's the quality of your hardware? Any applications compiled on bad hardware?

Sigh, is it a Windows environment?

#6shoaib
shoaibm@vmoksha.com
In reply to: Dennis Gearon (#5)
Re: Database server restarting

Our server reboots at 1 AM, and the job I mentioned starts at 4 AM and
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Can it be the cause of this problem?
After this happened, my server just hangs.

Last night I faced the same problem again on another server, and it was
after yet another DB-intensive process.
The system has 1 GB RAM, a 1 GHz processor and RAID 1 installed on it, and
runs Red Hat Linux 7.3.
We are about to install 70 such servers.

Please help.

regards
Shoaib

#7Martijn van Oosterhout
kleptog@svana.org
In reply to: shoaib (#6)
Re: Database server restarting

On Tue, May 06, 2003 at 01:41:37PM +0800, shoaib wrote:

Our server reboots at 1 AM, and the job I mentioned starts at 4 AM and
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Can it be the cause of this problem?
After this happened, my server just hangs.

When you say hang, do you mean the entire server stops responding, i.e. you
can't log in any more, no web requests, etc.?

If so, it's got nothing to do with postgres, as a user program simply can't
hang the machine like that (unless you run out of memory, in which case it's
just really slow rather than hung).

Last night I faced the same problem again on another server, and it was
after yet another DB-intensive process.
The system has 1 GB RAM, a 1 GHz processor and RAID 1 installed on it, and
runs Red Hat Linux 7.3.
We are about to install 70 such servers.

When you say reboot, are you doing the proper shutdown sequence (shutdown -r
now) or are you just pulling the plug?

Please explain what "hangs" means. Also, rebooting every day seems to be a
massive waste of time. UNIX machines don't need that kind of maintenance.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington

#8shoaib
shoaibm@vmoksha.com
In reply to: Martijn van Oosterhout (#7)
Re: Database server restarting

When I say it hangs, I mean I am not even able to log in at the server
console.
No SSH, no login from remote machines.

Regards

Shoaib

#9Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: shoaib (#6)
Re: Database server restarting

On Tuesday 06 May 2003 11:11, shoaib wrote:

Our server reboots at 1 AM, and the job I mentioned starts at 4 AM and
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Can it be the cause of this problem?
After this happened, my server just hangs.

Last night I faced the same problem again on another server, and it was
after yet another DB-intensive process.
The system has 1 GB RAM, a 1 GHz processor and RAID 1 installed on it, and
runs Red Hat Linux 7.3.
We are about to install 70 such servers.

I am sure there is something not very correct here. You should not need a
server restart. I would like to see your database configuration options,
patterns in data access and min/max/avg load on each server.

10K records isn't much. Certainly not for that kind of hardware..

I am still bothered by the fact that you reboot your server daily. Can't find
a good reason from above description..

Shridhar

#10Martijn van Oosterhout
kleptog@svana.org
In reply to: shoaib (#8)
Re: Database server restarting

On Tue, May 06, 2003 at 02:28:57PM +0800, shoaib wrote:

When I say it hangs, I mean I am not even able to log in at the server
console.
No SSH, no login from remote machines.

Well, that's not postgresql's fault. It can't hang a machine like that. You
should look elsewhere for the exact cause. I'm assuming here that consoles
that are still logged in don't respond either? Maybe leave a top running to
capture the list of processes just before it dies? Any cronjobs about the
time it dies?

What other processes run at about that time?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

#11shoaib
shoaibm@vmoksha.com
In reply to: Martijn van Oosterhout (#10)
Re: Database server restarting

There are some cron jobs running at the same time.
One server SSHes into our application server, and one cron job
reads the DB and writes some data into flat files. But by the time
this problem happens, these jobs are not writing any data. Last
night when the server went down, the other server was trying to SSH in,
it was probably running some cron job, and a heavy DB process was
running. I cannot run top because I cannot log in to the server, even
from the console.

Regards
Shoaib

#12Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: shoaib (#11)
Re: Database server restarting

On Tuesday 06 May 2003 12:16, shoaib wrote:

There are some cron jobs running at the same time.
One server SSHes into our application server, and one cron job
reads the DB and writes some data into flat files. But by the time
this problem happens, these jobs are not writing any data. Last
night when the server went down, the other server was trying to SSH in,
it was probably running some cron job, and a heavy DB process was
running. I cannot run top because I cannot log in to the server, even
from the console.

How long did you wait? If the server is doing heavy disk processing, it
could take up to 10 minutes under the worst conditions. Just don't give up
after a minute or so.

Shridhar

#13Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: shoaib (#11)
Re: Database server restarting

On Tue, 6 May 2003, shoaib wrote:

There are some cron jobs running at the same time.
One server SSHes into our application server, and one cron job
reads the DB and writes some data into flat files. But by the time
this problem happens, these jobs are not writing any data. Last
night when the server went down, the other server was trying to SSH in,
it was probably running some cron job, and a heavy DB process was
running. I cannot run top because I cannot log in to the server, even
from the console.

Do you mean you have no login privileges on the machine, or are you only
trying to log in once you see a problem? If the former, then I can't see how
there's any way you can make progress with this. If the latter, forget that;
it's not helping, since you are unable to see which processes are running.
What you should do is log in _now_, run 'top' and leave it running. It may be
that when the problem occurs the session running top will stop and so show the
information from that time. However, it may also be that it doesn't stop and
when you come into the office n hours later you find it merrily ticking away
showing you the current information. Therefore, investigate ways to log
the information if you aren't sat there when the problem is occurring.
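
One way to log that information unattended is a small script that appends a
snapshot to a file every so often, so the last entries before a hang survive
it. This is a sketch only: the function names and the 30-second default are
arbitrary choices, and it assumes a 'ps' command is available.

```python
import os
import subprocess
import time


def take_snapshot():
    """Return a text snapshot: timestamp, load average, process list."""
    parts = ["=== " + time.strftime("%Y-%m-%d %H:%M:%S")]
    try:
        parts.append("load average: %.2f %.2f %.2f" % os.getloadavg())
    except (OSError, AttributeError):
        parts.append("load average: unavailable")
    try:
        # 'ps aux' is an assumption; any process lister would do.
        ps = subprocess.run(["ps", "aux"], capture_output=True, text=True)
        parts.append(ps.stdout)
    except OSError:
        parts.append("ps: unavailable")
    return "\n".join(parts) + "\n"


def append_snapshot(path):
    """Append one snapshot and force it to disk before returning."""
    with open(path, "a") as log:
        log.write(take_snapshot())
        log.flush()
        os.fsync(log.fileno())


def monitor(path, interval_seconds=30):
    """Append a snapshot every interval until the process is stopped."""
    while True:
        append_snapshot(path)
        time.sleep(interval_seconds)
```

The fsync after each write is deliberate: if the machine locks up, whatever
was still buffered in memory would otherwise be lost along with the evidence.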

Also take a look at procinfo, it may be helpful as well.

One thing that might be a problem is the number of open file descriptors, you
could be running into the system limit of those. That sort of thing can
sometimes make a system unstable.
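
On Linux the system-wide figures can be read from /proc/sys/fs/file-nr,
which holds three numbers: allocated handles, allocated-but-unused handles,
and the maximum. A rough sketch of checking how close a machine is to the
limit; the function names and sample figures are invented for illustration.

```python
def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused, and maximum handles."""
    allocated, unused, maximum = (int(field) for field in text.split()[:3])
    return allocated, unused, maximum


def handle_usage(text):
    """Fraction of the system-wide file handle limit currently in use."""
    allocated, unused, maximum = parse_file_nr(text)
    return (allocated - unused) / maximum


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read the kernel's counters (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Made-up figures, purely to show the arithmetic.
print(handle_usage("7200 200 8192"))
```

Logging handle_usage(read_file_nr()) alongside the process snapshots would
show whether the figure climbs towards 1.0 before the machine stops
responding.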

I'd still be interested to know whether the hardware has been tested
properly. Are there any known problems with RH 7.3's kernel and your particular
hardware, such as the RAID device?

One interesting thing you say though; the same thing happens on a second
server. That to me suggests either something like a kernel/hardware problem
such as the RAID or you have a bug in your own software. Perhaps an endless
loop? Perhaps an endless trying to obtain a file descriptor? A heavy cpu usage
process shouldn't bring the machine down but it can make it look very
unresponsive.

--
Nigel J. Andrews

#14shoaib
shoaibm@vmoksha.com
In reply to: Nigel J. Andrews (#13)
Re: Database server restarting

When I log in at a console, I can see the prompt, but after I type in the
login name the system just doesn't respond; it never gets to the password
prompt.

Regards
Shoaib

#15Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: shoaib (#14)
Re: Database server restarting

On Tue, 6 May 2003, shoaib wrote:

When I log in at a console, I can see the prompt, but after I type in the
login name the system just doesn't respond; it never gets to the password
prompt.

You may have to wait a long time which isn't very good because a) by the time
the system has enough resources to proceed with your log in it's not in the
same state it was in at the problem time (obviously) and b) the login process
may well timeout the login attempt before it even gets to the stage of asking
for the password.

You really do need to be logged in before the problem occurs. Indeed, have more
than one session running, run system monitoring utilities like top and procinfo
and also one you can type into without stopping those utilities.

If you can get the system to again you may also find it useful to run your
cronjobs by hand to verify them individually and to then try and replicate the
early morning conditions at whatever time you can test things. If you're having
to wait overnight every time just to take a look at a new piece of the puzzle,
you're locked into that timetable for generating and testing a solution.

--
Nigel Andrews

#16shoaib
shoaibm@vmoksha.com
In reply to: Nigel J. Andrews (#15)
Re: Database server restarting

Hello,

Thanks for your kind help.

But is there any particular reason for the database to behave like
this?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

Is there any particular reason for this?

Regards
Shoaib

-----Original Message-----
From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
Sent: Tuesday, May 06, 2003 4:55 PM
To: shoaib
Cc: 'Martijn van Oosterhout'; gearond@cvc.net;
pgsql-general@postgresql.org
Subject: RE: [GENERAL] Database server restarting

On Tue, 6 May 2003, shoaib wrote:

When I login a console, I can see the prompt but after typing in login
name system just don't respond it does not come to password prompt.

You may have to wait a long time which isn't very good because a) by the time
the system has enough resources to proceed with your log in it's not in the
same state it was in at the problem time (obviously) and b) the login process
may well timeout the login attempt before it even gets to the stage of asking
for the password.

You really do need to be logged in before the problem occurs. Indeed, have more
than one session running, run system monitoring utilities like top and procinfo
and also one you can type into without stopping those utilities.

If you can get the system to again you may also find it useful to run your
cronjobs by hand to verify them individually and to then try and replicate the
early morning conditions at whatever time you can test things. If you're having
to wait overnight every time just to take a look at a new piece of the puzzle
you're locked into that timetable for generating and testing a solution.

--
Nigel Andrews

#17Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: shoaib (#16)
Re: Database server restarting

On Tue, 6 May 2003, shoaib wrote:

Hello,

Thanks for your kind help.

But is there any particular reason for the database to behave this way?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

Is there any particular reason for this?

Well, there are probably lots of potential causes but consider something like
this:

process A starts up
process A uses N MB of memory
process A loops
process A uses N+1 MB of memory
...
process B starts up and connects to DB
memory available is 1MB
process A loops
process A uses N+1 MB of memory
process B wants 10KB more memory
process B dies for want of memory allocation checks
DB notes the unexpected EOF on the connection from B
process A loops
process A wants N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
...
system can't start any other process for lack of memory resources

You've got high system load, inability for processes to claim more memory and
errors about programs exiting at unexpected times.
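
The sequence above can be played out as a toy model. This is only a sketch of the reasoning, not a measurement of the real system; the function name, the kilobyte figures, and the event strings are all made up for illustration:

```python
def simulate(total_kb, a_used_kb, a_step_kb, b_request_kb, b_step, steps):
    """Toy model: A grows on every loop, B makes one small request."""
    events = []
    b_used_kb = 0
    for step in range(steps):
        free_kb = total_kb - a_used_kb - b_used_kb
        # Process A asks for more memory on every pass through its loop.
        if a_step_kb <= free_kb:
            a_used_kb += a_step_kb
            free_kb -= a_step_kb
        else:
            events.append((step, "A: allocation failed, will retry"))
        # Process B makes its single request at the chosen step.
        if step == b_step:
            if b_request_kb <= free_kb:
                b_used_kb += b_request_kb
                events.append((step, "B: allocation succeeded"))
            else:
                events.append(
                    (step, "B: allocation failed, B exits; server logs unexpected EOF"))
    return events


for step, message in simulate(total_kb=4096, a_used_kb=1024, a_step_kb=1024,
                              b_request_kb=10, b_step=3, steps=5):
    print(step, message)
```

With these numbers process A has taken everything by the time B asks for its 10 KB, so B fails on a tiny request while A keeps retrying, which is the pattern described above.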

--
Nigel Andrews

#18Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Nigel J. Andrews (#15)
Re: Database server restarting

On Tuesday 06 May 2003 14:25, Nigel J. Andrews wrote:

On Tue, 6 May 2003, shoaib wrote:

When I login a console, I can see the prompt but after typing in login
name system just don't respond it does not come to password prompt.

You may have to wait a long time which isn't very good because a) by the
time the system has enough resources to proceed with your log in it's not
in the same state it was in at the problem time (obviously) and b) the
login process may well timeout the login attempt before it even gets to the
stage of asking for the password.

I have two suggestions for the OP, if he is interested in experimenting with
alternatives, assuming the problem is with a heavy DB process.

1) Try FreeBSD 4.8 and PostgreSQL from ports. I have a gut feeling that BSD
would be more responsive under heavy disk load than Linux. No concrete
evidence, just a gut feeling.

2) Try a more recent kernel. I suggest you get 2.4.20 from kernel.org and apply
patches from http://members.optusnet.com.au/ckolivas/kernel/. Just get the
base patch that includes O(1), pre-empt and low-latency. That should be good
enough.

Basically with either of these, the unresponsiveness that you are facing
should be gone and you should be able to debug the problem.

HTH

Shridhar

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nigel J. Andrews (#17)
Re: Database server restarting

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:

But is there any particular reason for the database to behave this way?
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0

You've got high system load, inability for processes to claim more memory and
errors about programs exiting at unexpected times.

What strikes me about the above trace is that we see "database system
was interrupted" without any prior failure. That says to me that
something killed the postmaster itself --- if a database child process
died, the postmaster would have logged the fact.

That leaves me with two questions: what killed the postmaster, and what
restarted it?

If Nigel's guess is right that the system is under heavy memory
pressure, and this is a Linux box, then the kernel itself might have
kill -9'd the postmaster to try to get out of a memory shortage.
I can't think of very many other theories (though I do recall at
least one self-inflicted problem, from someone whose "maintenance
script" kill -9'd the postmaster for random reasons...)
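
One way to check the kernel theory is to search the system log for out-of-memory kills around the time of the interruption. A rough sketch in Python; the message wording is an assumption (kernels have logged lines resembling "Out of Memory: Killed process 1234 (postmaster)", but the exact text differs between versions), and the function name is hypothetical:

```python
import re

# Assumed wording, kept deliberately loose because the exact text varies
# between kernel versions.
_OOM_LINE = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(log_text):
    """Return (pid, command) for each line that looks like an OOM kill."""
    kills = []
    for line in log_text.splitlines():
        match = _OOM_LINE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Feeding it the contents of the system log (often `/var/log/messages`, though the location depends on the syslog configuration) and looking for `postmaster` in the result would support or rule out this explanation. An empty result proves little if the kernel words the message differently.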

I'd also like to know whether the system is configured to auto-restart
the postmaster, and if so how, and does it do any mucking about (like
removing lockfiles) while it's doing so?

regards, tom lane

#20Dennis Gearon
gearond@cvc.net
In reply to: shoaib (#6)
Re: Database server restarting

Look in the archives about disk and memory testing.

memtest86 and some other program.

shoaib wrote:

Our server reboots at 1 AM, and the job I mentioned starts at 4 AM and ended
at 4:13 AM. This process is database intensive: around 10000 records are
updated or inserted. Can it be the cause of this problem? After this happened
my server just hangs.

Last night I faced the same problem again on another server, and it was
after yet another DB-intensive process.
The system has 1 GB RAM, a 1 GHz processor, RAID 1 and Red Hat Linux 7.3.
We are about to install 70 such servers.

Please help.

regards
Shoaib

-----Original Message-----
From: Dennis Gearon [mailto:gearond@cvc.net]
Sent: Monday, May 05, 2003 11:13 PM
To: shoaib
Cc: 'Nigel J. Andrews'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting

Modern OS's shouldn't need rebooting, unless something else is wrong.
What's the quality of your hardware? Any applications compiled on bad
hardware?

sigh, is it a windows environment?

shoaib wrote:

Thanks a lot for your prompt reply.
We are rebooting the server to clean up the system's buffers. Before
rebooting I will shut down the database server. Can you provide any further
clue why it suddenly restarted at 4:17 AM? Our preventive maintenance runs
at 1 AM.
Another process, which reads data from some flat files and updates the
database, ended at 4:13 AM on the same day.
Your help is really appreciated.

Regards
Shoaib

-----Original Message-----
From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
Sent: Monday, May 05, 2003 7:08 PM
To: shoaib
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database server restarting

On Mon, 5 May 2003, shoaib wrote:

Hello Everybody,

We are using PostgreSQL 7.2.2. Our system runs 24 hours a day with a
preventive reboot once a day.

Odd concept. What is this reboot preventing?

Sometimes I get this error and after it the system hangs. Can anybody help
with this?

DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress

It looks like your preventative daily reboot is not preventing the problems it
is causing. It is possible that the postmaster is not being shut down properly
because, for example, there is a client still connected and the shutdown script
isn't forcing a fast shutdown. See the pg_ctl manpage for information on the
switches.

As for worrying about the messages, there's no real error message in there,
aside from the 'EOF on client connection', just the normal messages on start up
from a bad shutdown. If you're worried, I would look at solving whatever the
answer to the daily reboot question shows is the problem.

DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

Regards

Shoaib

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

#21scott.marlowe
scott.marlowe@ihs.com
In reply to: shoaib (#11)
#22scott.marlowe
scott.marlowe@ihs.com
In reply to: shoaib (#14)
#23Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: scott.marlowe (#21)
#24shoaib
shoaibm@vmoksha.com
In reply to: Tom Lane (#19)