Re: postmaster dies (was Re: Very disappointing performance)
secret <secret@kearneydev.com> writes:
PostgreSQL is also crashing 1-2 times a day on me, although I have a
handy perl script to keep it alive now <grin>...
basically the server randomly dies with a:
ERROR: postmaster: StreamConnection: accept: Invalid argument
pmdie 3
(then signals all children to drop dead)
Hmm. That shouldn't happen, especially not randomly; if the accept
works the first time then it should work forever after, since the
arguments being passed in never change.
The error is coming from StreamConnection() in
pgsql/src/backend/libpq/pqcomm.c. Could you maybe add some debugging
code to the routine to see what the server_fd and port arguments are
when accept() fails? I think just changing the first elog() to
elog(ERROR,
"postmaster: StreamConnection: accept: %m\nserver_fd = %d, port = %p",
server_fd, port);
would do for starters. This would let us eliminate the possibility that
the routine is getting passed bad arguments.
An alternative possibility is to run the postmaster under truss so you
can see what arguments are passed to the kernel on every kernel call,
but that'd generate a pretty verbose logfile.
regards, tom lane
Tom Lane writes...
An alternative possibility is to run the postmaster under truss so you
can see what arguments are passed to the kernel on every kernel call,
but that'd generate a pretty verbose logfile.
FWIW...
If your (secret's) system uses strace, you can tell it to filter
just specific calls or groups of calls. For example,
strace -f -s 256 -e trace=network -o /tmp/strace.log -p <postmaster pid>
Should trace all the network operations of the postmaster and
all the children. I'm not sure if the socket reads/writes will
be included or not. The -s sets the maximum string length printed
for each argument.
Truss probably has similar options that can be enabled in some
baroque manner.
-- cary
Tom Lane wrote:
Could you maybe add some debugging
code to the routine to see what the server_fd and port arguments are
when accept() fails? I think just changing the first elog() to
elog(ERROR,
"postmaster: StreamConnection: accept: %m\nserver_fd = %d, port = %p",
server_fd, port);
would do for starters.
Done. I'll install the new binaries at the end of the day when no one is
using the database and give you a copy of the logs when it dies again. Thank
you for the help on this, it's very much appreciated.
David Secret
MIS Director
Kearney Development Co., Inc.
query: SELECT "material_id" ,"name" ,"short_name" ,"legacy" FROM "material" ORDER BY "legacy" DESC,"name"
ProcessQuery
! system usage stats:
! 0.017961 elapsed 0.020000 user 0.000000 system sec
! [0.050000 user 0.020000 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 6/24 [127/201] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 0 read, 0 written, buffer hit rate = 100.00%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
CommitTransactionCommand
ERROR: postmaster: StreamConnection: accept: Invalid argument
server_fd = 3, port = 0x816aa70
pmdie 3
SignalChildren: sending signal 15 to process 16943
SignalChildren: sending signal 15 to process 16942
SignalChildren: sending signal 15 to process 16941
There we go, it crashed this morning...(interestingly it went all of
yesterday without crashing)... Does this shed some light? If not what would
you like me to do next? I have 700M+ to keep a log file, as long as it doesn't
generate that much in a day we should be okay with a very verbose log.
Just tell me what code mods or runtime options to use...
David Secret
MIS Director
Kearney Development Co., Inc.
secret <secret@kearneydev.com> writes:
ERROR: postmaster: StreamConnection: accept: Invalid argument
server_fd = 3, port = 0x816aa70
There we go, it crashed this morning...(interestingly it went all of
yesterday without crashing)... Does this shed some light?
Not much ... it shows pretty much what we expected, ie, nothing
obviously wrong.
What I would suggest doing next is running the postmaster under 'truss'
or some similar utility that can generate a logfile of all the kernel
calls made by the postmaster. I can't give you any details on how to do
that --- perhaps some other reader can help? What we're looking for is
anything that might have changed the state of file descriptor 3 shortly
before the crash.
BTW, some tips on debugging this. Maybe these are obvious, maybe not:
1. This accept call is not associated with normal query processing, but
with receiving connection requests from new clients. Almost certainly
the bug is not triggered by processing queries but by connection
attempts. You probably could make the crash happen sooner by starting
and stopping clients in a steady stream (not that you want a crash
sooner on your real system, of course, but for debugging it'd be nice
not to have to wait for long).
2. You might want to build a playpen system that you can stress into
crashing without taking out your live server. The easiest way to do
that is just to duplicate your installation on another machine, but if
no other machine is handy (or if you suspect a platform-dependent bug,
which I do here) the best bet is to build a debugging version of
Postgres that has nonstandard values for the installation directory
and server's port address. For example I usually build trial versions
with
./configure --with-pgport=5440 --prefix=/users/postgres/testversion
(plus any options you normally use, of course). I think it might also
be possible to set these values while running initdb and starting the
test postmaster, without having to recompile; but I don't know the
exact incantations to use to do it that way.
regards, tom lane
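A minimal sketch of the "steady stream of clients" idea from tip 1, aimed at a playpen postmaster like the one in tip 2. The function name hammer_connections is hypothetical, and the psql invocation in the final comment assumes a test postmaster listening on port 5440 with the template1 database available; adjust both to the local setup.

```shell
#!/bin/sh
# hammer_connections COUNT CMD [ARGS...]
# Run a short-lived client command COUNT times in a row so that the
# postmaster has to accept (and tear down) one connection per iteration.
# Reports how many of the attempts exited with a non-zero status.
hammer_connections() {
    count=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # Client output is discarded; only the exit status is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures of $count attempts failed"
}

# Assumed usage against the playpen server (not run automatically):
# hammer_connections 1000 psql -p 5440 -c 'SELECT 1' template1
```

Running this while the test postmaster is being traced should shorten the wait for the accept() failure, if the failure really is tied to connection attempts as suspected.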
Would strace work instead of truss? I have strace... Will you be able to
interpret the strace files & determine the problem do you think?
You've been the only one to respond on this, so I'm a tad worried about
being left out in the cold on this one... I'd be glad to pay for support if
there is a place I can do that, heck I pay for support on other software
products, why not PostgreSQL?
Please let me know. I'll begin an strace tonight...
David
I can't imagine he has enough disk space for truss/ktrace output for a
full day of backend activity, does he?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
I can't imagine he has enough disk space for truss/ktrace output for a
full day of backend activity, does he?
Ur, I'll postpone this to Thursday, when I can monitor the disk space very
carefully, how much space are we talking about here? 1G? 2G? 3G? 10G?
Maybe I can temporarily install a hard disk just for that purpose.... There
are only a few users on the database, it really isn't *THAT* active.
--David
Hard to say. I would turn it on for 15 minutes and see. ktrace can
generate a 1MB file in a minute.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
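As a rough worked estimate, the figures mentioned in the thread can be plugged into a little shell arithmetic. The rate of 1 MB per minute is the ktrace figure quoted above and the 700 MB is the free space reported earlier; the actual strace output rate on this system is unknown, so treat the numbers as an assumption. Both helper names are hypothetical.

```shell
#!/bin/sh
# mb_per_day RATE_MB_PER_MIN
# Trace volume for a full day at a constant rate, in MB.
mb_per_day() {
    echo $(( $1 * 60 * 24 ))
}

# minutes_until_full FREE_MB RATE_MB_PER_MIN
# How long the free space lasts at that rate, in whole minutes.
minutes_until_full() {
    echo $(( $1 / $2 ))
}

# Assumed usage with the thread's figures:
# mb_per_day 1              (1 MB/min for 24 hours)
# minutes_until_full 700 1  (700 MB free at 1 MB/min)
```

At that assumed rate a full day comes to 1440 MB, and 700 MB of free space would last 700 minutes, a little under 12 hours. That is why a short trial run to measure the real rate is worth doing first.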
Bruce Momjian <maillist@candle.pha.pa.us> writes:
I can't imagine he has enough disk space for truss/ktrace output for a
full day of backend activity, does he?
That's why I was encouraging him to set up a playpen and actively
work at crashing it, rather than waiting around to see whether it'd
happen before his disk fills up ;-)
regards, tom lane
Tom Lane wrote:
That's why I was encouraging him to set up a playpen and actively
work at crashing it, rather than waiting around to see whether it'd
happen before his disk fills up ;-)
I've built a simple program to record the last N lines(currently
5000...Suggestions?) of input... What I'd like to do is pipe STDIN and
STDERR to this program, but "|" doesn't do this, do you all have a
suggestion on how to do this? If I can then I can get you the system trace
and hopefully get this crash bug fixed.
On Tue, 23 Mar 1999, secret wrote:
What I'd like to do is pipe STDIN and
STDERR to this program, but "|" doesn't do this, do you all have a
suggestion on how to do this?
strace ... 2>&1 | tail -5000
Note that tail is a standard *nix program.
Taral
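A small sketch of why the 2>&1 matters, using a stand-in for the tracer. The functions noisy and keep_last are hypothetical names for illustration; noisy writes only to stderr, which is where strace sends its output by default, and tail -n N is the POSIX spelling of tail -N.

```shell
#!/bin/sh
# noisy TOTAL
# Stand-in for a tracer: writes TOTAL numbered lines to stderr only.
noisy() {
    total=$1
    line=1
    while [ "$line" -le "$total" ]; do
        echo "trace line $line" >&2
        line=$((line + 1))
    done
}

# keep_last N CMD [ARGS...]
# Run CMD with stderr folded into stdout, then print only the last
# N lines of the combined stream once CMD has exited.
keep_last() {
    keep=$1
    shift
    "$@" 2>&1 | tail -n "$keep"
}

# Assumed usage: keep_last 3 noisy 10
```

Without the 2>&1, the pipe carries only stdout, so the stderr lines bypass tail entirely. Note that tail prints nothing until its input ends, so the retained lines appear only after the traced process exits.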