It happened again: Server hung up solid
Okay, this is with code of ~May 4th ... a 'psql' connection to the
database hangs solid.
errout is dated:
pgsql% !ls
ls -lt
total 13324
-rw------- 1 pgsql pgsql 4842715 May 7 10:57 errout.5432
and the last few lines contain:
ERROR: parser: parse error at or near "vpti"
pq_recvbuf: unexpected EOF on client connection
pq_flush: send() failed: Broken pipe
pq_recvbuf: recv() failed: Connection reset by peer
pq_recvbuf: unexpected EOF on client connection
pq_recvbuf: unexpected EOF on client connection
pq_flush: send() failed: Broken pipe
pq_recvbuf: recv() failed: Connection reset by peer
But, of course, no date/time ...
ps shows:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
pgsql 33515 0.0 0.0 0 0 ?? Z 4:45PM 0:00.00 (postgres)
pgsql 33516 0.0 0.0 0 0 ?? Z 4:45PM 0:00.00 (postgres)
pgsql 93757 0.0 0.2 1456 1088 p0 S Wed03PM 0:01.11 -su (tcsh)
pgsql 7100 0.0 0.5 38692 2616 ?? Is Fri12AM 8:43.44 /pgsql/bin/postmas
pgsql 33667 0.0 0.0 396 224 p0 R+ 7:35PM 0:00.00 ps ux
and postmaster is started with:
pgsql% cat pgstart
#!/bin/tcsh
setenv PORT 5432
setenv POSTMASTER /pgsql/bin/postmaster
unlimit
${POSTMASTER} -B 4096 -N 128 -S -o "-F -o /pgsql/errout.${PORT} -S 32768" \
-i -p ${PORT} -D/pgsql/data
The machine is a Dual PIII with 512Meg of RAM, running FreeBSD 4.0-STABLE
from April 22nd ...
pgsql% truss -p 7100
Shows zilch ...
Since this is a production server, I can't just leave it there hung like
that, but if someone wants to give some instructions on what to do the
next time this happens, please feel free to do so, and I'll add that to my
list ... maybe run a gdb command on it, since truss doesn't appear to
help?
At this time, I consider this to be a show-stopper on the release ... this
is what happened the last time when the result appeared to be the index
corruption ... this time, I've checked a VACUUM after re-starting and it
doesn't appear to be a problem, but they might not have been related, just
a fluke ...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
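[Editor's sketch] The errout log quoted above carries no date/time, which makes it hard to line up messages with the moment of a hang. One possible workaround is to pass the server's stderr through a small filter that prefixes each line with the current time. This is only a sketch under assumptions: `stamp_lines` is a hypothetical helper, not part of PostgreSQL, and routing the server's output into a pipe would mean changing the `-o /pgsql/errout.${PORT}` redirect in the start script.

```sh
#!/bin/sh
# stamp_lines: copy stdin to stdout, prefixing every line with a timestamp.
# Hypothetical helper for illustration only.
stamp_lines() {
    while IFS= read -r line; do
        # One date call per line: slow for busy logs, but simple.
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

It would be used as the last stage of a pipeline, for example `some_command 2>&1 | stamp_lines >> some_logfile`, where both names are placeholders.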
The Hermit Hacker <scrappy@hub.org> writes:
Okay, this is with code of ~May 4th ... a 'psql' connection to the
database hangs solid.
Do you mean you can't make a connection at all? Is there any indication
that the postmaster is lighting off a backend for you? Since you show
a couple of zombie backends hanging around, it would seem like a good
bet that the postmaster itself is wedged and not responding to events,
but I'm not sure.
errout is dated:
pgsql% !ls
ls -lt
total 13324
-rw------- 1 pgsql pgsql 4842715 May 7 10:57 errout.5432
and the last few lines contain:
ERROR: parser: parse error at or near "vpti"
pq_recvbuf: unexpected EOF on client connection
pq_flush: send() failed: Broken pipe
pq_recvbuf: recv() failed: Connection reset by peer
pq_recvbuf: unexpected EOF on client connection
pq_recvbuf: unexpected EOF on client connection
pq_flush: send() failed: Broken pipe
pq_recvbuf: recv() failed: Connection reset by peer
But, of course, no date/time ...
Given that the file mod time is considerably before the hang (right?)
the messages in it are probably unrelated. It does seem odd that you
have so many clients disconnecting ungracefully; what client apps are
you running?
Since this is a production server, I can't just leave it there hung like
that, but if someone wants to give some instructions on what to do the
next time this happens, please feel free to do so, and I'll add that to my
list ... maybe run a gdb command on it, since truss doesn't appear to
help?
Try killing the postmaster itself in such a way as to produce a coredump
(kill -ABORT ought to do) and get a backtrace from that. It might also
be worth running the postmaster with connection tracing turned on (I
forget the incantation for that, but it should be in TFM).
At this time, I consider this to be a show-stopper on the release ... this
is what happened the last time when the result appeared to be the index
corruption
If the postmaster is hanging then it's almost certainly unrelated to
index corruption...
regards, tom lane
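[Editor's sketch] The inference above rests on the zombie `(postgres)` entries: a zombie is a child that has exited but has not yet been reaped by its parent, so zombies that linger suggest the postmaster is not processing events. A rough way to count them from `ps ux` output is sketched below. `count_zombies` is a hypothetical helper, and it assumes STAT is the eighth column, as in the FreeBSD `ps ux` layout quoted in this thread; other systems may differ.

```sh
#!/bin/sh
# count_zombies: read "ps ux" style output on stdin and print how many
# lines are zombie (STAT starts with Z) postgres processes.
# Assumes STAT is field 8 and the command is the last field.
count_zombies() {
    awk '$8 ~ /^Z/ && $NF == "(postgres)" { n++ } END { print n + 0 }'
}
```

Typical use would be `ps ux | count_zombies`, run a few minutes apart; a count that stays above zero points at the parent rather than at any one backend.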
On Sun, 7 May 2000, Tom Lane wrote:
The Hermit Hacker <scrappy@hub.org> writes:
Okay, this is with code of ~May 4th ... a 'psql' connection to the
database hangs solid.
Do you mean you can't make a connection at all? Is there any indication
that the postmaster is lighting off a backend for you? Since you show
a couple of zombie backends hanging around, it would seem like a good
bet that the postmaster itself is wedged and not responding to events,
but I'm not sure.
This appears to be the case, but next time it happens I will make
double-sure of that ... considering that it was ~7pm at night when I
tried, my initial guess is that nothing is going through postmaster at the
time of the hang ...
Given that the file mod time is considerably before the hang (right?)
the messages in it are probably unrelated. It does seem odd that you
have so many clients disconnecting ungracefully; what client apps are
you running?
a lot of dbi stuff, the search engine for udmsearch, some php ... the
server is currently serving ~12 databases for various clients ...
Try killing the postmaster itself in such a way as to produce a coredump
(kill -ABORT ought to do) and get a backtrace from that. It might also
be worth running the postmaster with connection tracing turned on (I
forget the incantation for that, but it should be in TFM).
Will look at that one ...
Okay, just happened again ... no postgres backend is being started:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
pgsql 34611 0.0 0.0 0 0 ?? Z 8:43PM 0:00.00 (postgres)
pgsql 93757 0.0 0.2 1456 1104 p0 S Wed03PM 0:01.16 -su (tcsh)
pgsql 33683 0.0 0.6 38356 3024 ?? Is 7:38PM 0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
pgsql 34677 0.0 0.2 1408 1048 p2 S 8:50PM 0:00.07 -su (tcsh)
pgsql 34685 0.0 0.2 1652 1032 p0 S+ 8:51PM 0:00.01 psql udmsearch
pgsql 34687 0.0 0.0 400 232 p2 R+ 8:51PM 0:00.00 ps ux
Going to look at the connection tracing option now and see what I can come
up with ...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
kill -ABRT does nothing:
pgsql% kill -ABRT 33683
pgsql% !ps
ps ux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
pgsql 34611 0.0 0.0 0 0 ?? Z 8:43PM 0:00.00 (postgres)
pgsql 93757 0.0 0.2 1456 1104 p0 S Wed03PM 0:01.17 -su (tcsh)
pgsql 33683 0.0 0.6 38356 3024 ?? Is 7:38PM 0:03.54 /pgsql/bin/postmas
pgsql 34677 0.0 0.2 1408 1048 p2 S+ 8:50PM 0:00.08 -su (tcsh)
pgsql 34696 0.0 0.0 396 232 p0 R+ 8:56PM 0:00.00 ps ux
pgsql% !ps
ps ux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
pgsql 34611 0.0 0.0 0 0 ?? Z 8:43PM 0:00.00 (postgres)
pgsql 93757 0.0 0.2 1456 1104 p0 S Wed03PM 0:01.17 -su (tcsh)
pgsql 33683 0.0 0.6 38356 3024 ?? Is 7:38PM 0:03.54 /pgsql/bin/postmas
pgsql 34677 0.0 0.2 1408 1048 p2 S+ 8:50PM 0:00.08 -su (tcsh)
pgsql 34697 0.0 0.0 396 232 p0 R+ 8:56PM 0:00.00 ps ux
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Sun, 7 May 2000, The Hermit Hacker wrote:
Okay, just happened again ... no postgres backend is being started:
I don't know how close in time it was, but I just hit reload on that
query that was sent to webmaster.
Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net
128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================
With -d set to 1 (connection tracing), all I see when I connect, in the
log files, is:
FindExec: found "/pgsql/bin/postgres" using argv[0]
FindExec: found "/pgsql/bin/postgres" using argv[0]
doesn't tell me to what I'm connecting through ...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
kill -ABRT does nothing:
Oh? Must be hung up in a kernel call then. That will probably mean
that you can't attach to the stuck process with gdb either (though
it'd be worth trying, since a backtrace would be mighty useful if
you could get it).
My next thought is to truss the postmaster process before it hangs
up, with hopes of finding out what kernel call is hanging.
Also, you might try netstat to see if you can see any freshly-opened
incoming connections when it happens. Also, "lsof -p" or local
equivalent on the stuck postmaster.
regards, tom lane
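[Editor's sketch] The checks listed above could be bundled into a small script to run just before a restart, so that the evidence is not lost on a production box. This is a sketch under assumptions: `snapshot_postmaster` and the default output path are hypothetical, `lsof` is not always installed (on FreeBSD, `fstat -p` is the local equivalent), and each tool is simply skipped when absent. It does not run truss, since that has to be attached before the hang to be useful.

```sh
#!/bin/sh
# snapshot_postmaster: save basic state of a (possibly hung) process.
# Usage: snapshot_postmaster <pid> [output-file]
# Hypothetical helper; prints the name of the file it wrote.
snapshot_postmaster() {
    pid=$1
    out=${2:-/tmp/postmaster-hang.$pid.txt}
    {
        echo "== snapshot of pid $pid at $(date) =="
        echo "== ps =="
        ps -p "$pid"
        if command -v netstat >/dev/null 2>&1; then
            echo "== netstat -an =="
            netstat -an
        fi
        if command -v lsof >/dev/null 2>&1; then
            echo "== lsof -p $pid =="
            lsof -p "$pid"
        elif command -v fstat >/dev/null 2>&1; then
            echo "== fstat -p $pid =="
            fstat -p "$pid"
        fi
    } > "$out" 2>&1
    echo "$out"
}
```

Running it as the pgsql user with the postmaster's PID leaves a single text file that can be mailed to the list after the restart.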
Try killing the postmaster itself in such a way as to produce a coredump
(kill -ABORT ought to do) and get a backtrace from that.
The "gcore" command (on most modern unices) will generate a core dump of a
running process without killing the process. It seems that would be more
useful in this circumstance.
-Michael Robinson
*sigh*
gcore 87721
gcore: /proc/87721/file: No such file or directory
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Are we still releasing 7.0 tomorrow?
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Sun, 7 May 2000, Bruce Momjian wrote:
Are we still releasing 7.0 tomorrow?
I don't know ... this problem has me nervous, but I can't seem to
re-create it on the fly :( It happened twice so far today, and I'm
working on improving logging to see if I can narrow it down ...
I would like to *at least* postpone until Wednesday to see if I can
recreate this between now and then ... will spend a good part of tomorrow
seeing if I can get a more decent amount of data logged, to narrow her
down ...
We still have to write up a release announcement (can someone summarize
the key features of v7.0?), so that gives us a little bit of time ...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Sun, 7 May 2000, Bruce Momjian wrote:
Are we still releasing 7.0 tomorrow?
I don't know ... this problem has me nervous, but I can't seem to
re-create it on the fly :( It happened twice so far today, and I'm
working on improving logging to see if I can narrow it down ...
I would like to *at least* postpone until Wednesday to see if I can
recreate this between now and then ... will spend a good part of tomorrow
seeing if I can get a more decent amount of data logged, to narrow her
down ...
Isn't it something we can fix with a 7.0.1? Seems many people are
already using 7.0 in production systems. I just hate to see the date
slip again.
We still have to write up a release announcement (can someone summarize
the key features of v7.0?), so that gives us a little bit of time ...
Well, you can take it off the top of the HISTORY file.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Mon, 8 May 2000, Bruce Momjian wrote:
On Sun, 7 May 2000, Bruce Momjian wrote:
Are we still releasing 7.0 tomorrow?
I don't know ... this problem has me nervous, but I can't seem to
re-create it on the fly :( It happened twice so far today, and I'm
working on improving logging to see if I can narrow it down ...
I would like to *at least* postpone until Wednesday to see if I can
recreate this between now and then ... will spend a good part of tomorrow
seeing if I can get a more decent amount of data logged, to narrow her
down ...
Isn't it something we can fix with a 7.0.1? Seems many people are
already using 7.0 in production systems. I just hate to see the date
slip again.
As I said, if we feel comfortable with this, no probs ... it's not an issue
I'm going to push, since it is something that I'm finding relatively
difficult to recreate "at will" :(
We still have to write up a release announcement (can someone summarize
the key features of v7.0?), so that gives us a little bit of time ...
Well, you can take it off the top of the HISTORY file.
Great, will work this up tomorrow during the day :) Thanks ...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Isn't it something we can fix with a 7.0.1? Seems many people are
already using 7.0 in production systems. I just hate to see the date
slip again.
As I said, if we feel comfortable with this, no probs ... it's not an issue
I'm going to push, since it is something that I'm finding relatively
difficult to recreate "at will" :(
We still have to write up a release announcement (can someone summarize
the key features of v7.0?), so that gives us a little bit of time ...
Well, you can take it off the top of the HISTORY file.
Great, will work this up tomorrow during the day :) Thanks ...
My feeling is that we can address this in 7.0.1, though our recent
pg_group fix could not be done in 7.0.1, but this doesn't seem like that
kind of problem. Such problems are usually easily reproducible because
they represent problems with the system catalogs.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Are we still releasing 7.0 tomorrow?
I don't know ... this problem has me nervous, but I can't seem to
re-create it on the fly :( It happened twice so far today, and I'm
working on improving logging to see if I can narrow it down ...
I would like to *at least* postpone until Wednesday to see if I can
recreate this between now and then ... will spend a good part of tomorrow
seeing if I can get a more decent amount of data logged, to narrow her
down ...
Isn't it something we can fix with a 7.0.1? Seems many people are
already using 7.0 in production systems. I just hate to see the date
slip again.
That's my feeling too. Whatever this is, it seems to be in the
postmaster not the backend. We've hardly changed the postmaster since
6.5.3, so I suspect the problem has existed for a good while and is of
low probability. (I have no explanation why Marc's suddenly getting
bit, but if it weren't low-probability we'd surely have more reports
than just his, no?)
Almost certainly, we will need a 7.0.1 in a few weeks, once 7.0 gets out
there and starts getting pounded on by people outside the circle of
usual suspects (sorry, been watching _Casablanca_ again). If we delay
7.0 until we can figure out what this bug is all about, we might be
sitting on it for days or weeks. Let's push 7.0 out the door and let
some other work go on in parallel while we try to figure out this one.
Marc, if you see it happen again could you give me a call before you
restart? I'd like to telnet in and poke at it a little myself.
(Wait a sec, is this happening on hub, or somewhere else?)
regards, tom lane
On Mon, 8 May 2000, The Hermit Hacker wrote:
*sigh*
gcore 87721
gcore: /proc/87721/file: No such file or directory
According to TFM:
The process identifier, pid, must be given on the command line. If no
executable image is specified, gcore will use ``/proc/<pid>/file''.
So you might try:
gcore /path_to_postmaster/postmaster 87721
or something close to that.
Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net
128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================
On Mon, 8 May 2000, Tom Lane wrote:
Marc, if you see it happen again could you give me a call before you
restart? I'd like to telnet in and poke at it a little myself.
(Wait a sec, is this happening on hub, or somewhere else?)
We built a Dual-PIII server to handle just database server, so I can give
you access to it ...
Thus spake The Hermit Hacker
Marc, if you see it happen again could you give me a call before you
restart? I'd like to telnet in and poke at it a little myself.
(Wait a sec, is this happening on hub, or somewhere else?)
We built a Dual-PIII server to handle just database server, so I can give
you access to it ...
Are you talking about the new database server for Trends? If so I should
mention that I had to restart it this morning. Sorry, I didn't poke
around in it before doing so. Clients couldn't log in and I couldn't wait.
I should mention that I did have to kill -9 it. A simple kill didn't work.
I then cleared out the lock file and restarted it and connections seem to
be working again.
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Mon, 8 May 2000, D'Arcy J.M. Cain wrote:
Thus spake The Hermit Hacker
Marc, if you see it happen again could you give me a call before you
restart? I'd like to telnet in and poke at it a little myself.
(Wait a sec, is this happening on hub, or somewhere else?)
We built a Dual-PIII server to handle just database server, so I can give
you access to it ...
Are you talking about the new database server for Trends? If so I
should mention that I had to restart it this morning. Sorry, I didn't
poke around in it before doing so. Clients couldn't log in and I
couldn't wait.
I should mention that I did have to kill -9 it. A simple kill didn't
work. I then cleared out the lock file and restarted it and
connections seem to be working again.
That's the server ... and that's the key problem ... there are apps
running on here that are such that delaying the restart, when it requires
it, is very difficult :(
D'Arcy, when it happens again, and if you catch it before me, can you run:
gcore -s bin/postmaster <pid>
on it as the pgsql user before restarting it? I just tested it here and
it dump'd core nicely ... I'm hoping it does the same if/when the
postmaster itself hangs *cross fingers*
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
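[Editor's sketch] Once gcore has written a core file, the backtrace requested earlier in the thread can be read from it with gdb. The helper below is a sketch under assumptions: `core_backtrace` is a hypothetical name, gdb must be installed, and the `-batch` and `-ex` options belong to gdb releases newer than this thread, so an older gdb would need the `bt` command typed interactively instead.

```sh
#!/bin/sh
# core_backtrace: print a stack trace from a core file using gdb.
# Usage: core_backtrace <executable> <core-file>
# Hypothetical helper; both paths are supplied by the caller.
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <executable> <core-file>" >&2
        return 1
    fi
    # -batch exits once the commands have run; -ex bt prints the backtrace.
    gdb -batch -ex bt "$exe" "$core"
}
```

A call might look like `core_backtrace /pgsql/bin/postmaster core.<pid>`, assuming gcore wrote its output under that default name in the current directory.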
Thus spake The Hermit Hacker
D'Arcy, when it happens again, and if you catch it before me, can you run:
gcore -s bin/postmaster <pid>
on it as the pgsql user before restarting it? I just tested it here and
it dump'd core nicely ... I'm hoping it does the same if/when the
postmaster itself hangs *cross fingers*
Will do.
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
The Hermit Hacker <scrappy@hub.org> writes:
We still have to write up a release announcement (can someone summarize
the key features of v7.0?), so that gives us a little bit of time ...
Man, there's a lot of stuff in the HISTORY file, isn't there?
The list at the top isn't too bad:
Foreign Keys
Foreign keys are now implemented, with the exception of PARTIAL
MATCH foreign keys. Many users have been asking for this
feature, and we are pleased to offer it.
Optimizer Overhaul
Continuing on work started a year ago, the optimizer has been
overhauled, allowing improved query execution and better
performance with less memory usage.
Updated psql
psql, our interactive terminal monitor, has been updated with a
variety of new features. See the psql manual page for details.
Upcoming Features
In 7.1, we plan to have outer joins, storage for very long
rows, and a write-ahead logging system.
Some other things that might be worth mentioning:
Date/time datatypes cleaned up
We have brought the date/time datatypes into compliance with
the SQL standard, replacing the old partially-implemented SQL
date/time types with full-featured implementations. The
default display format for date/time data has also changed
to be ISO style. This may create a few compatibility issues
for old applications. [Thomas may want to rewrite this item...]
Query length limits removed
There is no longer any fixed limit on the length of a query
string. (The block-size limit on the length of a stored row
still exists, but we hope to fix that in 7.1.)
Removal of 8-argument limit on index keys and functions
The maximum number of keys in an index or arguments to a
function is now configurable, with a default limit of 16,
rather than the old hard-coded limit of 8.
Sorts and hashes now work for >2GB of data
Temporary files can now be split in the same way that oversize
relations are, so that data volume is only limited by
available disk space and not by OS limits on the size of an
individual file.
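As a rough illustration of the file-splitting idea in the last item, the Python sketch below maps a logical byte offset onto a (segment number, offset within segment) pair. The 1 GB segment size and the function names are assumptions made for the example, not values taken from the 7.0 sources.

```python
# Assumed segment size for illustration only: 1 GB per physical file.
SEGMENT_BYTES = 1 << 30


def segment_layout(total_bytes, segment_bytes=SEGMENT_BYTES):
    """Return (segment_index, size_in_bytes) pairs covering total_bytes."""
    if total_bytes < 0 or segment_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    full, rest = divmod(total_bytes, segment_bytes)
    sizes = [segment_bytes] * full
    if rest:
        sizes.append(rest)  # final, partially filled segment
    return list(enumerate(sizes))


def locate(offset, segment_bytes=SEGMENT_BYTES):
    """Map a logical offset to (segment_index, offset_in_segment)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, segment_bytes)


if __name__ == "__main__":
    # 2.5 GB of sort data spread over three files, none larger than 1 GB.
    for index, size in segment_layout(5 * (1 << 29)):
        print(index, size)
    print(locate(3 * (1 << 30) + 5))
```

Because no single file grows past the segment size, the total is bounded by free disk space rather than by the OS per-file limit, which is the point the item makes.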
It wouldn't be hard to make this list a *lot* longer, but...
You should also make a point of the literally hundreds of smaller
features, bug fixes, and performance improvements that are in this
release.
regards, tom lane
Upcoming Features
In 7.1, we plan to have outer joins, storage for very long
rows, and a write-ahead logging system.
Oh BTW, *are* we still planning outer joins for 7.1? I thought the plan
was to push out the querytree redesign to 7.2, and try to have a fairly
short release cycle for 7.1 instead, with TOAST and WAL as the
centerpiece attractions.
regards, tom lane
Query length limits removed
There is no longer any fixed limit on the length of a query
string. (The block-size limit on the length of a stored row
still exists, but we hope to fix that in 7.1.)
Is the row length limit 8k? If not, what is the row length limit?
Thanks!
- Mitch
"Mitch Vincent" <mitch@huntsvilleal.com> writes:
Is the row length limit 8k? If not, what is the row length limit?
Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock
installation ...
regards, tom lane
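To put "BLCKSZ less some overhead" into numbers, here is a small Python sketch. The 8K stock block size comes from the message above; the overhead figure in the example is a made-up placeholder, since the thread does not give the real one, and max_row_bytes is a hypothetical name.

```python
STOCK_BLCKSZ = 8 * 1024  # 8K in a stock installation, per the thread


def max_row_bytes(blcksz, overhead):
    """Largest row that fits in one block: block size minus overhead.

    'overhead' stands for the page and tuple header bytes; the caller
    must supply it, because the thread does not state the actual value.
    """
    if not 0 <= overhead < blcksz:
        raise ValueError("overhead must be between 0 and blcksz")
    return blcksz - overhead


if __name__ == "__main__":
    # 40 bytes is a placeholder overhead, NOT the real figure.
    print(max_row_bytes(STOCK_BLCKSZ, 40))
```

Whatever the true overhead is, the limit scales with BLCKSZ, so rebuilding with a larger block size raises the row limit by the same amount.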
On Mon, 8 May 2000, Mitch Vincent wrote:
Query length limits removed
There is no longer any fixed limit on the length of a query
string. (The block-size limit on the length of a stored row
still exists, but we hope to fix that in 7.1.)
Is the row length limit 8k? If not, what is the row length limit?
Right now, the tuple length is still at 8k ... Jan's TOAST implementation
is designed to finally rid us of that as well ...
Mitch Vincent wrote:
Query length limits removed
There is no longer any fixed limit on the length of a query
string. (The block-size limit on the length of a stored row
still exists, but we hope to fix that in 7.1.)
Is the row length limit 8k? If not, what is the row length limit?
8k by default, max is 32K if you recompile.
Thanks!
- Mitch
Vince Vielhaber wrote:
On Mon, 8 May 2000, D'Arcy J.M. Cain wrote:
It would kind of have to be, wouldn't it, if the row it had to fit in
had that limit?
BLOBs aren't. Or did I miss something somewhere? I've always understood
the text datatype to be simply a text version of a BLOB.
Not yet in Postgres
Not necessarily in Postgres, but elsewhere.
Maybe elsewhere.
In postgres it will be a new kind of (B)LOB, different from current LOs.
Current LOs are again separate from TEXT; even ODBC and JDBC use them for
other BLOB support.
---------------
Hannu
On Mon, 8 May 2000, Tom Lane wrote:
"Mitch Vincent" <mitch@huntsvilleal.com> writes:
Is the row length limit 8k? If not, what is the row length limit?
Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock
installation ...
A text datatype isn't limited to that too, is it?
Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net
128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================
Thus spake Vince Vielhaber
On Mon, 8 May 2000, Tom Lane wrote:
"Mitch Vincent" <mitch@huntsvilleal.com> writes:
Is the row length limit 8k? If not, what is the row length limit?
Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock
installation ...
A text datatype isn't limited to that too, is it?
It would kind of have to be, wouldn't it, if the row it had to fit in
had that limit?
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Mon, 8 May 2000, D'Arcy J.M. Cain wrote:
Thus spake Vince Vielhaber
On Mon, 8 May 2000, Tom Lane wrote:
"Mitch Vincent" <mitch@huntsvilleal.com> writes:
Is the row length limit 8k? If not, what is the row length limit?
Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock
installation ...
A text datatype isn't limited to that too, is it?
It would kind of have to be, wouldn't it, if the row it had to fit in
had that limit?
BLOBs aren't. Or did I miss something somewhere? I've always understood
the text datatype to be simply a text version of a BLOB. Not necessarily
in Postgres, but elsewhere.
Vince.
This brings me to another question. Hopefully there isn't an 8k (max 32k)
limit on TEXT fields -- I'll assume there isn't a limit on TEXT fields for
the purpose of this email.
What do you guys think of storing whole text files (normally stored in a
flat file) in the database for searching purposes? Would a search on an
indexed TEXT field be slow as mud?
I'll try it on my home machine for kicks, just wanted to get some
theoretical opinions...
Thanks!
- Mitch
"The only real failure is quitting."
Is the row length limit 8k? If not, what is the row length limit?
8k by default, max is 32K if you recompile.
Mitch Vincent wrote:
This brings me to another question. Hopefully there isn't a 8k (max 32k)
limit on TEXT fields --
No, they currently just have to fit in a record ;)
They will (optionally) be stored separately in the future (7.1).
What do you guys think of storing whole text files (normally stored in a
flat file) in the database for searching purposes? Would a search on an
indexed TEXT field be slow as mud?
Depends on search ;)
A search like "a%" may not be too slow (unless indexes on text fields are
disallowed initially, as has been mentioned a few times).
PG does not yet have a native full-text index. There is a suboptimal
implementation using triggers and extra tables in contrib.
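To show why an anchored pattern such as "a%" can be cheap on an ordered index, here is a minimal Python sketch. It uses a sorted list with bisect as a stand-in for the index; that is an illustration of the general range-scan idea, not how the backend implements it, and prefix_matches is a hypothetical name.

```python
import bisect


def prefix_matches(sorted_keys, prefix):
    """Return all keys starting with prefix, using two binary searches.

    sorted_keys must already be sorted, like the leaf level of an ordered
    index. All keys with the given prefix sit in one contiguous range
    [prefix, next_prefix), where next_prefix is the prefix with its last
    character bumped by one. (Sketch only: it does not handle a prefix
    ending in the highest possible code point.)
    """
    if not prefix:
        return list(sorted_keys)
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    keys = sorted(["apple", "avocado", "banana", "abacus", "b", "a"])
    print(prefix_matches(keys, "a"))
    print(prefix_matches(keys, "ap"))
```

A pattern with a leading wildcard, such as "%a", has no such contiguous range, so every key would have to be examined. That is the "depends on search" part.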
----------
Hannu
Thus spake Vince Vielhaber
Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock
installation ...
A text datatype isn't limited to that too, is it?
It would kind of have to be, wouldn't it, if the row it had to fit in
had that limit?
BLOBs aren't. Or did I miss something somewhere? I've always understood
the text datatype to be simply a text version of a BLOB. Not necessarily
in Postgres, but elsewhere.
You mean text FILES, not datatype. There is a base type called text
which has to fit in the row so it is naturally limited to the row size.
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
Upcoming Features
In 7.1, we plan to have outer joins, storage for very long
rows, and a write-ahead logging system.
Oh BTW, *are* we still planning outer joins for 7.1? I thought the plan
was to push out the querytree redesign to 7.2, and try to have a fairly
short release cycle for 7.1 instead, with TOAST and WAL as the
centerpiece attractions.
Oops, you are right. At the time I wrote this, we were going to do a
normal-length 7.1 cycle.
I have updated the HISTORY and release.sgml to say 7.1 _or_ 7.2.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Man, there's a lot of stuff in the HISTORY file, isn't there?
The list at the top isn't too bad:
Ack! I didn't realize that there was a plain text HISTORY file, since
it *should* come from the SGML sources. I had changed the wording, and
eliminated the prediction for features in the next release (that
should appear on the web site imho, not in the release docs).
Check the release notes (INSTALL and release.htm) for the latest
wording.
Let me see if I can get the HISTORY file replaced with something
fresh; however, it is not a show-stopper so if you've already done the
build don't worry about it.
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
Man, there's a lot of stuff in the HISTORY file, isn't there?
The list at the top isn't too bad:
Ack! I didn't realize that there was a plain text HISTORY file, since
it *should* come from the SGML sources. I had changed the wording, and
eliminated the prediction for features in the next release (that
should appear on the web site imho, not in the release docs).
I have been changing both each time. History does not generate
directly from SGML because it needs to be one big file with proper
breaks between sections.
I left the prediction in because this is not a big-feature release, and
I wanted people to know what we were planning. This is the first
release where we have definite plans for new features in the next
release.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I left the prediction in because this is not a big-feature release, and
I wanted people to know what we were planning. This is the first
release where we have definite plans for new features in the next
release.
Seems most appropriate to put this info on the web site, where it is
less formal and more easily changed/updated/removed. We already could
be mentioning the TOAST work, etc etc as ongoing projects and outer
joins are in that category too.
Vince, is there a place where we could put this kind of stuff?
Somewhere in the developer's lounge area? Perhaps a summary page of
ongoing projects and then links to specific pages for each project
where necessary?
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
On Tue, 9 May 2000, Thomas Lockhart wrote:
I left the prediction in because this is not a big-feature release, and
I wanted people to know what we were planning. This is the first
release where we have definite plans for new features in the next
release.
Seems most appropriate to put this info on the web site, where it is
less formal and more easily changed/updated/removed. We already could
be mentioning the TOAST work, etc etc as ongoing projects and outer
joins are in that category too.
Vince, is there a place where we could put this kind of stuff?
Somewhere in the developer's lounge area? Perhaps a summary page of
ongoing projects and then links to specific pages for each project
where necessary?
Already is: http://www.Postgresql.org/projects/index.html Jan's
been maintaining it. The projects on that page aren't necessarily
planned for the next release tho (what's there now very well might
be but that's not the intent of the page), so we may want to have
a more specific list pointing there.
BTW, I'm currently waiting on some graphics (already have some) to
put the Developer's Corner and User's Lounge online. With some out
of town travel coming up I may not be able to get it online till
the beginning of June tho.
Vince.