always forced restart after status 139?

Started by Jason Williams about 24 years ago | 7 messages | general
#1 Jason Williams
jwilliams@wc-group.com

Hi all,

We are using Postgres 7.1 on RedHat Linux 7.1.

When calling a C function in a shared library (*.so), if you get a
segmentation fault (status 139), the log indicates that the database will
shut down and then restart in a few seconds.

My question is, does this always have to happen? Is postgres capable of
just logging the seg fault, but not affecting all the users on the database
by restarting?
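
For readers unfamiliar with the number: read as a raw Linux wait status, 139 is 128 + 11, meaning the process was killed by signal 11 (SIGSEGV) with the core-dump flag set. The shell's "128 + signal number" exit-code convention points at the same signal. The following is a minimal Python sketch of that arithmetic. The helper name decode_wait_status is made up for illustration, and it assumes the logged value is the raw wait status using the conventional Linux bit layout.

```python
import signal


def decode_wait_status(status: int) -> dict:
    """Split a raw wait status into its conventional Linux fields.

    Hypothetical helper for illustration; it ignores the "stopped"
    and "continued" encodings, which do not apply to a dead process.
    """
    term_signal = status & 0x7F        # low 7 bits: terminating signal, 0 if none
    core_dumped = bool(status & 0x80)  # bit 7: core-dump flag
    exit_code = (status >> 8) & 0xFF   # only meaningful when term_signal == 0

    signal_name = None
    if term_signal:
        try:
            signal_name = signal.Signals(term_signal).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            signal_name = f"signal {term_signal}"

    return {
        "signal": term_signal,
        "signal_name": signal_name,
        "core_dumped": core_dumped,
        "exit_code": exit_code,
    }


if __name__ == "__main__":
    print(decode_wait_status(139))
```

Any nonzero signal field means the process did not exit on its own, which is the case the replies below are concerned with.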

Thanks,

Jason
jwilliams@wc-group.com

#2 Dominic J. Eidson
sauron@the-infinite.org
In reply to: Jason Williams (#1)
Re: always forced restart after status 139?

On Mon, 18 Mar 2002, Jason Williams wrote:

> We are using Postgres 7.1 on RedHat Linux 7.1.
>
> When calling a C function in a shared library (*.so), if you get a
> segmentation fault (status 139), the log indicates that the database will
> shut down and then restart in a few seconds.
>
> My question is, does this always have to happen? Is postgres capable of
> just logging the seg fault, but not affecting all the users on the database
> by restarting?

Because of the nature of a SIGSEGV, you can't trust any data remaining in
memory - what if the crash was caused by corrupt data in memory?

This is why PostgreSQL shuts down completely and then restarts.

Allowing any part of PostgreSQL to continue (especially since there's
important data in SHM) would be a bad idea, since you have no idea what
caused the SIGSEGV.

--
Dominic J. Eidson
"Baruk Khazad! Khazad ai-menu!" - Gimli
-------------------------------------------------------------------------------
http://www.the-infinite.org/ http://www.the-infinite.org/~dominic/

#3 Jason Williams
jwilliams@wc-group.com
In reply to: Dominic J. Eidson (#2)
Re: always forced restart after status 139?

Thanks Dominic.

Point taken and understood.

The problem is this: we are planning to make this database for a commercial
website that will handle financial transactions. What will happen if the
database receives a seg fault and another user of the database is in the
middle of submitting a "critical" update? I'm assuming it will rollback
gracefully?

Does anyone know what the exact behavior is in this situation?

Thanks,

Jason

#4 Dominic J. Eidson
sauron@the-infinite.org
In reply to: Jason Williams (#3)
Re: always forced restart after status 139?

On Mon, 18 Mar 2002, Jason Williams wrote:

> Point taken and understood.

You might wanna fix those extensions so they don't crash, btw :)

> The problem is this: we are planning to make this database for a commercial
> website that will handle financial transactions. What will happen if the
> database receives a seg fault and another user of the database is in the
> middle of submitting a "critical" update? I'm assuming it will rollback
> gracefully?

It should roll back to its pre-transaction state.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jason Williams (#3)
Re: always forced restart after status 139?

"Jason Williams" <jwilliams@wc-group.com> writes:

> The problem is this: we are planning to make this database for a commercial
> website that will handle financial transactions. What will happen if the
> database receives a seg fault and another user of the database is in the
> middle of submitting a "critical" update? I'm assuming it will rollback
> gracefully?

The seg fault as such is not a problem. What concerns me a tad is that
your buggy C extension may scribble on shared-memory disk buffers at
some point before it causes an outright crash. If corrupted data
manages to get written to disk before the backend crash and ensuing
restart, there's no guarantee we can clean it up. The odds of this are
probably not high (assuming you use conservatively-sized shared buffers,
rather than a large fraction of your address space as some here have
been known to suggest) ... but they're not zero.

I concur with Dominic: fix your extension *before* you put it in
production, not after. If you don't have confidence in your ability
to get the bugs out then maybe you shouldn't be writing C functions.
The interpreted PLs are a great deal safer.

regards, tom lane

#6 Jason Williams
jwilliams@wc-group.com
In reply to: Tom Lane (#5)
Re: always forced restart after status 139?

Did not mean to give the impression I was going to put a buggy extension
into production. Relax guys. I've fixed the bug that was causing the
initial seg fault. Just trying to cover all the bases here and understand
what we need to do in our front end "if" we've overlooked something in one
of our extensions. Thanks for the help and clarifications.
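
One way a front end might cope with such a restart is to treat a dropped connection as a failed attempt, wait briefly, reconnect, and rerun the whole transaction. What follows is only a sketch in Python under stated assumptions: with_retry, connect, and run_transaction are hypothetical names rather than part of any PostgreSQL driver, ConnectionError stands in for whatever "server went away" error a real driver raises, and the transaction is assumed to be safe to rerun from the start.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retry(
    connect: Callable[[], object],
    run_transaction: Callable[[object], T],
    attempts: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run run_transaction(conn), reconnecting if the connection drops.

    All names here are illustrative. run_transaction is expected to
    begin and commit its own transaction on the connection it is given.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            conn = connect()              # fresh connection for every attempt
            return run_transaction(conn)
        except ConnectionError as exc:    # stand-in for a driver's connection-lost error
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)      # give the server time to finish restarting
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

A connection lost while the commit itself is in flight leaves the outcome unknown, so a real application would need to check whether the work was already applied before rerunning it.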

Jason

#7 Jan Wieck
JanWieck@Yahoo.com
In reply to: Jason Williams (#3)
Re: always forced restart after status 139?

Jason Williams wrote:

> Thanks Dominic.
>
> Point taken and understood.
>
> The problem is this: we are planning to make this database for a commercial
> website that will handle financial transactions. What will happen if the
> database receives a seg fault and another user of the database is in the
> middle of submitting a "critical" update? I'm assuming it will rollback
> gracefully?

First of all, you don't allow development work on the same
system your production runs on. Doing so implies that the
data is not critical to you.

> Does anyone know what the exact behavior is in this situation?

Nobody can tell for sure. In almost all cases, yes, the
rollback would be graceful. But the fault could've
corrupted the stack of the failing backend, causing it to
execute arbitrary code. How does someone predict what
arbitrary code will do?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
