Postgres 9.2.8 crash sporadically on Windows

Started by Sofer, Yuval about 12 years ago | 10 messages | bugs
#1 Sofer, Yuval
Yuval_Sofer@bmc.com

Hi,

The Postgres server (9.2.8) crashes sporadically on Windows.

It usually happens after the machine reboot we do once a day.

After ~4 minutes it crashes with this error in the log:

2014-04-06 08:08:01.069 GMTLOG: server process (PID 5304) exited with exit code 0
2014-04-06 08:08:01.069 GMTLOG: terminating any other active server processes
2014-04-06 08:08:01.833 GMTLOG: all server processes terminated; reinitializing
2014-04-06 08:08:11.183 GMTFATAL: pre-existing shared memory block is still in use
2014-04-06 08:08:11.183 GMTHINT: Check if there are any old server processes still running, and terminate them.

This is production. Please help!

Yuval Sofer
BMC Software
CTM&D Business Unit
DBA Team
972-52-4286-282
yuval_sofer@bmc.com

#2 Andres Freund
andres@anarazel.de
In reply to: Sofer, Yuval (#1)
Re: Postgres 9.2.8 crash sporadically on Windows

Hello,

On 2014-04-07 09:49:09 -0500, Sofer, Yuval wrote:
> The Postgres server (9.2.8) crashes sporadically on Windows.
>
> It usually happens after the machine reboot we do once a day.
>
> After ~4 minutes it crashes with this error in the log:
>
> 2014-04-06 08:08:01.069 GMTLOG: server process (PID 5304) exited with exit code 0
> 2014-04-06 08:08:01.069 GMTLOG: terminating any other active server processes
> 2014-04-06 08:08:01.833 GMTLOG: all server processes terminated; reinitializing
> 2014-04-06 08:08:11.183 GMTFATAL: pre-existing shared memory block is still in use
> 2014-04-06 08:08:11.183 GMTHINT: Check if there are any old server processes still running, and terminate them.
>
> This is production. Please help!

This is unfortunately not giving us many details to work with...

Are you using any self written functions? Any extensions? What's the
last query executed before the crash?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: Postgres 9.2.8 crash sporadically on Windows

Andres Freund <andres@2ndquadrant.com> writes:

> On 2014-04-07 09:49:09 -0500, Sofer, Yuval wrote:
>> The Postgres server (9.2.8) crashes sporadically on Windows.
>>
>> 2014-04-06 08:08:01.069 GMTLOG: server process (PID 5304) exited with exit code 0
>> 2014-04-06 08:08:01.069 GMTLOG: terminating any other active server processes
>> 2014-04-06 08:08:01.833 GMTLOG: all server processes terminated; reinitializing
>> 2014-04-06 08:08:11.183 GMTFATAL: pre-existing shared memory block is still in use
>> 2014-04-06 08:08:11.183 GMTHINT: Check if there are any old server processes still running, and terminate them.
>>
>> This is production. Please help!
>
> This is unfortunately not giving us many details to work with...

The "exit code 0" bit reminds me of this:

/messages/by-id/21027.1393546453@sss.pgh.pa.us

Is there another exit report for the same PID just above the quoted log
extract? Because if there isn't, the postmaster shouldn't be thinking
that status 0 represents a crash.
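Some readers may not know the rule Tom is referring to. It can be sketched, in simplified POSIX-only form, as a parent classifying a reaped child's wait status. Exit code 0 (normal) or 1 (FATAL) counts as a clean exit. Any other code, or death by a signal, counts as a crash. This is an illustration under those assumptions, not the postmaster's actual code, and the Windows build collects child exit codes through a different mechanism. The helper names `status_means_crash` and `child_exit_means_crash` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: exit code 0 (normal) or 1 (FATAL) is treated as a
 * clean exit; any other exit code, or termination by a signal, is a crash.
 */
static bool status_means_crash(int wait_status)
{
    if (WIFEXITED(wait_status)) {
        int code = WEXITSTATUS(wait_status);
        return code != 0 && code != 1;
    }
    return true; /* killed by a signal, e.g. SIGSEGV */
}

/* Hypothetical demo: fork a child that exits with `code`, reap it, classify it. */
static bool child_exit_means_crash(int code)
{
    int status = 0;
    pid_t pid;

    assert(code >= 0 && code <= 255); /* only the low 8 bits survive exit() */
    pid = fork();
    if (pid < 0)
        abort();                      /* fork failed */
    if (pid == 0)
        _exit(code);                  /* child: exit at once, skip stdio flushing */
    if (waitpid(pid, &status, 0) != pid)
        abort();                      /* could not reap our own child */
    return status_means_crash(status);
}
```

For example, `child_exit_means_crash(0)` and `child_exit_means_crash(1)` return false, while `child_exit_means_crash(2)` returns true.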

regards, tom lane

#4 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Postgres 9.2.8 crash sporadically on Windows

On 2014-04-07 11:15:32 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-04-07 09:49:09 -0500, Sofer, Yuval wrote:
>>> The Postgres server (9.2.8) crashes sporadically on Windows.
>>>
>>> 2014-04-06 08:08:01.069 GMTLOG: server process (PID 5304) exited with exit code 0
>>> 2014-04-06 08:08:01.069 GMTLOG: terminating any other active server processes
>>> 2014-04-06 08:08:01.833 GMTLOG: all server processes terminated; reinitializing
>>> 2014-04-06 08:08:11.183 GMTFATAL: pre-existing shared memory block is still in use
>>> 2014-04-06 08:08:11.183 GMTHINT: Check if there are any old server processes still running, and terminate them.
>>>
>>> This is production. Please help!
>>
>> This is unfortunately not giving us many details to work with...
>
> The "exit code 0" bit reminds me of this:
>
> /messages/by-id/21027.1393546453@sss.pgh.pa.us
>
> Is there another exit report for the same PID just above the quoted log
> extract? Because if there isn't, the postmaster shouldn't be thinking
> that status 0 represents a crash.

Wouldn't a PL or library doing an exit(0) cause exactly that, because it
will still be in PM_CHILD_ASSIGNED?
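Andres's hypothesis can be illustrated with a toy model, under simplifying assumptions. Assume the postmaster gives each backend a shared-memory slot, the backend marks the slot active at startup, and only the backend's orderly exit path hands the slot back. A library that calls exit(0) directly skips that hand-back. The parent therefore reaps a child with a "clean" status 0 whose slot was never handed back, and it treats the exit as a crash anyway. The enum values and function names below are hypothetical. They only loosely mirror PostgreSQL's PM_CHILD_* flags, and the real state transitions may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot states, loosely modeled on the PM_CHILD_* flags. */
typedef enum { SLOT_UNUSED, SLOT_ASSIGNED, SLOT_ACTIVE } SlotState;

/* Child side (hypothetical): mark the slot active once the process runs. */
static void child_start(SlotState *slot)
{
    assert(*slot == SLOT_ASSIGNED);
    *slot = SLOT_ACTIVE;
}

/* Child side (hypothetical): the orderly exit path hands the slot back. */
static void child_clean_exit(SlotState *slot)
{
    assert(*slot == SLOT_ACTIVE);
    *slot = SLOT_ASSIGNED;
}

/*
 * Parent side (hypothetical), after reaping a child: release the slot and
 * decide whether the exit must be handled as a crash. A status of 0 or 1
 * is only trusted if the child also went through its orderly exit path.
 */
static bool treat_as_crash(int exit_code, SlotState *slot)
{
    bool cleaned_up = (*slot == SLOT_ASSIGNED);

    *slot = SLOT_UNUSED;            /* the slot is released either way */
    if (exit_code != 0 && exit_code != 1)
        return true;                /* abnormal exit code */
    return !cleaned_up;             /* e.g. a bare exit(0) that skipped cleanup */
}
```

In this model, a child that goes through `child_start` and then `child_clean_exit` before exiting with status 0 is not treated as a crash. One that skips `child_clean_exit` (the bare exit(0) case) is.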

Greetings,

Andres Freund

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Postgres 9.2.8 crash sporadically on Windows

Andres Freund <andres@2ndquadrant.com> writes:

> On 2014-04-07 11:15:32 -0400, Tom Lane wrote:
>> The "exit code 0" bit reminds me of this:
>>
>> /messages/by-id/21027.1393546453@sss.pgh.pa.us
>>
>> Is there another exit report for the same PID just above the quoted log
>> extract? Because if there isn't, the postmaster shouldn't be thinking
>> that status 0 represents a crash.
>
> Wouldn't a PL or library doing an exit(0) cause exactly that, because it
> will still be in PM_CHILD_ASSIGNED?

Oh, hm, that's a possibility, if the OP is using any untrusted PLs.
As you say, we lack sufficient context.

regards, tom lane

#6 Sofer, Yuval
Yuval_Sofer@bmc.com
In reply to: Tom Lane (#5)
Re: Postgres 9.2.8 crash sporadically on Windows

Hi,

We use several dblink functions (CREATE EXTENSION dblink), but only once, during installation of the Postgres database server. The crash happens a few minutes later.

Also, we use one homemade function, which is called more often (it gets disk usage information using the Windows API).
Anyway, I don't think it is the problem: whenever I call it, I get the expected results.
I called it several times, and Postgres didn't report anything unusual.

> Is there another exit report for the same PID just above the quoted log extract?

No error is reported above this message.

Let me know if you need more info. I can send you any of the C code, or the PL functions that use these contributions.

Thanks,
Yuval

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, April 07, 2014 6:39 PM
To: Andres Freund
Cc: Sofer, Yuval; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres 9.2.8 crash sporadically on Windows

#7 Andres Freund
andres@anarazel.de
In reply to: Sofer, Yuval (#6)
Re: Postgres 9.2.8 crash sporadically on Windows

On 2014-04-08 05:12:16 -0500, Sofer, Yuval wrote:
> Also, we use one homemade function, which is called more often (it gets disk usage information using the Windows API).
> Anyway, I don't think it is the problem: whenever I call it, I get the expected results.
> I called it several times, and Postgres didn't report anything unusual.

I would still bet it's related to that.

>> Is there another exit report for the same PID just above the quoted log extract?
>
> No error is reported above this message.

It's not really errors I'm interested in, but the statements.

> Let me know if you need more info. I can send you any of the C code, or the PL functions that use these contributions.

That would be useful.

Greetings,

Andres Freund

#8 Sofer, Yuval
Yuval_Sofer@bmc.com
In reply to: Andres Freund (#7)
Re: Postgres 9.2.8 crash sporadically on Windows

Hi,

The code is attached below: the .c and .h files, as well as the SQL that creates the function in the Postgres repository.

By the way, no one has touched this code for several years (we first used it in PG 8.2.4).

dbutils.h:

/*
 * dbutils.h
 *
 */

#ifndef DBUTILS_H
#define DBUTILS_H

#include "fmgr.h"

/*
 * External declarations
 */
extern Datum dbutils_get_diskinfo(PG_FUNCTION_ARGS);

#endif /* DBUTILS_H */

dbutils.c:

/*
 * dbutils.c
 *
 */

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "dbutils.h"

#ifndef WIN32
#include <sys/types.h>
#include <sys/statvfs.h>
#endif

PG_MODULE_MAGIC;

/* general utility */
#define GET_TEXT(cstrp) DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(cstrp)))
#define GET_STR(textp) DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp)))

#define DBUTILS_DISK_INFO_TOTAL 1
#define DBUTILS_DISK_INFO_FREE 2

/*
 * Getting disk information from a drive/directory
 */
PG_FUNCTION_INFO_V1(dbutils_get_diskinfo);

Datum
dbutils_get_diskinfo(PG_FUNCTION_ARGS)
{
    char *pPathName = NULL;
    int property;
    char msg[1024];
#ifdef WIN32
    BOOL fResult = FALSE;
    __int64 i64FreeBytesToCaller;
    __int64 i64TotalBytes;
    __int64 i64FreeBytes;
    __int64 i64Size = -1;
#else
    int fResult = -1;
    long long llSize = -1;
    u_long ulBlockSize = 0;
    struct statvfs st;
#endif

    if (PG_NARGS() == 2)
    {
        property = PG_GETARG_INT32(0);
        pPathName = GET_STR(PG_GETARG_TEXT_P(1));
    }
    else
    {
        msg[0] = '\0';
        ereport(ERROR,
                (errcode(ERRCODE_RAISE_EXCEPTION),
                 errmsg("Missing function arguments"),
                 errdetail("%s", msg)));
        PG_RETURN_INT64(-1);
    }

#ifdef WIN32
    fResult = GetDiskFreeSpaceEx(pPathName,
                                 (PULARGE_INTEGER)&i64FreeBytesToCaller,
                                 (PULARGE_INTEGER)&i64TotalBytes,
                                 (PULARGE_INTEGER)&i64FreeBytes);
    if (fResult)
    {
        switch (property)
        {
            case DBUTILS_DISK_INFO_TOTAL:
                i64Size = i64TotalBytes;
                break;
            case DBUTILS_DISK_INFO_FREE:
                i64Size = i64FreeBytesToCaller;
                break;
            default:
                msg[0] = '\0';
                ereport(ERROR,
                        (errcode(ERRCODE_RAISE_EXCEPTION),
                         errmsg("Unknown disk info property"),
                         errdetail("%s", msg)));
        }
        PG_RETURN_INT64(i64Size);
    }
    else
    {
        sprintf(msg, "GetDiskFreeSpaceEx failed with error code: %ld", GetLastError());
        ereport(ERROR,
                (errcode(ERRCODE_RAISE_EXCEPTION),
                 errmsg("Unable to get disk info"),
                 errdetail("%s", msg)));
        PG_RETURN_INT64(-1);
    }
#else
    fResult = statvfs(pPathName, &st);
    if (fResult == 0)
    {
        /* f_frsize is sometimes empty so use f_bsize instead */
        ulBlockSize = st.f_frsize ? st.f_frsize : st.f_bsize;
        /* Note: in Linux st.f_frsize is larger than the file system block size and */
        /* therefore the calculated free space is lower than the actual free space. */
        switch (property)
        {
            case DBUTILS_DISK_INFO_TOTAL:
                llSize = (long long)ulBlockSize * (long long)st.f_blocks;
                break;
            case DBUTILS_DISK_INFO_FREE:
                llSize = (long long)ulBlockSize * (long long)st.f_bavail;
                break;
            default:
                msg[0] = '\0';
                ereport(ERROR,
                        (errcode(ERRCODE_RAISE_EXCEPTION),
                         errmsg("Unknown disk info property"),
                         errdetail("%s", msg)));
        }
        PG_RETURN_INT64(llSize);
    }
    else
    {
        sprintf(msg, "statvfs failed with error code: %d, message: %s ", errno, strerror(errno));
        ereport(ERROR,
                (errcode(ERRCODE_RAISE_EXCEPTION),
                 errmsg("Unable to get disk info"),
                 errdetail("%s", msg)));
        PG_RETURN_INT64(-1);
    }
#endif

    PG_RETURN_INT64(-1);
}

dbutils.sql:

CREATE OR REPLACE FUNCTION dbutils_get_diskinfo (int, text)
RETURNS int8
AS 'MODULE_PATHNAME','dbutils_get_diskinfo'
LANGUAGE C STRICT;
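The logic of the non-Windows branch can be tried outside the server by pulling the `statvfs` arithmetic into a plain C function. This is a sketch under stated assumptions. `disk_info_bytes` is a hypothetical name. PostgreSQL's fmgr and `ereport` machinery are replaced by a `-1` return. The property codes reuse the values of DBUTILS_DISK_INFO_TOTAL and DBUTILS_DISK_INFO_FREE. Per POSIX, `f_blocks` and `f_bavail` are expressed in units of `f_frsize`, which is why the block size is taken from that field first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Same values as DBUTILS_DISK_INFO_TOTAL / DBUTILS_DISK_INFO_FREE in dbutils.c. */
#define DISK_INFO_TOTAL 1
#define DISK_INFO_FREE  2

/*
 * Hypothetical standalone version of the non-WIN32 branch: returns a size in
 * bytes, or -1 on failure (where the original would ereport(ERROR) instead).
 */
static long long disk_info_bytes(int property, const char *path)
{
    struct statvfs st;
    unsigned long block_size;

    assert(path != NULL);
    if (statvfs(path, &st) != 0) {
        fprintf(stderr, "statvfs(%s) failed: %s\n", path, strerror(errno));
        return -1;
    }

    /* f_frsize is the unit for f_blocks/f_bavail; fall back to f_bsize if it is 0. */
    block_size = st.f_frsize ? st.f_frsize : st.f_bsize;

    switch (property) {
    case DISK_INFO_TOTAL:
        return (long long) block_size * (long long) st.f_blocks;
    case DISK_INFO_FREE:
        /* f_bavail counts blocks available to unprivileged users. */
        return (long long) block_size * (long long) st.f_bavail;
    default:
        return -1; /* unknown property code */
    }
}
```

For example, `disk_info_bytes(DISK_INFO_FREE, "/")` returns the bytes available to unprivileged users on the root file system, or -1 if `statvfs` fails.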

Thanks,

Yuval

-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Tuesday, April 08, 2014 1:15 PM
To: Sofer, Yuval
Cc: Tom Lane; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres 9.2.8 crash sporadically on Windows

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sofer, Yuval (#8)
Re: Postgres 9.2.8 crash sporadically on Windows

"Sofer, Yuval" <Yuval_Sofer@bmc.com> writes:

> The code is attached below: the .c and .h files, as well as the SQL that creates the function in the Postgres repository.

Hm ... no obvious way that that could cause an exit(0). I didn't inspect
it closely for garden-variety bugs, but if it were crashing in an ordinary
way the symptoms would be different from what you're reporting.

There's always the catch-all explanation for random weirdness on Windows
machines: do you have any antivirus software installed? If so, does
removing it make the problem go away?

regards, tom lane

#10 Sofer, Yuval
Yuval_Sofer@bmc.com
In reply to: Tom Lane (#9)
Re: Postgres 9.2.8 crash sporadically on Windows

Hi,

We restored the database to a different machine and, for now, there have been no crashes...

I hope it is something related to the environment (antivirus, Windows memory, etc.).

Tom, Andres: thanks a lot for your quick response.

Yuval

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, April 08, 2014 4:46 PM
To: Sofer, Yuval
Cc: Andres Freund; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres 9.2.8 crash sporadically on Windows