BUG #2419: could not reattach to shared memory

Started by Andy Male, almost 20 years ago. 8 messages. List: bugs
#1 Andy Male
andy@ubic.co.uk

The following bug has been logged online:

Bug reference: 2419
Logged by: Andy Male
Email address: andy@ubic.co.uk
PostgreSQL version: 8.1
Operating system: Windows 2003 Server (standard) SP1
Description: could not reattach to shared memory
Details:

FULL ERROR IN WINDOWS EVENT LOG -

The description for Event ID ( 0 ) in Source ( PostgreSQL ) cannot be found.
The local computer may not have the necessary registry information or
message DLL files to display messages from a remote computer. You may be
able to use the /AUXSOURCE= flag to retrieve this description; see Help and
Support for details. The following information is part of the event: FATAL:
could not reattach to shared memory (key=5432001, addr=01960000): Invalid
argument

There is no corresponding error in pg_log.

Once the error happens the database becomes unresponsive: current connections
stop responding and new connections cannot be made. Stopping and starting
the database clears the error and normal operation can continue.

The error happens intermittently (three or four times a day) and doesn't
seem to have a specific cause. The database is not heavily used and is
processing around 100 transactions per minute.

#2 Bruce Momjian
bruce@momjian.us
In reply to: Andy Male (#1)
Re: BUG #2419: could not reattach to shared memory

I wish I had more to suggest to you. We are working on a few problems
with semaphores on Win2003 SP1, but nothing related to shared memory.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#3 Andy Male
andy@ubic.co.uk
In reply to: Bruce Momjian (#2)
Re: BUG #2419: could not reattach to shared memory

Hi,

Thanks for your response. I have enabled additional logging and have
reduced the connections (max_connections) to 50 to reduce the memory
overhead. There should not be more than 8 connections at once and I wanted
to rule out some issue where the client wasn't releasing the connections.

The error has reduced in frequency but still happens. The following log
shows that there was a problem writing to the log file due to a permissions
problem. Obviously this is not the case, because the log file was accessible
before and after the error occurred: it contains further logging about
the problem.

The subsequent message in the log suggests something is wrong with the
client application (or I may be misinterpreting this message) -

"This application has requested the Runtime to terminate it in an unusual
way. Please contact the application's support team for more information."

Then there are various other server error/warnings and a report of "possibly
corrupted shared memory" which is the original problem reported.

I am not sure what to do. I do not know how to debug the problem, and
currently Postgres is unusable in a production environment. I am
considering recoding the app to use an alternate database, but that is going
to be a fairly lengthy process just to prove the problem is with the
database rather than the client application. I can't actually see how a
client application could or should be able to cause a memory error in the
database. Any help would be most appreciated.

Thanks
Andy

****** LOG FILE SNIP ******

2006-05-07 23:44:19 10.10.12.100(4018)LOG: 00000: statement: EXECUTE
npgsqlportal1 [PREPARE: select * from
fn_Driver_Session_Updated($1::int8,$2::bool)]
2006-05-07 23:44:19 10.10.12.100(4018)LOCATION: exec_execute_message,
postgres.c:1718
2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log
file 0, segment 90 at offset 2998272, length 8192: Permission denied
2006-05-07 23:44:20 10.10.12.100(4018)CONTEXT: writing block 75 of relation
1663/20632/100738
SQL statement "update tbl_query_ui_consumer_session_mapping set
ui_consumer_session_data_action_type_id = 3, client_received = false where
session_id = $1 and client_session_id = $2 "
PL/pgSQL function "fn_driver_session_updated" line 58 at SQL
statement
2006-05-07 23:44:20 10.10.12.100(4018)LOCATION: XLogWrite, xlog.c:1474
2006-05-07 23:44:20 10.10.12.100(4018)STATEMENT: select * from
fn_Driver_Session_Updated($1::int8,$2::bool)

This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.

2006-05-07 23:44:21 LOG: 00000: server process (PID 5924) was terminated by
signal 3

2006-05-07 23:44:21 LOCATION: LogChildExit, postmaster.c:2425

2006-05-07 23:44:21 LOG: 00000: terminating any other active server
processes

2006-05-07 23:44:21 LOCATION: HandleChildCrash, postmaster.c:2306

2006-05-07 23:44:21 10.10.10.100(2467)WARNING: 57P02: terminating
connection because of crash of another server process

2006-05-07 23:44:21 10.10.10.100(2467)DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

2006-05-07 23:44:21 10.10.10.100(2467)HINT: In a moment you should be able
to reconnect to the database and repeat your command.

2006-05-07 23:44:21 10.10.10.100(2467)LOCATION: quickdie, postgres.c:2103

2006-05-07 23:44:21 10.10.10.100(2466)WARNING: 57P02: terminating
connection because of crash of another server process

2006-05-07 23:44:21 10.10.10.100(2466)DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

2006-05-07 23:44:21 10.10.10.100(2466)HINT: In a moment you should be able
to reconnect to the database and repeat your command.

2006-05-07 23:44:21 10.10.10.100(2466)LOCATION: quickdie, postgres.c:2103

2006-05-07 23:44:21 10.10.12.100(4021)WARNING: 57P02: terminating
connection because of crash of another server process
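
A note on reading the snippet above: each primary entry appears to follow the
shape "<timestamp> [<client>(<port>)]<SEVERITY>: <SQLSTATE>: <message>", with
CONTEXT, DETAIL, HINT, LOCATION and STATEMENT lines attached to it. The
following Python sketch tallies primary entries by severity and SQLSTATE so
the one PANIC stands out from the follow-on warnings. The pattern is inferred
from this snippet only, and the function name `summarize` is hypothetical;
a different log_line_prefix setting would need a different pattern.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the snippet in this thread:
#   <date> <time> [<client-ip>(<port>)]<SEVERITY>: <SQLSTATE>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?:(?P<client>\d+\.\d+\.\d+\.\d+)\((?P<port>\d+)\))?"
    r"(?P<severity>LOG|WARNING|ERROR|FATAL|PANIC): "
    r"(?P<sqlstate>[0-9A-Z]{5}): (?P<message>.*)$"
)


def summarize(lines):
    """Count primary log entries by (severity, SQLSTATE).

    Continuation lines (CONTEXT, DETAIL, HINT, LOCATION, STATEMENT and
    wrapped text) do not match the pattern and are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["severity"], match["sqlstate"])] += 1
    return counts


# First lines of a few entries copied from the snippet above.
sample = [
    "2006-05-07 23:44:19 10.10.12.100(4018)LOG: 00000: statement: EXECUTE",
    "2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log",
    "2006-05-07 23:44:20 10.10.12.100(4018)LOCATION: XLogWrite, xlog.c:1474",
    "2006-05-07 23:44:21 LOG: 00000: server process (PID 5924) was terminated by",
    "2006-05-07 23:44:21 10.10.10.100(2467)WARNING: 57P02: terminating",
]

for (severity, sqlstate), n in sorted(summarize(sample).items()):
    print(severity, sqlstate, n)
```

Run over a whole log file, a tally like this shows which entry is the root
event (the PANIC with SQLSTATE 42501) and which are the 57P02 warnings that
every other backend emits once the postmaster reacts to the crash.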


#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andy Male (#3)
Re: BUG #2419: could not reattach to shared memory

Andy Male wrote:

Hi Andy,

This is your problem:

2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log
file 0, segment 90 at offset 2998272, length 8192: Permission denied
2006-05-07 23:44:20 10.10.12.100(4018)CONTEXT: writing block 75 of relation
1663/20632/100738
SQL statement "update tbl_query_ui_consumer_session_mapping set
ui_consumer_session_data_action_type_id = 3, client_received = false where
session_id = $1 and client_session_id = $2 "
PL/pgSQL function "fn_driver_session_updated" line 58 at SQL
statement
2006-05-07 23:44:20 10.10.12.100(4018)LOCATION: XLogWrite, xlog.c:1474
2006-05-07 23:44:20 10.10.12.100(4018)STATEMENT: select * from
fn_Driver_Session_Updated($1::int8,$2::bool)

The "Permission denied" message is a report Postgres is getting from the
operating system. Notice it is marked as PANIC -- an unrecoverable
error. What you should be investigating is why the operating system
rejects the write to that file. It clearly is a database file;
try looking at the files named $PGDATA/base/20632/100738 or possibly
$PGDATA/pg_tblspc/1663/20632/100738. What permissions do those files
have? Who owns them? If it's not the user who runs the Postgres
processes, or they are not accessible to it, then something else in the
system changed that, which is what you need to figure out and disable.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
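
The path lookup suggested in the message above can be sketched in Python. The
CONTEXT line names the relation as three numbers separated by slashes, read
here as tablespace/database/relfilenode; the sketch builds the two candidate
paths mentioned in the message and reports whether each exists. The function
name `candidate_paths` and the fallback data directory "data" are
assumptions for illustration, and the script only looks at the files, it
does not open or modify them.

```python
import os
from pathlib import Path


def candidate_paths(pgdata, relation):
    """Return the two candidate file paths for a relation triple.

    `relation` is the "tablespace/database/relfilenode" string from the
    CONTEXT line, e.g. "1663/20632/100738".
    """
    tablespace, database, relfilenode = relation.split("/")
    root = Path(pgdata)
    return [
        root / "base" / database / relfilenode,
        root / "pg_tblspc" / tablespace / database / relfilenode,
    ]


# PGDATA is read from the environment; "data" is only a placeholder default.
pgdata = os.environ.get("PGDATA", "data")

for path in candidate_paths(pgdata, "1663/20632/100738"):
    status = "exists" if path.exists() else "not found"
    print(f"{path}: {status}")
```

On Windows the owner and access control list of a file that exists can then
be inspected from the file's Security properties, which is where a change
made by other software would show up.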

#5 Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#4)
Re: BUG #2419: could not reattach to shared memory

On Mon, 2006-05-08 at 08:31 -0400, Alvaro Herrera wrote:

2006-05-07 23:44:20 10.10.12.100(4018)PANIC: 42501: could not write to log
file 0, segment 90 at offset 2998272, length 8192: Permission denied

This is a pg_xlog error, so it looks like you have a whole-system issue,
not just isolated tables.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
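
To find the file that the PANIC refers to, the "log file 0, segment 90"
numbers can be turned into a pg_xlog file name. In the 8.1 series, segment
files are named with three 8-digit hexadecimal fields: timeline, log file
and segment. The Python sketch below does that conversion. The timeline ID
does not appear in the message, so the default of 1 is an assumption (it is
typical for a cluster that has never been through point-in-time recovery),
and the 8192-byte WAL page size is the stock default rather than something
this thread confirms. The function name `wal_location` is hypothetical.

```python
import re

# Matches the PANIC text quoted in this thread (after unwrapping the line).
PANIC_RE = re.compile(
    r"could not write to log file (?P<log>\d+), segment (?P<seg>\d+) "
    r"at offset (?P<offset>\d+), length (?P<length>\d+)"
)

WAL_PAGE_SIZE = 8192  # assumed default page size for WAL writes


def wal_location(message, timeline=1):
    """Return (segment file name, page index within the segment).

    `timeline` is an assumption: the message itself does not include it.
    """
    match = PANIC_RE.search(message)
    if match is None:
        raise ValueError("not a WAL write failure message")
    log = int(match["log"])
    seg = int(match["seg"])
    offset = int(match["offset"])
    name = f"{timeline:08X}{log:08X}{seg:08X}"
    return name, offset // WAL_PAGE_SIZE


message = (
    "could not write to log file 0, segment 90 at offset 2998272, "
    "length 8192: Permission denied"
)
name, page = wal_location(message)
print(f"pg_xlog/{name}, page {page}")
```

With the assumed timeline this points at a file under $PGDATA/pg_xlog rather
than under $PGDATA/base, which is the distinction made in the message above.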

#6 Andy
andy@otelex.co.uk
In reply to: Simon Riggs (#5)
Re: BUG #2419: could not reattach to shared memory

Hi,

Thank you for your replies.

I accept that the "Permission denied" problem does suggest that the DB error
may be caused by the OS somehow.

There is no problem with the permissions/ownership of the files because the
Postgres account created and owns those files; this rules out any sort
of security problem. It is possible that the file is inaccessible for
some other reason and that Postgres is merely reporting that it can't access
the file, so that it 'could' be caused by a permissions problem rather than
it 'is' a permissions problem. The error log information in this case isn't
really very useful since it doesn't accurately report the real cause of the
error. The OS doesn't report any other errors and there are no other
system problems or file access problems; the only problem lies within the
database.

To try and reproduce the problem on another machine, I did a new install of
the same version of Postgres (8.1.3) and dump/restored the database onto
this new server. So far it has been running with the same load and activity
for almost 30 hours and the problem has not surfaced. In theory, Postgres
and the database are identical and therefore, the fact that it doesn't error
in the same way does confirm this is an OS problem (assuming the problem
doesn't occur at some point in the future). The two servers are identical
hardware and have the same version of OS, Windows Server 2003 SP1.

None of this helps me because I still have a production server on which I
can't run the database since I can't debug the error. Any suggestions?

Thank you for your assistance.

Andy

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy (#6)
Re: BUG #2419: could not reattach to shared memory

"Andy" <andy@otelex.co.uk> writes:

To try and reproduce the problem on another machine, I did a new install of
the same version of Postgres (8.1.3) and dump/restored the database onto
this new server. So far it has been running with the same load and activity
for almost 30 hours and the problem has not surfaced. In theory, Postgres
and the database are identical and therefore, the fact that it doesn't error
in the same way does confirm this is an OS problem (assuming the problem
doesn't occur at some point in the future). The two servers are identical
hardware and have the same version of OS, Windows Server 2003 SP1.

We've seen reports of intermittent permission failures on Windows being
caused by broken antivirus software. What security software have you
got on those machines, and does the failure go away if you remove it?

regards, tom lane

#8 Andy
andy@otelex.co.uk
In reply to: Tom Lane (#7)
Re: BUG #2419: could not reattach to shared memory

I have removed the AV software and the problem doesn't go away. It does seem
to happen less frequently but is still there. It is the same error -

"FATAL: could not reattach to shared memory (key=5432001, addr=01960000):
Invalid argument"

The referenced key and address are always the same. Is there some way we can
view the data being stored in the memory, or look at what data was stored in
the memory, and where, to see if the reference is valid?

Regards

Andy
