FATAL: could not reattach to shared memory (Win32)

Started by Terry Yapt over 18 years ago, 30 messages, general
#1 Terry Yapt
yapt@technovell.com

Hello all,

I am having problems with the following PostgreSQL version:

pg version: 8.2.4
OS: Win32 (windows xp sp2)
FS: NTFS

It is a production server, but suddenly the DB stops answering any SQL
command. It seems dead. After restarting the server, everything works again.

I have looked for system errors and there is nothing there, but there are
a lot of messages in the application error log. The same error appears
every ten seconds or so.

This is the main error:
* FATAL: could not reattach to shared memory (key=5432001,
addr=01D80000): Invalid argument

It is always followed by this other application error:
* LOG: unrecognized win32 error code: 487

I found this during an intensive internet search:
http://archives.postgresql.org/pgsql-bugs/2007-01/msg00032.php

I need to solve this ASAP. Does anybody have any idea about this?

Thanks.

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Terry Yapt (#1)
Re: FATAL: could not reattach to shared memory (Win32)

Terry Yapt wrote:

I have looked for system errors and there is nothing there, but there are a lot of
messages in the application error log. The same error appears every ten seconds or
so.

This is the main error:
* FATAL: could not reattach to shared memory (key=5432001, addr=01D80000):
Invalid argument

Please run "ipcs" on a command line window and paste the results.

I see a minor problem in that code: we are invoking two system calls
(shmget and shmat) but the log does not say which one failed. However
in this case it seems only shmget could be returning EINVAL.
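Alvaro's point about the log message can be illustrated with a small Python model (a sketch, not PostgreSQL's actual code): the attach steps run in order, and the message names whichever step failed. The step callables are hypothetical stand-ins for the real `shmget`/`shmat` calls; each returns 0 on success or an errno value on failure, C style.

```python
import errno
import os
from typing import Callable, Optional, Sequence, Tuple

# A step pairs a system-call name with a zero-argument callable that returns
# 0 on success or an errno value on failure (C's "return -1, set errno" style).
Step = Tuple[str, Callable[[], int]]


def reattach_error(key: int, addr: int, steps: Sequence[Step]) -> Optional[str]:
    """Run the steps in order. Return None if all succeed, otherwise a
    message naming the first step that failed."""
    for name, call in steps:
        err = call()
        if err != 0:
            return (f"FATAL: could not reattach to shared memory "
                    f"(key={key}, addr={addr:08X}): "
                    f"{name} failed: {os.strerror(err)}")
    return None


# Hypothetical scenario: the segment is found, but mapping it fails with EINVAL.
msg = reattach_error(5432001, 0x01D80000,
                     [("shmget", lambda: 0), ("shmat", lambda: errno.EINVAL)])
print(msg)
```

With the failing step named in the message, a report like Terry's would show directly whether the lookup or the mapping went wrong.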

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3 Terry Yapt
yapt@technovell.com
In reply to: Alvaro Herrera (#2)
Re: FATAL: could not reattach to shared memory (Win32)

Sorry, I have not been able to execute "ipcs" on Windows; it doesn't
exist. I have tried to find some utility that gives me the same
information, or any port of ipcs to Win32, but I haven't had any luck.

If I can do something more to get help, please tell me.

Greetings.


#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Terry Yapt (#1)
Re: FATAL: could not reattach to shared memory (Win32)

Terry Yapt wrote:

This is the main error:
* FATAL: could not reattach to shared memory (key=5432001, addr=01D80000):
Invalid argument

It is always followed by this other application error:
* LOG: unrecognized win32 error code: 487

FWIW,
http://help.netop.com/support/errorcodes/win32_error_codes.htm

says
487 Attempt to access invalid address. ERROR_INVALID_ADDRESS
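One possible reading of the two log lines, offered as an assumption rather than something established in this thread: if the Win32 port translates Windows error codes to errno values through a lookup table and falls back to EINVAL for codes it does not recognize, then an unmapped code 487 would be logged as "unrecognized" and would surface in the FATAL message as "Invalid argument". A toy Python model of that assumed behavior (the table is a small illustrative subset, and `map_win32_error` is a hypothetical name):

```python
import errno

# Illustrative subset of a Win32-error -> errno table. The numeric codes are
# the documented Win32 values; which codes the real port maps is an assumption.
WIN32_TO_ERRNO = {
    2: errno.ENOENT,   # ERROR_FILE_NOT_FOUND
    3: errno.ENOENT,   # ERROR_PATH_NOT_FOUND
    5: errno.EACCES,   # ERROR_ACCESS_DENIED
    8: errno.ENOMEM,   # ERROR_NOT_ENOUGH_MEMORY
}

ERROR_INVALID_ADDRESS = 487  # deliberately absent from the table above


def map_win32_error(code, log):
    """Translate a Win32 error code to errno. Unknown codes are logged and
    fall back to EINVAL, which strerror() renders as "Invalid argument"."""
    if code in WIN32_TO_ERRNO:
        return WIN32_TO_ERRNO[code]
    log(f"LOG: unrecognized win32 error code: {code}")
    return errno.EINVAL


messages = []
result = map_win32_error(ERROR_INVALID_ADDRESS, messages.append)
print(messages[0], "->", errno.errorcode[result])
```

Under this assumption, "Invalid argument" alone would not identify the failing call, since any untranslated Windows error would produce it.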

This problem has been reported before, for example in

http://bbs.chinaunix.net/thread-973003-1-1.html
(not that I can read it very well)

and

http://lists.pgfoundry.org/pipermail/brasil-usuarios/20061127/003150.html

No resolution seems to have been found.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5 Terry Yapt
yapt@technovell.com
In reply to: Alvaro Herrera (#4)
Re: FATAL: could not reattach to shared memory (Win32)

Alvaro Herrera wrote:

Terry Yapt wrote:

This is the main error:
* FATAL: could not reattach to shared memory (key=5432001, addr=01D80000):
Invalid argument

It is always followed by this other application error:
* LOG: unrecognized win32 error code: 487

This problem has been reported before, for example in

http://bbs.chinaunix.net/thread-973003-1-1.html
(not that I can read it very well)

and

http://lists.pgfoundry.org/pipermail/brasil-usuarios/20061127/003150.html

Yes, those are the same as the one here:
http://archives.postgresql.org/pgsql-bugs/2007-01/msg00032.php

No resolution seems to have been found.

So now I am very worried. :-|

Thanks Alvaro.

#6 Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#4)
Re: FATAL: could not reattach to shared memory (Win32)

Alvaro Herrera wrote:

Terry Yapt wrote:

This is the main error:
* FATAL: could not reattach to shared memory (key=5432001, addr=01D80000):
Invalid argument

It is always followed by this other application error:
* LOG: unrecognized win32 error code: 487

FWIW,
http://help.netop.com/support/errorcodes/win32_error_codes.htm

says
487 Attempt to access invalid address. ERROR_INVALID_ADDRESS

This problem has been reported before, for example in

http://bbs.chinaunix.net/thread-973003-1-1.html
(not that I can read it very well)

and

http://lists.pgfoundry.org/pipermail/brasil-usuarios/20061127/003150.html

No resolution seems to have been found.

8.3 will have a new way to deal with shared mem on win32. It's the same
underlying tech, but we're no longer trying to squeeze it into an
emulation of sysv. With a bit of luck, that'll help :-)

//Magnus

#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#6)
Re: FATAL: could not reattach to shared memory (Win32)

Magnus Hagander wrote:

Alvaro Herrera wrote:

No resolution seems to have been found.

8.3 will have a new way to deal with shared mem on win32. It's the same
underlying tech, but we're no longer trying to squeeze it into an
emulation of sysv. With a bit of luck, that'll help :-)

So you're saying we won't fix this bug in 8.2? That seems unfortunate,
given that 8.2 is still supposed to be supported on Windows.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8 Shelby Cain
alyandon@yahoo.com
In reply to: Alvaro Herrera (#7)
Re: FATAL: could not reattach to shared memory (Win32)

----- Original Message ----
From: Magnus Hagander <magnus@hagander.net>
To: Alvaro Herrera <alvherre@commandprompt.com>
Cc: Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
Sent: Thursday, August 23, 2007 3:43:32 PM
Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)

8.3 will have a new way to deal with shared mem on win32. It's the same
underlying tech, but we're no longer trying to squeeze it into an
emulation of sysv. With a bit of luck, that'll help :-)

//Magnus

Wild guess on my part... could that error be the result of an attempt to map shared memory into a process at a fixed location that just happens to already be occupied by a dll that Windows had decided to relocate?
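Shelby's hypothesis comes down to an interval-overlap test: the fixed shared memory range collides with a module if the two half-open address ranges intersect. A minimal Python sketch; the segment size and the module names and addresses are hypothetical, and only 0x01D80000 comes from the log in this thread.

```python
SHMEM_ADDR = 0x01D80000  # attach address from the FATAL message
SHMEM_SIZE = 0x01000000  # hypothetical 16 MB segment


def ranges_overlap(start_a, size_a, start_b, size_b):
    """True if [start_a, start_a + size_a) and [start_b, start_b + size_b) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def colliding_modules(addr, size, modules):
    """Return names of modules whose loaded range overlaps the shared memory range."""
    return [name for name, (base, length) in modules.items()
            if ranges_overlap(addr, size, base, length)]


# Hypothetical loaded modules: name -> (base address, image size).
loaded = {
    "relocated.dll": (0x01E00000, 0x00100000),  # landed inside the segment's range
    "normal.dll": (0x10000000, 0x00100000),     # at its preferred base, far away
}
print(colliding_modules(SHMEM_ADDR, SHMEM_SIZE, loaded))
```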

Regards,

Shelby Cain


#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: FATAL: could not reattach to shared memory (Win32)

Alvaro Herrera <alvherre@commandprompt.com> writes:

Magnus Hagander wrote:

8.3 will have a new way to deal with shared mem on win32. It's the same
underlying tech, but we're no longer trying to squeeze it into an
emulation of sysv. With a bit of luck, that'll help :-)

So you're saying we won't fix this bug in 8.2?

Well, we certainly aren't going to back-patch a major rewrite that
(1) hasn't made it through beta testing, and (2) is not actually known
to fix the bug. When and if those gating conditions stop being true,
maybe we could consider a back-patch.

But at the moment this is all speculation ... I counsel concentrating
on finding out what's really happening on Terry's machine, before trying
to guess whether we already have a fix written.

regards, tom lane

#10 Magnus Hagander
magnus@hagander.net
In reply to: Shelby Cain (#8)
Re: FATAL: could not reattach to shared memory (Win32)

Shelby Cain wrote:

----- Original Message ----
From: Magnus Hagander <magnus@hagander.net>
To: Alvaro Herrera <alvherre@commandprompt.com>
Cc: Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
Sent: Thursday, August 23, 2007 3:43:32 PM
Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)

8.3 will have a new way to deal with shared mem on win32. It's the
same underlying tech, but we're no longer trying to squeeze it into
an emulation of sysv. With a bit of luck, that'll help :-)

//Magnus

Wild guess on my part... could that error be the result of an attempt
to map shared memory into a process at a fixed location that just
happens to already be occupied by a dll that Windows had decided to
relocate?

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

//Magnus

#11 Trevor Talbot
quension@gmail.com
In reply to: Magnus Hagander (#10)
Re: FATAL: could not reattach to shared memory (Win32)

On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:

Shelby Cain wrote:

Wild guess on my part... could that error be the result of an attempt
to map shared memory into a process at a fixed location that just
happens to already be occupied by a dll that Windows had decided to
relocate?

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

Not a valid assumption; you can't rely on consistent VM space among
multiple [non-cloned] processes without a serious amount of effort.
Anything can use that space, it's not just file views. Obviously it
happens to work some of the time, but when it doesn't, it doesn't. I
gather postgres depends on it being at the same address, and fixing
that isn't trivial?

If everything relevant is going through the intriguing
internal_forkexec(), you could probably reserve address space there
before resuming the thread. You'd want to combine this with picking
address space that's less likely to be used before creating the shared
memory section. (Actually, if you're doing that, you might as well
just inject the backend variables too instead of going through the
mapped file gymnastics.)

Not a simple change, but would likely make this particular problem go
away (assuming this is the problem). It's also the first time I've
looked at the source, so perhaps I missed something.
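Trevor's suggestion can be modelled with a toy address space in Python. This is only a simulation under stated assumptions: the "loader" below relocates a conflicting DLL to the lowest free 64 KB slot, a simplification of what Windows actually does, and the class name and addresses are hypothetical. On real Windows the parent would presumably create the child with CREATE_SUSPENDED, reserve the range in the child with VirtualAllocEx (MEM_RESERVE), and only then call ResumeThread; the child would free the placeholder just before mapping the shared memory view.

```python
from dataclasses import dataclass, field

GRANULARITY = 0x10000  # Windows reserves address space in 64 KB units
LIMIT = 0x80000000     # top of the default 2 GB user space on 32-bit Windows


@dataclass
class ToyAddressSpace:
    """Toy model of a child process's virtual address space."""
    regions: dict = field(default_factory=dict)  # start -> (size, owner)

    def is_free(self, start, size):
        return all(start + size <= s or s + sz <= start
                   for s, (sz, _owner) in self.regions.items())

    def reserve(self, start, size, owner):
        if not self.is_free(start, size):
            return False
        self.regions[start] = (size, owner)
        return True

    def release(self, start):
        self.regions.pop(start, None)

    def load_dll(self, name, preferred, size):
        """Load at the preferred base, else relocate to the lowest free slot."""
        if self.reserve(preferred, size, name):
            return preferred
        addr = GRANULARITY
        while addr + size <= LIMIT:
            if self.reserve(addr, size, name):
                return addr
            addr += GRANULARITY
        raise MemoryError(f"no room for {name}")


def start_child(shmem_addr, shmem_size, dlls, reserve_first):
    """Simulate starting a child, then attaching shared memory at a fixed address."""
    space = ToyAddressSpace()            # child created suspended: nothing loaded yet
    if reserve_first:
        space.reserve(shmem_addr, shmem_size, "placeholder")
    for name, preferred, size in dlls:   # child resumes; loader maps its DLLs
        space.load_dll(name, preferred, size)
    if reserve_first:
        space.release(shmem_addr)        # freed just before the real mapping
    return space.reserve(shmem_addr, shmem_size, "shmem")


# Hypothetical layout: b.dll collides with a.dll and gets relocated low.
DLLS = [("a.dll", 0x10000000, 0x00100000), ("b.dll", 0x10000000, 0x00200000)]
print(start_child(0x00100000, 0x00100000, DLLS, reserve_first=False))  # False
print(start_child(0x00100000, 0x00100000, DLLS, reserve_first=True))   # True
```

Without the placeholder, the relocated b.dll lands on the range the shared memory needs; with it, the loader is pushed elsewhere and the fixed-address attach succeeds.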

#12 Gregory Stark
stark@enterprisedb.com
In reply to: Trevor Talbot (#11)
Re: FATAL: could not reattach to shared memory (Win32)

"Trevor Talbot" <quension@gmail.com> writes:

I gather postgres depends on it being at the same address, and fixing that
isn't trivial?

I haven't been following the rest of the thread so I'm not sure if this is
important. But no, fixing that should be relatively trivial as there are
already some configurations where it's not the case (the EXEC_BACKEND case I
believe). The rest of the system uses a shared memory base pointer and
references everything relative to that.
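The base-pointer scheme Gregory describes can be illustrated with plain integer arithmetic. In this Python sketch the addresses are hypothetical, and `make_offset`/`make_ptr` only mimic the idea behind the MAKE_OFFSET/MAKE_PTR macros mentioned later in the thread, not their exact definitions. A value stored as an offset stays valid when another process maps the segment at a different base, while a stored absolute address does not. Per Tom's and Bruce's replies, most current shared memory structures use absolute pointers instead.

```python
PARENT_BASE = 0x01D80000  # where the parent mapped the segment (from the log)
CHILD_BASE = 0x03000000   # hypothetical: child forced to map it elsewhere


def make_offset(ptr, base):
    """Absolute address in this process -> offset from the segment start."""
    return ptr - base


def make_ptr(offset, base):
    """Offset from the segment start -> absolute address in this process."""
    return base + offset


# The parent places an object 0x40 bytes into the segment and records it two ways.
obj_in_parent = PARENT_BASE + 0x40
stored_pointer = obj_in_parent                           # absolute address
stored_offset = make_offset(obj_in_parent, PARENT_BASE)  # base-relative

# The child resolves both against its own mapping.
print(hex(stored_pointer - CHILD_BASE))                       # negative: outside the child's mapping
print(hex(make_ptr(stored_offset, CHILD_BASE) - CHILD_BASE))  # 0x40: still correct
```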

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#13 Bruce Momjian
bruce@momjian.us
In reply to: Trevor Talbot (#11)
Re: FATAL: could not reattach to shared memory (Win32)

Trevor Talbot wrote:

On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:

Shelby Cain wrote:

Wild guess on my part... could that error be the result of an attempt
to map shared memory into a process at a fixed location that just
happens to already be occupied by a dll that Windows had decided to
relocate?

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

Not a valid assumption; you can't rely on consistent VM space among
multiple [non-cloned] processes without a serious amount of effort.
Anything can use that space, it's not just file views. Obviously it
happens to work some of the time, but when it doesn't, it doesn't. I
gather postgres depends on it being at the same address, and fixing
that isn't trivial?

If everything relevant is going through the intriguing
internal_forkexec(), you could probably reserve address space there
before resuming the thread. You'd want to combine this with picking
address space that's less likely to be used before creating the shared
memory section. (Actually, if you're doing that, you might as well
just inject the backend variables too instead of going through the
mapped file gymnastics.)

Not a simple change, but would likely make this particular problem go
away (assuming this is the problem). It's also the first time I've
looked at the source, so perhaps I missed something.

I think this is accurate. When we created the Win32 native port there
was a lot of concern about how to handle shared memory in an EXEC_BACKEND
case, namely that postmaster children were not copies which had the same
shared memory mappings, but rather were new processes that had to attach
to shared memory at a fixed address.

The WIN32 solution was to create the shared memory in the parent, and
then pass that address value down to the children to use in attaching to
the existing segment. We expected all sorts of problems with this but
in fact it seemed to work fine (most of the time).

As you can see it doesn't work 100% of the time, but it worked more
reliably than we expected. What we have been waiting for is someone
who can recreate a failure so we can track down how to best make it 100%
reliable, and as you can see, we haven't had a flood of problem reports
to track this down.

If you want to help make it 100% we will work with you to find the
solution.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14 Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#12)
Re: FATAL: could not reattach to shared memory (Win32)

Gregory Stark wrote:

"Trevor Talbot" <quension@gmail.com> writes:

I gather postgres depends on it being at the same address, and fixing that
isn't trivial?

I haven't been following the rest of the thread so I'm not sure if this is
important. But no, fixing that should be relatively trivial as there are
already some configurations where it's not the case (the EXEC_BACKEND case I
believe). The rest of the system uses a shared memory base pointer and
references everything relative to that.

This is inaccurate, I believe. The original Berkeley code did exec()
for backends and hence allowed shared memory to be at different
addresses for different backends, but we started using fork() and
eliminated much of that capability for performance and clarity reasons,
so right now all backends have to have shared memory at the same
address, and changing this will not be simple.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Trevor Talbot (#11)
Re: FATAL: could not reattach to shared memory (Win32)

"Trevor Talbot" <quension@gmail.com> writes:

On 8/23/07, Magnus Hagander <magnus@hagander.net> wrote:

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

Not a valid assumption; you can't rely on consistent VM space among
multiple [non-cloned] processes without a serious amount of effort.

I'm not sure if you have a specific technical meaning of "clone" in mind
here, but these processes are all executing the identical executable,
and taking care to map the shmem early in execution *before* they load
any DLLs. So it should work. Apparently, it *does* work for a while for
the OP, and then stops working, which is even odder.

I gather postgres depends on it being at the same address, and fixing
that isn't trivial?

That's correct, and not having to change it is not negotiable ---
finding a way to make this work was one of the gating factors that
made it practical to have a Windows port at all.

If you've got a specific suggestion for making it more reliable,
we're all ears.

regards, tom lane

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#12)
Re: FATAL: could not reattach to shared memory (Win32)

Gregory Stark <stark@enterprisedb.com> writes:

"Trevor Talbot" <quension@gmail.com> writes:

I gather postgres depends on it being at the same address, and fixing that
isn't trivial?

I haven't been following the rest of the thread so I'm not sure if this is
important. But no, fixing that should be relatively trivial as there are
already some configurations where it's not the case (the EXEC_BACKEND case I
believe). The rest of the system uses a shared memory base pointer and
references everything relative to that.

That hasn't been the case for quite a few years, and we're not going back.
The pointer-to-offset-and-back gymnastics that that required were
utterly destructive to code readability and maintainability, mainly
because if everything stored in shmem data structures is an "offset"
then you can't get any useful error checking from the compiler about how
you are using the fields. It's like decreeing that every pointer
must be declared "void *" and cast to something else when it's used.

There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
but I think it's mostly just that no one's bothered to rewrite the code
for SHM_QUEUE linked lists. The vast majority of our shmem structures
use regular pointers, and have for years.
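To make Tom's point concrete, here is a Python sketch of a linked list whose links are stored as offsets from the start of a shared buffer, loosely in the spirit of the SHM_QUEUE lists he mentions (the node layout is hypothetical, not PostgreSQL's). The same bytes can be traversed wherever the buffer lives, which is the benefit; the cost is that each link is just an integer, which in C is exactly the loss of compiler type checking Tom describes.

```python
import struct

NODE = struct.Struct("<iI")  # hypothetical node layout: (next_offset, value)
NIL = -1                     # end-of-list marker


def write_node(seg, offset, next_offset, value):
    """Store a node at `offset`; its link is an offset, not an address."""
    NODE.pack_into(seg, offset, next_offset, value)


def walk(seg, head):
    """Follow offsets from `head`; works no matter where `seg` is mapped."""
    values = []
    off = head
    while off != NIL:
        next_off, value = NODE.unpack_from(seg, off)
        values.append(value)
        off = next_off
    return values


segment = bytearray(64)
write_node(segment, 0, 16, 10)    # node at offset 0 links to offset 16
write_node(segment, 16, 32, 20)
write_node(segment, 32, NIL, 30)

other_mapping = bytearray(segment)  # stands in for the same bytes at another base
print(walk(segment, 0), walk(other_mapping, 0))
```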

regards, tom lane

#17 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#16)
Re: FATAL: could not reattach to shared memory (Win32)

Tom Lane wrote:

There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
but I think it's mostly just that no one's bothered to rewrite the code
for SHM_QUEUE linked lists. The vast majority of our shmem structures
use regular pointers, and have for years.

... except that, not knowing that, I wrote part of the new autovac code
using MAKE_PTR/OFFSET, and it needs to be rewritten eventually :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#18 Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#16)
Re: FATAL: could not reattach to shared memory (Win32)

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
but I think it's mostly just that no one's bothered to rewrite the code
for SHM_QUEUE linked lists. The vast majority of our shmem structures
use regular pointers, and have for years.

Ah, I happened to be in that code recently, so I was misled.

So even in EXEC_BACKEND we require that we can attach to the shared memory at
a specified location. hm.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#19 Shelby Cain
alyandon@yahoo.com
In reply to: Gregory Stark (#18)
Re: FATAL: could not reattach to shared memory (Win32)

----- Original Message ----
From: Magnus Hagander <magnus@hagander.net>
To: Shelby Cain <alyandon@yahoo.com>
Cc: Alvaro Herrera <alvherre@commandprompt.com>; Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
Sent: Friday, August 24, 2007 1:08:44 AM
Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

//Magnus

Assuming this is an issue with shared libraries, I think it would have more to do with the way Windows resolves address conflicts on process startup than anything caused by explicit calls to LoadLibrary(). Looking at postgres.exe with the dependency viewer from Visual Studio 6, I see the following shared library dependencies embedded in the executable image that have conflicting base addresses. If I'm not mistaken, Windows will automatically relocate these libraries prior to actual code execution, so there would be no opportunity for that particular instance of postgres.exe to map the shared memory if the address space is already in use by a relocated DLL.

libeay32.dll - 0x10000000
libiconv-2.dll - 0x10000000
libintl-2.dll - 0x10000000
ssleay32.dll - 0x10000000
comerr32.dll - 0x1c000000
krb5_32.dll - 0x1c000000
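Shelby's list can be checked mechanically: group the DLLs by preferred base address and report every base claimed by more than one of them, since at most one DLL can occupy each such base and the loader must relocate the rest. A short Python sketch using the addresses listed above; it does not predict where Windows actually places a relocated DLL, which is the part that could collide with the shared memory address.

```python
from collections import defaultdict

# Preferred base addresses of postgres.exe's dependencies, as listed above.
PREFERRED_BASES = {
    "libeay32.dll": 0x10000000,
    "libiconv-2.dll": 0x10000000,
    "libintl-2.dll": 0x10000000,
    "ssleay32.dll": 0x10000000,
    "comerr32.dll": 0x1C000000,
    "krb5_32.dll": 0x1C000000,
}


def base_conflicts(bases):
    """Group DLLs by preferred base and keep only bases claimed by 2+ DLLs."""
    groups = defaultdict(list)
    for name, base in bases.items():
        groups[base].append(name)
    return {base: sorted(names) for base, names in groups.items() if len(names) > 1}


for base, names in sorted(base_conflicts(PREFERRED_BASES).items()):
    # At least len(names) - 1 of these cannot load at their preferred base.
    print(f"0x{base:08X}: {', '.join(names)} (at least {len(names) - 1} relocated)")
```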

I also found a KB article that specifically addresses ERROR_INVALID_ADDRESS being returned from MapViewOfFileEx().

http://support.microsoft.com/kb/125713

The article specifically addresses the concern where multiple processes must use the same address for mappings and how to accomplish that under Windows. Search for "Addresses of Mapped Views". The only thing that really gives me any pause is the fact that the article hasn't been updated past the NT 3.51/Windows 9x era, but the underlying behavior might not have been changed in Windows 2000/XP/etc.

Regards,

Shelby Cain


#20 Shelby Cain
alyandon@yahoo.com
In reply to: Shelby Cain (#19)
Re: FATAL: could not reattach to shared memory (Win32)

I apologize for resending this but my editor in combination with Yahoo's web mail interface horribly mangled it...

----- Original Message ----
From: Magnus Hagander <magnus@hagander.net>
To: Shelby Cain <alyandon@yahoo.com>
Cc: Alvaro Herrera <alvherre@commandprompt.com>; Terry Yapt <yapt@technovell.com>; pgsql-general@postgresql.org
Sent: Friday, August 24, 2007 1:08:44 AM
Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

//Magnus

Assuming this is an issue with shared libraries, I think it would have more to do with the way Windows resolves address conflicts on process startup than anything caused by explicit calls to LoadLibrary(). Looking at postgres.exe with the dependency viewer from Visual Studio 6, I see the following shared library dependencies embedded in the executable image that have conflicting base addresses. If I'm not mistaken, Windows will automatically relocate these libraries prior to actual code execution, so there would be no opportunity for that particular instance of postgres.exe to map the shared memory if the address space is already in use by a relocated DLL.

libeay32.dll - 0x10000000
libiconv-2.dll - 0x10000000
libintl-2.dll - 0x10000000
ssleay32.dll - 0x10000000
comerr32.dll - 0x1c000000
krb5_32.dll - 0x1c000000

I also found a KB article that addresses ERROR_INVALID_ADDRESS being returned from MapViewOfFileEx().

http://support.microsoft.com/kb/125713

The article specifically addresses the concern where multiple processes must use the same address for mappings and how to accomplish that under Windows. Search for "Addresses of Mapped Views". The only thing that really gives me any pause is the fact that the article hasn't been updated past the NT 3.51/Windows 9x era, but the underlying behavior might not have been changed in Windows 2000/XP/etc.

Regards,

Shelby Cain


#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#18)

#22 Terry Yapt
yapt@technovell.com
In reply to: Tom Lane (#15)

#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shelby Cain (#20)

#24 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#21)

#25 Trevor Talbot
quension@gmail.com
In reply to: Terry Yapt (#22)

#26 Terry Yapt
yapt@technovell.com
In reply to: Trevor Talbot (#25)

#27 Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#10)

#28 Terry Yapt
yapt@technovell.com
In reply to: Bruce Momjian (#27)

#29 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#16)

#30 Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#10)