"could not reattach to shared memory" on buildfarm member dory
So far, dory has failed three times with essentially identical symptoms:
2018-04-23 19:57:10.624 GMT [2240] FATAL: could not reattach to shared memory (key=0000000000000190, addr=00000000018E0000): error code 487
2018-04-23 15:57:10.657 EDT [8836] ERROR: lost connection to parallel worker
2018-04-23 15:57:10.657 EDT [8836] STATEMENT: select count(*) from tenk1 group by twenty;
2018-04-23 15:57:10.660 EDT [3820] LOG: background worker "parallel worker" (PID 2240) exited with exit code 1
Now how can this be? We've successfully reserved and released the address
range we want to use, so it *should* be free at the instant we try to map.
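The sequence Tom describes can be sketched with POSIX mmap()/munmap() standing in for the Windows calls (VirtualAllocEx() to reserve, VirtualFree() to release, MapViewOfFileEx() to map at the saved address). This is a hypothetical analogue with an invented function name, not PostgreSQL's win32_shmem.c code. It only shows why "should be free" is not the same as "is free": nothing stops another allocation from landing in the gap between the release and the map.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * POSIX analogue of: reserve range -> release reservation -> map at that
 * same address.  Returns the mapped address on success, or NULL if the
 * kernel placed the mapping elsewhere (for example because something else
 * grabbed the range between munmap() and mmap()).
 */
static void *
reserve_release_remap(size_t len)
{
    void       *reserved;
    void       *mapped;

    assert(len > 0);

    /* Reserve: an inaccessible mapping that just claims the address range. */
    reserved = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (reserved == MAP_FAILED)
        return NULL;

    /* Release the reservation: the range is free from here on... */
    if (munmap(reserved, len) != 0)
        return NULL;

    /*
     * ...until the next allocation.  Pass the old address only as a hint
     * (no MAP_FIXED), so a conflict shows up as a different returned address
     * instead of silently clobbering whatever now lives there.
     */
    mapped = mmap(reserved, len, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mapped == MAP_FAILED)
        return NULL;
    if (mapped != reserved)
    {
        munmap(mapped, len);
        return NULL;            /* analogue of "could not reattach" */
    }
    return mapped;
}
```

In a single-threaded test program the hint is normally honored; the NULL return is the analogue of dory's failure.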
Another thing that seems curious, though it may just be an artifact of
not having many data points yet, is that these failures all occurred
during pg_upgradecheck. You'd think the "check" and "install-check"
steps would be equally vulnerable to the failure.
I guess the good news is that we're seeing this in a reasonably
reproducible fashion, so there's some hope of digging down to find
out the actual cause.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Now how can this be? We've successfully reserved and released the address
range we want to use, so it *should* be free at the instant we try to map.
Yeah, that's definitely interesting.
I guess the good news is that we're seeing this in a reasonably
reproducible fashion, so there's some hope of digging down to find
out the actual cause.
I've asked Heath to take a look at the system again and see if there's
any Windows logs or such that might help us understand what's happening.
AV was disabled on the box, so don't think it's that, at least.
Thanks!
Stephen
On Tue, Apr 24, 2018 at 11:18 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Now how can this be? We've successfully reserved and released the address
range we want to use, so it *should* be free at the instant we try to map.
Yeah, that's definitely interesting.
I wondered if another thread with the right timing could map something
between the VirtualFree() and MapViewOfFileEx() calls, but we don't
create the Windows signal handling thread until a bit later. Could
there be any other threads active in the process?
Maybe try asking what's mapped there with VirtualQueryEx() on failure?
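Thomas's suggestion amounts to "on failure, ask the OS what now occupies the target address." On Windows that is VirtualQueryEx(); as a hedged, Linux-only analogue (the helper name is hypothetical), the same question can be answered by scanning /proc/self/maps. Note that the probe itself (fopen() and stdio buffering) may allocate memory, which is the kind of observer effect that turns up later in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Linux-only sketch: find the /proc/self/maps line whose range contains addr.
 * Returns 1 and copies that line into buf if something is mapped there,
 * 0 if nothing is, and -1 if /proc/self/maps cannot be read.
 * Caveat: fopen() and stdio buffering may themselves allocate memory.
 */
static int
describe_mapping(const void *addr, char *buf, size_t buflen)
{
    uintptr_t   target = (uintptr_t) addr;
    char        line[4096];
    int         found = 0;
    FILE       *f;

    assert(buf != NULL && buflen > 0);

    f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL)
    {
        unsigned long start,
                    end;

        /* Each line starts with "start-end" in hex, then perms, offset, ... */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (target >= start && target < end)
        {
            strncpy(buf, line, buflen - 1);
            buf[buflen - 1] = '\0';
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```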
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Apr 24, 2018 at 11:37:33AM +1200, Thomas Munro wrote:
On Tue, Apr 24, 2018 at 11:18 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Now how can this be? We've successfully reserved and released the address
range we want to use, so it *should* be free at the instant we try to map.
Yeah, that's definitely interesting.
I wondered if another thread with the right timing could map something
between the VirtualFree() and MapViewOfFileEx() calls, but we don't
create the Windows signal handling thread until a bit later. Could
there be any other threads active in the process?
Maybe try asking what's mapped there with VirtualQueryEx() on failure?
+1. An implementation of that:
/messages/by-id/20170403065106.GA2624300@tornado.leadboat.com
On Tue, Apr 24, 2018 at 1:18 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Now how can this be? We've successfully reserved and released the address
range we want to use, so it *should* be free at the instant we try to map.
Yeah, that's definitely interesting.
I guess the good news is that we're seeing this in a reasonably
reproducible fashion, so there's some hope of digging down to find
out the actual cause.
I've asked Heath to take a look at the system again and see if there's
any Windows logs or such that might help us understand what's happening.
AV was disabled on the box, so don't think it's that, at least.
Disabled or uninstalled?
Back when I was combating windows AV on a daily basis, this normally did
not have the same effect. Just disabling the AV didn't actually remove the
parts that caused issues, it just hid them. Actual uninstall is what was
required.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On 24 April 2018 at 15:18, Magnus Hagander <magnus@hagander.net> wrote:
Back when I was combating windows AV on a daily basis, this normally did not
have the same effect. Just disabling the AV didn't actually remove the parts
that caused issues, it just hid them. Actual uninstall is what was required.
Yep. Specifically, they tended to inject kernel hooks and/or load hook
DLLs that did funky and often flakey things. Often with poor awareness
of things like multiple processes opening one file for write at the
same time.
I think I heard that MS has cleaned up the situation with AV
considerably by offering more kernel infrastructure for it, and
restricting what you can do in terms of kernel extensions etc. But I
don't know how much.
In any case, do you think dropping a minidump at the point of failure
might be informative? It should contain the full memory mapping
information. For this purpose we could just create a crashdumps/
directory then abort() when we detect the error, and have the
buildfarm stop processing until someone can grab the tempdir with the
dumps, binaries, .pdb files, etc.
src/backend/port/win32/crashdump.c doesn't expose a helper function to
do all the dbghelp.dll messing around and create a crashdump, it only
allows that to be done via a crash handler. But it might make sense to
break out the actual "write a crash dump" part to a separately
callable function. I've looked at doing this before, but always got
stuck with the apparent lack of support in gdb or lldb to be used as a
library for self-dumping. You can always shell out to gcore I guess...
but ew. Or we can fork() and abort() the forked child like
https://github.com/RuntimeTools/gencore does, but again, ew.
I was thinking that maybe the buildfarm could just create crashdumps/
automatically, but then we'd need to have support infrastructure for
recording the Pg binaries and .pdb files along with the dumps,
rotating them so we don't run out of space, etc etc.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
On Tue, Apr 24, 2018 at 1:18 AM, Stephen Frost <sfrost@snowman.net> wrote:
I've asked Heath to take a look at the system again and see if there's
any Windows logs or such that might help us understand what's happening.
AV was disabled on the box, so don't think it's that, at least.
Disabled or uninstalled?
The only AV installed on the system is the "Windows Defender" thing, so
it's not some additional AV system.
Thanks!
Stephen
Noah Misch <noah@leadboat.com> writes:
On Tue, Apr 24, 2018 at 11:37:33AM +1200, Thomas Munro wrote:
Maybe try asking what's mapped there with VirtualQueryEx() on failure?
+1. An implementation of that:
/messages/by-id/20170403065106.GA2624300@tornado.leadboat.com
Not seeing any other work happening here, I pushed a little bit of
quick-hack investigation code. This is based on noting that
VirtualAllocEx is documented as rounding the allocation up to a page
boundary (4K), but there's nothing specific about whether or how much
CreateFileMapping or MapViewOfFileEx might round up. The observed
failures could be explained if those guys eat more virtual address
space than VirtualAllocEx does for the same request size.
This is a stretch, for sure, but given the lack of any other theories
we might as well check it.
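Tom's hypothesis is rounding arithmetic: if the reservation is rounded to one granularity but the mapping consumes address space rounded to a coarser one, the mapping can spill past the range that was reserved and then freed. A small sketch of that check follows. The helper names are hypothetical, and whether MapViewOfFileEx() rounds this way is exactly the open question, so the 64 KB figure in the usage below is only an assumed example.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of granularity, which must be a power of two. */
static size_t
round_up(size_t size, size_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (size + granularity - 1) & ~(granularity - 1);
}

/*
 * If the reservation covers round_up(size, reserve_gran) bytes but the
 * mapping consumes round_up(size, map_gran) bytes, return how many bytes
 * the mapping would extend past the reserved-then-freed range (0 if none).
 */
static size_t
spill_past_reservation(size_t size, size_t reserve_gran, size_t map_gran)
{
    size_t      reserved = round_up(size, reserve_gran);
    size_t      mapped = round_up(size, map_gran);

    return mapped > reserved ? mapped - reserved : 0;
}
```

For example, with 4 KB reservation rounding and an assumed 64 KB mapping granularity, a 4097-byte request reserves 8192 bytes but would consume 65536, spilling 57344 bytes past the reservation.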
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Noah Misch <noah@leadboat.com> writes:
On Tue, Apr 24, 2018 at 11:37:33AM +1200, Thomas Munro wrote:
Maybe try asking what's mapped there with VirtualQueryEx() on failure?
+1. An implementation of that:
/messages/by-id/20170403065106.GA2624300@tornado.leadboat.com
Not seeing any other work happening here, I pushed a little bit of
quick-hack investigation code. This is based on noting that
VirtualAllocEx is documented as rounding the allocation up to a page
boundary (4K), but there's nothing specific about whether or how much
CreateFileMapping or MapViewOfFileEx might round up. The observed
failures could be explained if those guys might eat more virtual
address space for the same request size as VirtualAllocEx does.
This is a stretch, for sure, but given the lack of any other theories
we might as well check it.
Sounds good to me. Just as an FYI, there are a couple of folks taking a
look at the system and trying to figure out what's going on. We've seen
an Event ID 1530 error in the Windows Event log associated with
vctip.exe, which Visual Studio runs during the build, but only
sometimes. When vctip.exe runs and then finishes, it cleans things up,
which seems to be what triggers the 1530, and that appears to correlate
with the failures; it's hard to say whether that's really a smoking gun
or just coincidence.
Thanks!
Stephen
[ Thanks to Stephen for cranking up a continuous build loop on dory ]
Noah Misch <noah@leadboat.com> writes:
On Tue, Apr 24, 2018 at 11:37:33AM +1200, Thomas Munro wrote:
Maybe try asking what's mapped there with VirtualQueryEx() on failure?
+1. An implementation of that:
/messages/by-id/20170403065106.GA2624300@tornado.leadboat.com
So I tried putting in that code, and it turns the problem from something
that maybe happens in every third buildfarm run or so, to something that
happens at least a dozen times in a single "make check" step. This seems
to mean that either EnumProcessModules or GetModuleFileNameEx is itself
allocating memory, and sometimes that allocation comes out of the space
VirtualFree just freed :-(.
So we can't use those functions. We have however proven that no new
module gets loaded during VirtualFree or MapViewOfFileEx, so there
doesn't seem to be anything more to be learned from them anyway.
What it looks like to me is that MapViewOfFileEx allocates some memory and
sometimes that comes out of the wrong place. This is, um, unfortunate.
It also appears that VirtualFree might sometimes allocate some memory,
and that'd be even more unfortunate, but it's hard to be certain; the
blame might well fall on VirtualQuery instead. (Ain't Heisenbugs fun?)
The solution I was thinking about last night was to have
PGSharedMemoryReAttach call MapViewOfFileEx to map the shared memory
segment at an unspecified address, then unmap it, then call VirtualFree,
and finally call MapViewOfFileEx with the real target address. The idea
here is to get these various DLLs to set up any memory allocation pools
they're going to set up before we risk doing VirtualFree. I am not,
at this point, convinced this will fix it :-( ... but I'm not sure what
else to try.
In any case, it's still pretty unclear why dory is showing this problem
and other buildfarm members are not. whelk for instance seems to be
loading all the same DLLs and more besides.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
[ Thanks to Stephen for cranking up a continuous build loop on dory ]
That was actually Heath, who is also trying to work this issue and has
an idea about something else which might help (and some more information
about what's happening in the event log). Adding him to the thread so
he can (easily) reply with what he's found.
Heath..?
Thanks!
Stephen
On Mon, Apr 30, 2018 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
[ Thanks to Stephen for cranking up a continuous build loop on dory ]
That was actually Heath, who is also trying to work this issue and has
an idea about something else which might help (and some more information
about what's happening in the event log). Adding him to the thread so
he can (easily) reply with what he's found.
Heath..?
Thanks!
Stephen
So what I noticed after adding the '--force' flag was that the Event
Viewer's System log contained an error stating that "The
application-specific permission settings do not grant Local Activation
permission for the COM Server application" for the Runtime Broker. So
around 2:00 pm today I changed the ownership of the registry values to
Administrators so I could add the user we run the builds under to the
list of members that have access to it. However, right after I made
that change the build was actually broken for me, so I am just now
turning the builds back on to run constantly and verify whether this
has any effect on the failure to reattach to shared memory. I am hoping
that this makes things more stable, but I am not confident that these
are related.
The change that I attempted prior to this, last Wednesday, was to stop
vctip.exe from being used, as it was causing issues for us as well. It
was causing an Event ID 1530 error stating that "The application this
is listed in the event details is leaving the registry handle open",
and Windows was helpfully closing any registry values for that user
profile; I thought that possibly, when doing so, it was cleaning up the
memory that was allocated earlier. However, that change did not fix
anything for us.
Any ideas or changes that we could do to help debug or verify would be
helpful. We have considered changing it to run everything as SYSTEM but if
possible we would like to avoid this for security reasons. Thank you in
advance and I appreciate all the help.
-Heath
Greetings,
* Heath Lord (heath.lord@crunchydata.com) wrote:
Any ideas or changes that we could do to help debug or verify would be
helpful. We have considered changing it to run everything as SYSTEM but if
possible we would like to avoid this for security reasons. Thank you in
advance and I appreciate all the help.
Just to be clear: there is no longer anything showing up in the event
viewer associated with running the builds. There may still be an issue
with the system setup or such, but it at least seems less likely for
that to be the issue, so I'm thinking that Tom is more likely correct
that PG is doing something not quite right here.
Thanks!
Stephen
Heath Lord <heath.lord@crunchydata.com> writes:
However right after I
made that change the build was actually broken for me so I am just now
turning the builds back on to run constantly to verify if this has
any effect on the issue of not being able to reattach to the shared
memory.
The build was broken on Windows between about 12:30 and 14:30 EDT,
thanks to an unrelated change. Now that that's sorted, dory is
still failing :-(.
Moreover, even though I took out the module dump logic, the failure rate
is still a good bit higher than it was before, which seems to confirm my
fear that this is a Heisenbug: either VirtualQuery itself, or the act of
elog'ing a bunch of output, is causing memory allocations to take place
that otherwise would not have.
I'm hoping that the elog output is to blame for that, and am going to
go try to rejigger the code so that we capture the memory maps into space
that was allocated before VirtualFree.
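Capturing the memory map "into space that was allocated before VirtualFree" could look like a fixed-size, statically allocated record buffer: the critical section only stores numbers, and formatting and logging happen afterwards. The names, the record layout, and the 64-entry capacity below are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGION_RECORDS 64   /* hypothetical capacity */

/* One observed address-space region (e.g. one VirtualQuery() result). */
typedef struct RegionRecord
{
    uintptr_t   base;
    size_t      size;
    unsigned int state;         /* caller-defined state code */
} RegionRecord;

/*
 * Static storage exists before the critical section starts, so recording a
 * region never calls malloc() or the logging machinery.
 */
static RegionRecord region_records[MAX_REGION_RECORDS];
static int  n_region_records = 0;
static int  n_dropped_records = 0;

/* Intended to be safe between "release reservation" and "map": no allocation. */
static void
record_region(uintptr_t base, size_t size, unsigned int state)
{
    assert(size > 0);
    if (n_region_records >= MAX_REGION_RECORDS)
    {
        n_dropped_records++;    /* count overflow instead of growing */
        return;
    }
    region_records[n_region_records].base = base;
    region_records[n_region_records].size = size;
    region_records[n_region_records].state = state;
    n_region_records++;
}
```

Only after MapViewOfFileEx() has returned would the caller walk region_records[] and elog() the entries, so any allocation done by the reporting path cannot perturb the range being mapped.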
Any ideas or changes that we could do to help debug or verify would be
helpful. We have considered changing it to run everything as SYSTEM but if
possible we would like to avoid this for security reasons.
Yeah, that's no solution. Even if it were OK for dory, end users wouldn't
necessarily want to run PG that way.
regards, tom lane
I wrote:
The solution I was thinking about last night was to have
PGSharedMemoryReAttach call MapViewOfFileEx to map the shared memory
segment at an unspecified address, then unmap it, then call VirtualFree,
and finally call MapViewOfFileEx with the real target address. The idea
here is to get these various DLLs to set up any memory allocation pools
they're going to set up before we risk doing VirtualFree. I am not,
at this point, convinced this will fix it :-( ... but I'm not sure what
else to try.
So the answer is that that doesn't help at all.
It's clear from dory's results that something is causing a 4MB chunk
of memory to get reserved in the process's address space, sometimes.
It might happen during the main MapViewOfFileEx call, or during the
preceding VirtualFree, or with my map/unmap dance in place, it might
happen during that. Frequently it doesn't happen at all, at least not
before the point where we've successfully done MapViewOfFileEx. But
if it does happen, and the chunk happens to get put in a spot that
overlaps where we want to put the shmem block, kaboom.
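The failure condition is a plain interval-overlap test. Here is a minimal sketch with a hypothetical helper name. Only the target address 0x18E0000 comes from dory's log; the 16 MB segment size and the 4 MB chunk placements in the usage are made-up illustrations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Do the half-open ranges [a, a + alen) and [b, b + blen) overlap?
 * They do iff each one starts before the other one ends.
 * (Assumes neither range wraps around the end of the address space.)
 */
static int
ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    assert(alen > 0 && blen > 0);
    return a < b + blen && b < a + alen;
}
```

So a stray 4 MB reservation at 0x1C00000 would break a 16 MB attach at 0x18E0000, while one ending exactly at 0x18E0000 would not.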
What seems like a plausible theory at this point is that the apparent
asynchronicity is due to the allocation being triggered by a different
thread, and the fact that our added monitoring code seems to make the
failure more likely can be explained by that code changing the timing.
But what thread could it be? It doesn't really look to me like either
the signal thread or the timer thread could eat 4MB. syslogger.c
also spawns a thread, on Windows, but AFAICS that's not being used in
this test configuration. Maybe the reason dory is showing the problem
is something or other is spawning a thread we don't even know about?
I'm going to go put a 1-sec sleep into the beginning of
PGSharedMemoryReAttach and see if that changes anything. If I'm right
that this is being triggered by another thread, that should allow the
other thread to do its thing (at least most of the time) so that the
failure rate ought to go way down.
Even if that does happen, I'm at a loss for a reasonable way to fix it
for real. Is there a way to seize control of a Windows process so that
there are no other running threads? Any other ideas?
regards, tom lane
Hi,
On 2018-04-30 20:01:40 -0400, Tom Lane wrote:
What seems like a plausible theory at this point is that the apparent
asynchronicity is due to the allocation being triggered by a different
thread, and the fact that our added monitoring code seems to make the
failure more likely can be explained by that code changing the timing.
But what thread could it be? It doesn't really look to me like either
the signal thread or the timer thread could eat 4MB.
It seems plausible that the underlying allocator allocates larger chunks
to serve small allocations. But we don't seem to have started any threads
at PGSharedMemoryReAttach() time? So it'd have to be something else that
starts threads.
Heath, could you use Process Explorer or such to check which threads
are running inside a working backend process?
Greetings,
Andres Freund
On Mon, Apr 30, 2018 at 08:01:40PM -0400, Tom Lane wrote:
It's clear from dory's results that something is causing a 4MB chunk
of memory to get reserved in the process's address space, sometimes.
It might happen during the main MapViewOfFileEx call, or during the
preceding VirtualFree, or with my map/unmap dance in place, it might
happen during that. Frequently it doesn't happen at all, at least not
before the point where we've successfully done MapViewOfFileEx. But
if it does happen, and the chunk happens to get put in a spot that
overlaps where we want to put the shmem block, kaboom.
What seems like a plausible theory at this point is that the apparent
asynchronicity is due to the allocation being triggered by a different
thread, and the fact that our added monitoring code seems to make the
failure more likely can be explained by that code changing the timing.
But what thread could it be? It doesn't really look to me like either
the signal thread or the timer thread could eat 4MB. syslogger.c
also spawns a thread, on Windows, but AFAICS that's not being used in
this test configuration. Maybe the reason dory is showing the problem
is something or other is spawning a thread we don't even know about?
Likely some privileged daemon is creating a thread in every new process. (On
Windows, it's not unusual for one process to create a thread in another
process.) We don't have good control over that.
I'm at a loss for a reasonable way to fix it
for real. Is there a way to seize control of a Windows process so that
there are no other running threads?
I think not.
Any other ideas?
PostgreSQL could retry the whole process creation, analogous to
internal_forkexec() retries. Have the failed process exit after recording the
fact that it couldn't attach. Make the postmaster notice and spawn a
replacement. Give up after 100 failed attempts.
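Noah's end-to-end retry can be sketched with POSIX fork()/waitpid() standing in for the Windows process creation in internal_forkexec(). This is a simplified, hypothetical illustration: here the parent waits synchronously for each child, whereas the postmaster would notice the exit status asynchronously and spawn the replacement. The exit status 4 follows the convention of the patch later in this thread, and demo_child_main() is a made-up stand-in for backend startup.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REATTACH_FAILED_STATUS 4    /* "could not reattach; please retry" */

/*
 * Start a child running child_main(attempt).  If it exits with
 * REATTACH_FAILED_STATUS, start a replacement, up to max_attempts times.
 * Returns the attempt number of the first child that exited any other way,
 * or -1 if every attempt failed to reattach (or fork/waitpid failed).
 */
static int
spawn_with_retry(int (*child_main) (int attempt), int max_attempts)
{
    assert(child_main != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        int         status;
        pid_t       pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(child_main(attempt));     /* child: never returns */

        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == REATTACH_FAILED_STATUS)
            continue;                       /* replace the failed child */
        return attempt;
    }
    return -1;                              /* give up */
}

/* Hypothetical stand-in: fail to "reattach" on the first demo_failures tries. */
static int  demo_failures = 2;

static int
demo_child_main(int attempt)
{
    return attempt <= demo_failures ? REATTACH_FAILED_STATUS : 0;
}
```

With demo_failures = 2, spawn_with_retry(demo_child_main, 100) returns 3; Noah's "give up after 100 failed attempts" corresponds to max_attempts = 100.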
Noah Misch <noah@leadboat.com> writes:
On Mon, Apr 30, 2018 at 08:01:40PM -0400, Tom Lane wrote:
What seems like a plausible theory at this point is that the apparent
asynchronicity is due to the allocation being triggered by a different
thread, and the fact that our added monitoring code seems to make the
failure more likely can be explained by that code changing the timing.
But what thread could it be? It doesn't really look to me like either
the signal thread or the timer thread could eat 4MB. syslogger.c
also spawns a thread, on Windows, but AFAICS that's not being used in
this test configuration.
The 1-second wait doesn't seem to have changed things much, which puts
a hole in the idea that this is triggered by a thread spawned earlier
in backend process startup.
Maybe the reason dory is showing the problem
is something or other is spawning a thread we don't even know about?
Likely some privileged daemon is creating a thread in every new process.
Yeah, I'm afraid that's the most likely theory at this point; it offers
an explanation why we're seeing this on dory and not other machines.
Although if the daemon were responding to process startup, wouldn't the
extra wait have given it time to do so? There's still something that
doesn't add up here.
Any other ideas?
PostgreSQL could retry the whole process creation, analogous to
internal_forkexec() retries.
In the absence of any clearer theory about what's causing this,
that may be our only recourse. Sure is ugly though.
regards, tom lane
On Tue, May 1, 2018 at 2:59 PM, Noah Misch <noah@leadboat.com> wrote:
Likely some privileged daemon is creating a thread in every new process. (On
Windows, it's not unusual for one process to create a thread in another
process.) We don't have good control over that.
Huh. I was already amazed (as a non-Windows user) by the DSM code
that duplicates file handles into the postmaster process without its
cooperation, but starting threads is even more amazing.
Apparently debuggers do that. Could this be running in some kind of
debugger-managed environment or build, perhaps as a result of some
core dump capturing mode or something?
https://msdn.microsoft.com/en-us/library/windows/desktop/dd405484(v=vs.85).aspx
Apparently another way to mess with another process's memory map is
via "Asynchronous Procedure Calls":
http://blogs.microsoft.co.il/pavely/2017/03/14/injecting-a-dll-without-a-remote-thread/
It looks like that mechanism could allow something either in our own
process (perhaps some timer-related thing that we might have set up
ourselves or might be set up by the system?) or another process to
queue actions for our own thread to run at certain points.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
--
Thomas Munro
http://www.enterprisedb.com
Andres Freund <andres@anarazel.de> writes:
Heath, could you use Process Explorer or such to check which threads
are running inside a working backend process?
It seems to be possible to enumerate the threads that are present inside a
Windows process, although it's not clear to me how much identifying info
is available. Perhaps it'd be worth putting in some "dump threads"
debugging code like the "dump modules" code we had in there for a bit?
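A "dump threads" aid might look like the following Linux-only analogue, which lists /proc/self/task. On Windows the corresponding enumeration would presumably use CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0) with Thread32First()/Thread32Next(), filtering on the owning process ID; the sketch does not attempt that, and the helper name is hypothetical. Like the module dump, opendir() allocates, so it should run outside the VirtualFree()/MapViewOfFileEx() window.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/*
 * Linux-only: each thread of this process is a numeric directory under
 * /proc/self/task.  Prints the thread IDs to out (if non-NULL) and returns
 * how many there are, or -1 if /proc is not available.
 */
static int
dump_threads(FILE *out)
{
    DIR        *dir = opendir("/proc/self/task");
    struct dirent *de;
    int         count = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (de->d_name[0] == '.')
            continue;           /* skip "." and ".." */
        if (out != NULL)
            fprintf(out, "thread %s\n", de->d_name);
        count++;
    }
    closedir(dir);
    assert(count > 0);          /* at least the calling thread is listed */
    return count;
}
```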
regards, tom lane
Well, at this point the only thing that's entirely clear is that none
of the ideas I had work. I think we are going to be forced to pursue
Noah's idea of doing an end-to-end retry. Somebody else will need to
take point on that; I lack a Windows environment and have already done
a lot more blind patch-pushing than I like in this effort.
I'll revert the debugging code I added to win32_shmem.c, unless
someone sees a reason to leave it there awhile longer.
regards, tom lane
On Tue, May 01, 2018 at 11:31:50AM -0400, Tom Lane wrote:
Well, at this point the only thing that's entirely clear is that none
of the ideas I had work. I think we are going to be forced to pursue
Noah's idea of doing an end-to-end retry. Somebody else will need to
take point on that; I lack a Windows environment and have already done
a lot more blind patch-pushing than I like in this effort.
Having tried this, I find a choice between performance and complexity. Both
of my designs use proc_exit(4) to indicate failure to reattach. The simpler,
slower design has WIN32 internal_forkexec() block until the child reports (via
SetEvent()) that it reattached to shared memory. This caused a fivefold
reduction in process creation performance[1]. The less-simple, faster design
stashes the Port structure and retry count in the BackendList entry, which
reaper() uses to retry the fork upon seeing status 4. Notably, this requires
new code for regular backends, for bgworkers, and for others. It's currently
showing a 30% performance _increase_ on the same benchmark; I can't explain
that increase and doubt it will last, but I think it's plausible for the
less-simple design to be performance-neutral.
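The "simpler, slower" design, in which the parent blocks until the child either reports a successful reattach or dies, can be sketched in POSIX terms with a pipe: the child's one-byte write stands in for SetEvent(win32ReAttach), and EOF on the pipe stands in for the child's process handle becoming signaled. This is a hypothetical analogue, not the attached patch; fork_and_wait_for_attach() and the demo_attach_* stand-ins for PGSharedMemoryReAttach() are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that runs attach_ok(); block until it either reports success
 * (one byte on the pipe) or exits without reporting (EOF on the pipe).
 * Returns 1 on reported success, 0 if the child exited with status 4
 * ("retry me"), and -1 on anything else.
 */
static int
fork_and_wait_for_attach(int (*attach_ok) (void))
{
    int         fds[2];
    int         status;
    char        byte;
    ssize_t     n;
    pid_t       pid;

    assert(attach_ok != NULL);
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(fds[0]);
        if (!attach_ok())
            _exit(4);                   /* exit without reporting */
        if (write(fds[1], "x", 1) != 1) /* report "I reattached" */
            _exit(1);
        _exit(0);                       /* a real backend would keep running */
    }

    close(fds[1]);                      /* else read() could never see EOF */
    do
        n = read(fds[0], &byte, 1);
    while (n < 0 && errno == EINTR);
    close(fds[0]);

    /* The demo child exits right away, so reap it in both cases. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (n == 1)
        return 1;
    if (n == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 4)
        return 0;
    return -1;
}

/* Hypothetical stand-ins for PGSharedMemoryReAttach() succeeding or failing. */
static int demo_attach_succeeds(void) { return 1; }
static int demo_attach_fails(void) { return 0; }
```

The slowdown Noah measured corresponds to the parent sitting in read() until each child is far enough along to report; the faster design avoids that wait by keeping enough state to retry later from reaper().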
I see these options:
1. Use the simpler design with a GUC, disabled by default, to control whether
the new code is active. Mention the GUC in a new errhint() for the "could
not reattach to shared memory" error.
2. Like (1), but enable the GUC by default.
3. Like (1), but follow up with a patch to enable the GUC by default in v12
only.
4. In addition to (1), enable retries if the GUC is set _or_ this postmaster
has seen at least one child fail to reattach.
5. Use the less-simple design, with retries enabled unconditionally.
I think I prefer (3), with (1) being a close second. My hesitation on (3) is
that parallel query has made startup time count even if you use a connection
pool, and all the Windows users not needing these retries will see parallel
query become that much slower. I dislike (5) for its impact on
platform-independent postmaster code. Other opinions?
I'm attaching a mostly-finished patch for the slower design. I tested
correctness with -DREATTACH_FORCE_FAIL_PERCENT=99. I'm also attaching a
proof-of-concept patch for the faster design. In this proof of concept, the
postmaster does not close its copy of a backend socket until the backend
exits. Also, bgworkers can change from BGWH_STARTED back to
BGWH_NOT_YET_STARTED; core code tolerates this, but external code may not.
Those would justify paying some performance to fix. The proof of concept
handles bgworkers and regular backends, but it does not handle the startup
process, checkpointer, etc. That doesn't affect benchmarking, of course.
nm
[1] This (2 forks per transaction) dropped from 139tps to 27tps:
echo 'select 1' >script
env PGOPTIONS="--default_transaction_isolation=repeatable\\ read --force_parallel_mode=on" pgbench -T15 -j30 -c30 --connect -n -fscript
Attachments:
w32attach-retry-block-v1.patch
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index f8ca52e..e96517f 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -383,6 +383,20 @@ PGSharedMemoryReAttach(void)
Assert(UsedShmemSegAddr != NULL);
Assert(IsUnderPostmaster);
+#ifdef REATTACH_FORCE_FAIL_PERCENT
+
+ /*
+ * For testing, emulate a system where my MapViewOfFileEx() fails some
+ * percentage of the time.
+ */
+ srandom((unsigned int) (MyProcPid ^ MyStartTime));
+ if (random() % 100 < (REATTACH_FORCE_FAIL_PERCENT))
+ {
+ elog(LOG, "emulating failure to reattach to shared memory");
+ proc_exit(4);
+ }
+#endif
+
/*
* Release memory region reservation that was made by the postmaster
*/
@@ -392,8 +406,11 @@ PGSharedMemoryReAttach(void)
hdr = (PGShmemHeader *) MapViewOfFileEx(UsedShmemSegID, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 0, UsedShmemSegAddr);
if (!hdr)
- elog(FATAL, "could not reattach to shared memory (key=%p, addr=%p): error code %lu",
+ {
+ elog(LOG, "could not reattach to shared memory (key=%p, addr=%p): error code %lu",
UsedShmemSegID, UsedShmemSegAddr, GetLastError());
+ proc_exit(4); /* Ask internal_forkexec() to retry. */
+ }
if (hdr != origUsedShmemSegAddr)
elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)",
hdr, origUsedShmemSegAddr);
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b3..7432223 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -454,6 +454,12 @@ static void InitPostmasterDeathWatchHandle(void);
static pid_t waitpid(pid_t pid, int *exitstatus, int options);
static void WINAPI pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired);
+/*
+ * During internal_forkexec(), the child signals this auto-reset event to
+ * indicate that it is no longer at risk of needing a fork retry.
+ */
+static HANDLE win32ReAttach;
+
static HANDLE win32ChildQueue;
typedef struct
@@ -520,6 +526,7 @@ typedef struct
int max_safe_fds;
int MaxBackends;
#ifdef WIN32
+ HANDLE win32ReAttach;
HANDLE PostmasterHandle;
HANDLE initial_signal_pipe;
HANDLE syslogPipe[2];
@@ -556,6 +563,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
#define EXIT_STATUS_0(st) ((st) == 0)
#define EXIT_STATUS_1(st) (WIFEXITED(st) && WEXITSTATUS(st) == 1)
#define EXIT_STATUS_3(st) (WIFEXITED(st) && WEXITSTATUS(st) == 3)
+/* status 4 means PGSharedMemoryReAttach() failure; see internal_forkexec() */
#ifndef WIN32
/*
@@ -1197,6 +1205,20 @@ PostmasterMain(int argc, char *argv[])
#ifdef WIN32
+ {
+ SECURITY_ATTRIBUTES sa;
+
+ sa.nLength = sizeof(sa);
+ sa.lpSecurityDescriptor = NULL;
+ sa.bInheritHandle = TRUE;
+
+ win32ReAttach = CreateEvent(&sa, FALSE, FALSE, NULL);
+ if (win32ReAttach == NULL)
+ ereport(FATAL,
+ (errmsg("could not create event to facilitate fork emulation: error code %lu",
+ GetLastError())));
+ }
+
/*
* Initialize I/O completion port used to deliver list of dead children.
*/
@@ -4524,6 +4546,8 @@ internal_forkexec(int argc, char *argv[], Port *port)
BackendParameters *param;
SECURITY_ATTRIBUTES sa;
char paramHandleStr[32];
+ HANDLE wait_handles[2];
+ bool done;
win32_deadchild_waitinfo *childinfo;
/* Make sure caller set up argv properly */
@@ -4650,8 +4674,7 @@ retry:
/*
* Now that the backend variables are written out, we start the child
- * thread so it can start initializing while we set up the rest of the
- * parent state.
+ * thread so it can start initializing.
*/
if (ResumeThread(pi.hThread) == -1)
{
@@ -4673,6 +4696,64 @@ retry:
}
/*
+ * Block until child dies or uses SetEvent(win32ReAttach) to indicate that
+ * it reattached to shared memory.
+ */
+ wait_handles[0] = win32ReAttach;
+ wait_handles[1] = pi.hProcess;
+ done = false;
+ while (!done)
+ {
+ DWORD rc,
+ exitcode;
+
+ rc = WaitForMultipleObjectsEx(2, wait_handles, FALSE, INFINITE, TRUE);
+ switch (rc)
+ {
+ case WAIT_OBJECT_0: /* child reattached */
+ done = true;
+ break;
+ case WAIT_OBJECT_0 + 1: /* child exited */
+ done = true;
+ if (GetExitCodeProcess(pi.hProcess, &exitcode) &&
+ exitcode == 4)
+ {
+ /* child already made a log entry */
+ CloseHandle(pi.hProcess);
+ CloseHandle(pi.hThread);
+ if (++retry_count < 100)
+ goto retry;
+ ereport(LOG,
+ (errmsg("giving up after too many tries to reattach to shared memory"),
+ errhint("This might be caused by ASLR or antivirus software.")));
+ return -1;
+ }
+ /* else, let pgwin32_deadchild_callback() handle the exit */
+ break;
+ case WAIT_IO_COMPLETION:
+
+ /*
+ * The system interrupted the wait to execute an I/O
+ * completion routine or asynchronous procedure call in this
+ * thread. PostgreSQL does not provoke either of these, but
+ * atypical loaded DLLs or even other processes might do so.
+ * Now, resume waiting.
+ */
+ break;
+ case WAIT_FAILED:
+ ereport(FATAL,
+ (errmsg("could not wait for new child to start: error code %lu",
+ GetLastError())));
+ break;
+ default:
+ elog(FATAL,
+ "unexpected return code from WaitForMultipleObjectsEx(): %lu",
+ rc);
+ break;
+ }
+ }
+
+ /*
* Queue a waiter for to signal when this child dies. The wait will be
* handled automatically by an operating system thread pool.
*
@@ -4787,6 +4868,14 @@ SubPostmasterMain(int argc, char *argv[])
else
PGSharedMemoryNoReAttach();
+#ifdef WIN32
+ if (!SetEvent(win32ReAttach))
+ elog(FATAL,
+ "could not indicate, to postmaster, reattach to shared memory: error code %lu",
+ GetLastError());
+ /* postmaster is now free to return from internal_forkexec() */
+#endif
+
/* autovacuum needs this set before calling InitProcess */
if (strcmp(argv[1], "--forkavlauncher") == 0)
AutovacuumLauncherIAm();
@@ -6030,6 +6119,7 @@ save_backend_variables(BackendParameters *param, Port *port,
param->MaxBackends = MaxBackends;
#ifdef WIN32
+ param->win32ReAttach = win32ReAttach;
param->PostmasterHandle = PostmasterHandle;
if (!write_duplicated_handle(&param->initial_signal_pipe,
pgwin32_create_signal_listener(childPid),
@@ -6262,6 +6352,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
MaxBackends = param->MaxBackends;
#ifdef WIN32
+ win32ReAttach = param->win32ReAttach;
PostmasterHandle = param->PostmasterHandle;
pgwin32_initial_signal_pipe = param->initial_signal_pipe;
#else
w32attach-retry-nonblock-v0.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index f8ca52e..e84e2fc 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -382,6 +382,7 @@ PGSharedMemoryReAttach(void)
Assert(UsedShmemSegAddr != NULL);
Assert(IsUnderPostmaster);
+ /* FIXME change FATAL to proc_exit(1) */
/*
* Release memory region reservation that was made by the postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b3..eadf7f1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -180,6 +180,14 @@ typedef struct bkend
bool dead_end; /* is it going to send an error and quit? */
bool bgworker_notify; /* gets bgworker start/stop notifications */
dlist_node elem; /* list link in BackendList */
+
+#ifdef EXEC_BACKEND
+#define RETRYABLE_FORK
+#endif
+#ifdef RETRYABLE_FORK
+ Port *port; /* for still-opening backends */
+ int retry_count;
+#endif
} Backend;
static dlist_head BackendList = DLIST_STATIC_INIT(BackendList);
@@ -412,6 +420,7 @@ static void BackendInitialize(Port *port);
static void BackendRun(Port *port) pg_attribute_noreturn();
static void ExitPostmaster(int status) pg_attribute_noreturn();
static int ServerLoop(void);
+static pid_t BackendFork(Backend *bn);
static int BackendStartup(Port *port);
static int ProcessStartupPacket(Port *port, bool SSLdone);
static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
@@ -427,6 +436,7 @@ static void TerminateChildren(int signal);
#define SignalChildren(sig) SignalSomeChildren(sig, BACKEND_TYPE_ALL)
static int CountChildren(int target);
+static pid_t do_fork_bgworker(RegisteredBgWorker *rw);
static bool assign_backendlist_entry(RegisteredBgWorker *rw);
static void maybe_start_bgworkers(void);
static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
@@ -543,6 +553,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
#endif
static void ShmemBackendArrayAdd(Backend *bn);
+static void ShmemBackendArrayUpdate(Backend *bn, pid_t oldpid);
static void ShmemBackendArrayRemove(Backend *bn);
#endif /* EXEC_BACKEND */
@@ -556,6 +567,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
#define EXIT_STATUS_0(st) ((st) == 0)
#define EXIT_STATUS_1(st) (WIFEXITED(st) && WEXITSTATUS(st) == 1)
#define EXIT_STATUS_3(st) (WIFEXITED(st) && WEXITSTATUS(st) == 3)
+#define EXIT_STATUS_4(st) (WIFEXITED(st) && WEXITSTATUS(st) == 4)
#ifndef WIN32
/*
@@ -1703,14 +1715,8 @@ ServerLoop(void)
port = ConnCreate(ListenSocket[i]);
if (port)
{
+ /* FIXME refactor more to happen in BackendStartup() */
BackendStartup(port);
-
- /*
- * We no longer need the open socket or port structure
- * in this process
- */
- StreamClose(port->sock);
- ConnFree(port);
}
}
}
@@ -2311,7 +2317,11 @@ processCancelRequest(Port *port, void *pkt)
backendPID)));
return;
}
+#ifndef EXEC_BACKEND
+ }
+#else
}
+#endif
/* No matching backend */
ereport(LOG,
@@ -2393,8 +2403,6 @@ ConnCreate(int serverFd)
if (StreamConnection(serverFd, port) != STATUS_OK)
{
- if (port->sock != PGINVALID_SOCKET)
- StreamClose(port->sock);
ConnFree(port);
return NULL;
}
@@ -2425,6 +2433,8 @@ ConnCreate(int serverFd)
static void
ConnFree(Port *conn)
{
+ if (conn->sock != PGINVALID_SOCKET)
+ StreamClose(conn->sock);
#ifdef USE_SSL
secure_close(conn);
#endif
@@ -2814,6 +2824,8 @@ reaper(SIGNAL_ARGS)
continue;
}
+ /* FIXME restart startup process after status 4 */
+
/*
* Unexpected exit of startup process (including FATAL exit)
* during PM_STARTUP is treated as catastrophic. There are no
@@ -2897,14 +2909,15 @@ reaper(SIGNAL_ARGS)
}
/*
- * Was it the bgwriter? Normal exit can be ignored; we'll start a new
- * one at the next iteration of the postmaster's main loop, if
- * necessary. Any other exit condition is treated as a crash.
+ * Was it the bgwriter? Normal exit and retryable startup failure can
+ * be ignored; we'll start a new one at the next iteration of the
+ * postmaster's main loop, if necessary. Any other exit condition is
+ * treated as a crash.
*/
if (pid == BgWriterPID)
{
BgWriterPID = 0;
- if (!EXIT_STATUS_0(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_4(exitstatus))
HandleChildCrash(pid, exitstatus,
_("background writer process"));
continue;
@@ -2953,6 +2966,7 @@ reaper(SIGNAL_ARGS)
if (PgStatPID != 0)
signal_child(PgStatPID, SIGQUIT);
}
+ /* FIXME retry checkpointer */
else
{
/*
@@ -2967,44 +2981,47 @@ reaper(SIGNAL_ARGS)
}
/*
- * Was it the wal writer? Normal exit can be ignored; we'll start a
- * new one at the next iteration of the postmaster's main loop, if
- * necessary. Any other exit condition is treated as a crash.
+ * Was it the wal writer? Normal exit and retryable startup failure
+ * can be ignored; we'll start a new one at the next iteration of the
+ * postmaster's main loop, if necessary. Any other exit condition is
+ * treated as a crash.
*/
if (pid == WalWriterPID)
{
WalWriterPID = 0;
- if (!EXIT_STATUS_0(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_4(exitstatus))
HandleChildCrash(pid, exitstatus,
_("WAL writer process"));
continue;
}
/*
- * Was it the wal receiver? If exit status is zero (normal) or one
- * (FATAL exit), we assume everything is all right just like normal
- * backends. (If we need a new wal receiver, we'll start one at the
- * next iteration of the postmaster's main loop.)
+ * Was it the wal receiver? If exit status is zero (normal), one
+ * (FATAL exit) or four (retryable startup failure), we assume
+ * everything is all right just like normal backends. (If we need a
+ * new wal receiver, we'll start one at the next iteration of the
+ * postmaster's main loop.)
*/
if (pid == WalReceiverPID)
{
WalReceiverPID = 0;
- if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus) &&
+ !EXIT_STATUS_4(exitstatus))
HandleChildCrash(pid, exitstatus,
_("WAL receiver process"));
continue;
}
/*
- * Was it the autovacuum launcher? Normal exit can be ignored; we'll
- * start a new one at the next iteration of the postmaster's main
- * loop, if necessary. Any other exit condition is treated as a
- * crash.
+ * Was it the autovacuum launcher? Normal exit and retryable startup
+ * failure can be ignored; we'll start a new one at the next iteration
+ * of the postmaster's main loop, if necessary. Any other exit
+ * condition is treated as a crash.
*/
if (pid == AutoVacPID)
{
AutoVacPID = 0;
- if (!EXIT_STATUS_0(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_4(exitstatus))
HandleChildCrash(pid, exitstatus,
_("autovacuum launcher process"));
continue;
@@ -3116,33 +3133,60 @@ CleanupBackgroundWorker(int pid,
snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""),
rw->rw_worker.bgw_type);
+ LogChildExit(DEBUG2, namebuf, pid, exitstatus);
- if (!EXIT_STATUS_0(exitstatus))
- {
- /* Record timestamp, so we know when to restart the worker. */
- rw->rw_crashed_at = GetCurrentTimestamp();
- }
- else
+ if (EXIT_STATUS_0(exitstatus))
{
/* Zero exit status means terminate */
rw->rw_crashed_at = 0;
rw->rw_terminate = true;
}
+ else if (!EXIT_STATUS_4(exitstatus))
+ {
+ /* Record timestamp, so we know when to restart the worker. */
+ rw->rw_crashed_at = GetCurrentTimestamp();
+ }
/*
* Additionally, for shared-memory-connected workers, just like a
- * backend, any exit status other than 0 or 1 is considered a crash
+ * backend, any exit status other than 0, 1 or 4 is considered a crash
* and causes a system-wide restart.
*/
if ((rw->rw_worker.bgw_flags & BGWORKER_SHMEM_ACCESS) != 0)
{
- if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus) &&
+ !EXIT_STATUS_4(exitstatus))
{
HandleChildCrash(pid, exitstatus, namebuf);
return true;
}
}
+#ifdef RETRYABLE_FORK
+ if (EXIT_STATUS_4(exitstatus))
+ {
+ Backend *bp = rw->rw_backend;
+
+ if (++bp->retry_count >= 100) /* FIXME use a constant */
+ {
+ ereport(LOG,
+ (errmsg("giving up after too many tries to reserve shared memory"),
+ errhint("This might be caused by ASLR or antivirus software.")));
+ }
+ else
+ {
+ bp->pid = do_fork_bgworker(rw);
+ if (bp->pid > 0)
+ {
+ ShmemBackendArrayUpdate(bp, pid);
+ return true;
+ }
+ }
+ /* mark entry as crashed, so we'll try again later */
+ rw->rw_crashed_at = GetCurrentTimestamp();
+ }
+#endif
+
/*
* We must release the postmaster child slot whether this worker is
* connected to shared memory or not, but we only treat it as a crash
@@ -3187,7 +3231,8 @@ CleanupBackgroundWorker(int pid,
/*
* CleanupBackend -- cleanup after terminated backend.
*
- * Remove all local state associated with backend.
+ * If the backend failed during early startup, retry forking it. Otherwise,
+ * remove all local state associated with backend.
*
* If you change this, see also CleanupBackgroundWorker.
*/
@@ -3222,7 +3267,8 @@ CleanupBackend(int pid,
}
#endif
- if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus) &&
+ !EXIT_STATUS_4(exitstatus))
{
HandleChildCrash(pid, exitstatus, _("server process"));
return;
@@ -3234,6 +3280,40 @@ CleanupBackend(int pid,
if (bp->pid == pid)
{
+#ifdef RETRYABLE_FORK
+ if (EXIT_STATUS_4(exitstatus))
+ {
+ Assert(bp->port);
+
+ if (++bp->retry_count >= 100)
+ {
+ ereport(LOG,
+ (errmsg("giving up after too many tries to reserve shared memory"),
+ errhint("This might be caused by ASLR or antivirus software.")));
+ report_fork_failure_to_client(bp->port, EAGAIN);
+ }
+ else
+ {
+ bp->pid = BackendFork(bp);
+ if (bp->pid > 0)
+ {
+ if (!bp->dead_end)
+ ShmemBackendArrayUpdate(bp, pid);
+ return;
+ }
+ /* Else, fork failed. Fall through to remove local state. */
+ }
+ }
+ /*
+ * For a sufficiently long-running backend, FIXME will have
+ * already called ConnFree().
+ */
+ if (bp->port)
+ {
+ ConnFree(bp->port);
+ bp->port = NULL;
+ }
+#endif
if (!bp->dead_end)
{
if (!ReleasePostmasterChildSlot(bp->child_slot))
@@ -3607,6 +3687,14 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
static void
PostmasterStateMachine(void)
{
+ if (pmState == PM_STARTUP && StartupStatus == STARTUP_NOT_RUNNING)
+ {
+ /* We need a startup process, either the */
+ StartupPID = StartupDataBase();
+ Assert(StartupPID != 0);
+ StartupStatus = STARTUP_RUNNING;
+ }
+
if (pmState == PM_WAIT_BACKUP)
{
/*
@@ -3954,9 +4042,60 @@ TerminateChildren(int signal)
signal_child(PgStatPID, signal);
}
+/* FIXME comment */
+static pid_t
+BackendFork(Backend *bn)
+{
+ pid_t pid;
+
+ MyCancelKey = bn->cancel_key;
+ MyPMChildSlot = bn->child_slot;
+
+#ifdef EXEC_BACKEND
+ pid = backend_forkexec(bn->port);
+#else /* !EXEC_BACKEND */
+ pid = fork_process();
+ if (pid == 0) /* child */
+ {
+ /* Detangle from postmaster */
+ InitPostmasterChild();
+
+ /* Close the postmaster's sockets */
+ ClosePostmasterPorts(false);
+
+ /* Perform additional initialization and collect startup packet */
+ BackendInitialize(bn->port);
+
+ /* And run the backend */
+ BackendRun(bn->port);
+ }
+#endif /* EXEC_BACKEND */
+
+ if (pid < 0)
+ {
+ /* in parent, fork failed */
+ int save_errno = errno;
+
+ errno = save_errno;
+ ereport(LOG,
+ (errmsg("could not fork new process for connection: %m")));
+ report_fork_failure_to_client(bn->port, save_errno);
+ }
+ else
+ {
+ /* in parent, successful fork */
+ ereport(DEBUG2,
+ (errmsg_internal("forked new backend, pid=%d socket=%d",
+ (int) pid, (int) bn->port->sock)));
+ }
+
+ return pid;
+}
+
/*
* BackendStartup -- start backend process
*
+ * FIXME
* returns: STATUS_ERROR if the fork failed, STATUS_OK otherwise.
*
* Note: if you change this code, also consider StartAutovacuumWorker.
@@ -3965,7 +4104,6 @@ static int
BackendStartup(Port *port)
{
Backend *bn; /* for backend cleanup */
- pid_t pid;
/*
* Create backend data structure. Better before the fork() so we can
@@ -3977,24 +4115,13 @@ BackendStartup(Port *port)
ereport(LOG,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory")));
- return STATUS_ERROR;
- }
-
- /*
- * Compute the cancel key that will be assigned to this backend. The
- * backend will have its own copy in the forked-off process' value of
- * MyCancelKey, so that it can transmit the key to the frontend.
- */
- if (!RandomCancelKey(&MyCancelKey))
- {
- free(bn);
- ereport(LOG,
- (errcode(ERRCODE_INTERNAL_ERROR),
- errmsg("could not generate random cancel key")));
- return STATUS_ERROR;
+ goto err;
}
- bn->cancel_key = MyCancelKey;
+#ifdef RETRYABLE_FORK
+ bn->port = port;
+ bn->retry_count = 0;
+#endif
/* Pass down canAcceptConnections state */
port->canAcceptConnections = canAcceptConnections();
@@ -4005,60 +4132,38 @@ BackendStartup(Port *port)
* Unless it's a dead_end child, assign it a child slot number
*/
if (!bn->dead_end)
- bn->child_slot = MyPMChildSlot = AssignPostmasterChildSlot();
+ bn->child_slot = AssignPostmasterChildSlot();
else
bn->child_slot = 0;
/* Hasn't asked to be notified about any bgworkers yet */
bn->bgworker_notify = false;
-#ifdef EXEC_BACKEND
- pid = backend_forkexec(port);
-#else /* !EXEC_BACKEND */
- pid = fork_process();
- if (pid == 0) /* child */
+ /* Compute the cancel key that will be assigned to this backend. */
+ if (!RandomCancelKey(&bn->cancel_key))
{
- free(bn);
-
- /* Detangle from postmaster */
- InitPostmasterChild();
-
- /* Close the postmaster's sockets */
- ClosePostmasterPorts(false);
-
- /* Perform additional initialization and collect startup packet */
- BackendInitialize(port);
-
- /* And run the backend */
- BackendRun(port);
- }
-#endif /* EXEC_BACKEND */
-
- if (pid < 0)
- {
- /* in parent, fork failed */
- int save_errno = errno;
-
- if (!bn->dead_end)
- (void) ReleasePostmasterChildSlot(bn->child_slot);
- free(bn);
- errno = save_errno;
ereport(LOG,
- (errmsg("could not fork new process for connection: %m")));
- report_fork_failure_to_client(port, save_errno);
- return STATUS_ERROR;
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("could not generate random cancel key")));
+ goto err;
}
- /* in parent, successful fork */
- ereport(DEBUG2,
- (errmsg_internal("forked new backend, pid=%d socket=%d",
- (int) pid, (int) port->sock)));
+ bn->pid = BackendFork(bn);
+ if (bn->pid < 0)
+ goto err;
+
+#ifndef RETRYABLE_FORK
+ /*
+ * We no longer need the open socket or port structure
+ * in this process
+ */
+ ConnFree(port);
+#endif
/*
* Everything's been successful, it's safe to add this backend to our list
* of backends.
*/
- bn->pid = pid;
bn->bkend_type = BACKEND_TYPE_NORMAL; /* Can change later to WALSND */
dlist_push_head(&BackendList, &bn->elem);
@@ -4068,6 +4173,16 @@ BackendStartup(Port *port)
#endif
return STATUS_OK;
+
+err:
+ ConnFree(port);
+ if (bn)
+ {
+ if (bn->child_slot)
+ (void) ReleasePostmasterChildSlot(bn->child_slot);
+ free(bn);
+ }
+ return STATUS_ERROR;
}
/*
@@ -4708,6 +4823,15 @@ retry:
#endif /* WIN32 */
+static void
+MarkSharedStateComplete(void)
+{
+#ifdef FIXME_WIN32
+ SendPostmasterSignal(PMSIGNAL_BACKEND_SHMAT);
+#endif
+}
+
+
/*
* SubPostmasterMain -- Get the fork/exec'd process into a state equivalent
* to what it would be if we'd simply forked on Unix, and then
@@ -4783,7 +4907,24 @@ SubPostmasterMain(int argc, char *argv[])
strcmp(argv[1], "--forkavworker") == 0 ||
strcmp(argv[1], "--forkboot") == 0 ||
strncmp(argv[1], "--forkbgworker=", 15) == 0)
+ {
+ /* test retryable starts */
+ if (0 && (strcmp(argv[1], "--forkbackend") == 0 ||
+ strcmp(argv[1], "--forkavlauncher") == 0 ||
+ strcmp(argv[1], "--forkavworker") == 0 ||
+ /* strcmp(argv[1], "--forkboot") == 0 || */
+ strncmp(argv[1], "--forkbgworker=", 15) == 0))
+ {
+ MyProcPid = getpid(); /* FIXME not needed? */
+ srandom((unsigned int ) (MyProcPid ^ MyStartTime));
+ if (random() % 10 != 0)
+ {
+ elog(LOG, "failing startup for testing purposes");
+ proc_exit(4);
+ }
+ }
PGSharedMemoryReAttach();
+ }
else
PGSharedMemoryNoReAttach();
@@ -4878,6 +5019,7 @@ SubPostmasterMain(int argc, char *argv[])
/* Attach process to shared data structures */
CreateSharedMemoryAndSemaphores(false, 0);
+ MarkSharedStateComplete();
/* And run the backend */
BackendRun(&port); /* does not return */
@@ -4892,6 +5034,7 @@ SubPostmasterMain(int argc, char *argv[])
/* Attach process to shared data structures */
CreateSharedMemoryAndSemaphores(false, 0);
+ MarkSharedStateComplete();
AuxiliaryProcessMain(argc - 2, argv + 2); /* does not return */
}
@@ -4905,6 +5048,7 @@ SubPostmasterMain(int argc, char *argv[])
/* Attach process to shared data structures */
CreateSharedMemoryAndSemaphores(false, 0);
+ MarkSharedStateComplete();
AutoVacLauncherMain(argc - 2, argv + 2); /* does not return */
}
@@ -4918,6 +5062,7 @@ SubPostmasterMain(int argc, char *argv[])
/* Attach process to shared data structures */
CreateSharedMemoryAndSemaphores(false, 0);
+ MarkSharedStateComplete();
AutoVacWorkerMain(argc - 2, argv + 2); /* does not return */
}
@@ -4936,6 +5081,7 @@ SubPostmasterMain(int argc, char *argv[])
/* Attach process to shared data structures */
CreateSharedMemoryAndSemaphores(false, 0);
+ MarkSharedStateComplete();
/* Fetch MyBgworkerEntry from shared memory */
shmem_slot = atoi(argv[1] + 15);
@@ -4952,12 +5098,14 @@ SubPostmasterMain(int argc, char *argv[])
if (strcmp(argv[1], "--forkcol") == 0)
{
/* Do not want to attach to shared memory */
+ MarkSharedStateComplete();
PgstatCollectorMain(argc, argv); /* does not return */
}
if (strcmp(argv[1], "--forklog") == 0)
{
/* Do not want to attach to shared memory */
+ MarkSharedStateComplete();
SysLoggerMain(argc, argv); /* does not return */
}
@@ -5620,40 +5768,14 @@ bgworker_forkexec(int shmem_slot)
}
#endif
-/*
- * Start a new bgworker.
- * Starting time conditions must have been checked already.
- *
- * Returns true on success, false on failure.
- * In either case, update the RegisteredBgWorker's state appropriately.
- *
- * This code is heavily based on autovacuum.c, q.v.
- */
-static bool
-do_start_bgworker(RegisteredBgWorker *rw)
+/* FIXME comment */
+static pid_t
+do_fork_bgworker(RegisteredBgWorker *rw)
{
pid_t worker_pid;
- Assert(rw->rw_pid == 0);
-
- /*
- * Allocate and assign the Backend element. Note we must do this before
- * forking, so that we can handle out of memory properly.
- *
- * Treat failure as though the worker had crashed. That way, the
- * postmaster will wait a bit before attempting to start it again; if it
- * tried again right away, most likely it'd find itself repeating the
- * out-of-memory or fork failure condition.
- */
- if (!assign_backendlist_entry(rw))
- {
- rw->rw_crashed_at = GetCurrentTimestamp();
- return false;
- }
-
- ereport(DEBUG1,
- (errmsg("starting background worker process \"%s\"",
- rw->rw_worker.bgw_name)));
+ MyCancelKey = rw->rw_backend->cancel_key;
+ MyPMChildSlot = rw->rw_backend->child_slot;
#ifdef EXEC_BACKEND
switch ((worker_pid = bgworker_forkexec(rw->rw_shmem_slot)))
@@ -5665,13 +5787,6 @@ do_start_bgworker(RegisteredBgWorker *rw)
/* in postmaster, fork failed ... */
ereport(LOG,
(errmsg("could not fork worker process: %m")));
- /* undo what assign_backendlist_entry did */
- ReleasePostmasterChildSlot(rw->rw_child_slot);
- rw->rw_child_slot = 0;
- free(rw->rw_backend);
- rw->rw_backend = NULL;
- /* mark entry as crashed, so we'll try again later */
- rw->rw_crashed_at = GetCurrentTimestamp();
break;
#ifndef EXEC_BACKEND
@@ -5702,15 +5817,73 @@ do_start_bgworker(RegisteredBgWorker *rw)
#endif
default:
/* in postmaster, fork successful ... */
- rw->rw_pid = worker_pid;
+ rw->rw_pid = worker_pid; /* FIXME move later */
rw->rw_backend->pid = rw->rw_pid;
- ReportBackgroundWorkerPID(rw);
- /* add new worker to lists of backends */
- dlist_push_head(&BackendList, &rw->rw_backend->elem);
+ ReportBackgroundWorkerPID(rw); /* FIXME move later */
+
+ ereport(DEBUG2,
+ (errmsg_internal("FIXME forked new bgworker, pid=%d",
+ (int) worker_pid)));
+ }
+
+ return worker_pid;
+}
+
+/*
+ * Start a new bgworker.
+ * Starting time conditions must have been checked already.
+ *
+ * Returns true on success, false on failure.
+ * In either case, update the RegisteredBgWorker's state appropriately.
+ * FIXME responsible for BackendList lifecycle
+ *
+ * This code is heavily based on autovacuum.c, q.v.
+ */
+static bool
+do_start_bgworker(RegisteredBgWorker *rw)
+{
+ pid_t worker_pid;
+
+ Assert(rw->rw_pid == 0);
+
+ /*
+ * Allocate and assign the Backend element. Note we must do this before
+ * forking, so that we can handle out of memory properly.
+ *
+ * Treat failure as though the worker had crashed. That way, the
+ * postmaster will wait a bit before attempting to start it again; if it
+ * tried again right away, most likely it'd find itself repeating the
+ * out-of-memory or fork failure condition.
+ */
+ if (!assign_backendlist_entry(rw))
+ {
+ rw->rw_crashed_at = GetCurrentTimestamp();
+ return false;
+ }
+
+ ereport(DEBUG1,
+ (errmsg("starting background worker process \"%s\"",
+ rw->rw_worker.bgw_name)));
+
+ worker_pid = do_fork_bgworker(rw);
+ if (worker_pid == -1)
+ {
+ /* undo what assign_backendlist_entry did */
+ ReleasePostmasterChildSlot(rw->rw_child_slot);
+ rw->rw_child_slot = 0;
+ free(rw->rw_backend);
+ rw->rw_backend = NULL;
+ /* mark entry as crashed, so we'll try again later */
+ rw->rw_crashed_at = GetCurrentTimestamp();
+ }
+ else
+ {
+ /* add new worker to lists of backends */
+ dlist_push_head(&BackendList, &rw->rw_backend->elem);
#ifdef EXEC_BACKEND
- ShmemBackendArrayAdd(rw->rw_backend);
+ ShmemBackendArrayAdd(rw->rw_backend);
#endif
- return true;
+ return true;
}
return false;
@@ -5797,6 +5970,10 @@ assign_backendlist_entry(RegisteredBgWorker *rw)
bn->bkend_type = BACKEND_TYPE_BGWORKER;
bn->dead_end = false;
bn->bgworker_notify = false;
+#ifdef RETRYABLE_FORK
+ bn->port = NULL;
+ bn->retry_count = 0;
+#endif
rw->rw_backend = bn;
rw->rw_child_slot = bn->child_slot;
@@ -6306,6 +6483,15 @@ ShmemBackendArrayAdd(Backend *bn)
}
static void
+ShmemBackendArrayUpdate(Backend *bn, pid_t oldpid)
+{
+ int i = bn->child_slot - 1;
+
+ Assert(ShmemBackendArray[i].pid == oldpid);
+ ShmemBackendArray[i] = *bn;
+}
+
+static void
ShmemBackendArrayRemove(Backend *bn)
{
int i = bn->child_slot - 1;
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 0747341..1a6a46c 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -35,6 +35,7 @@ typedef enum
PMSIGNAL_RECOVERY_STARTED, /* recovery has started */
PMSIGNAL_BEGIN_HOT_STANDBY, /* begin Hot Standby */
PMSIGNAL_WAKEN_ARCHIVER, /* send a NOTIFY signal to xlog archiver */
+ PMSIGNAL_BACKEND_SHMAT, /* newest backend no longer seeks shmem attachment */
PMSIGNAL_ROTATE_LOGFILE, /* send SIGUSR1 to syslogger to rotate logfile */
PMSIGNAL_START_AUTOVAC_LAUNCHER, /* start an autovacuum launcher */
PMSIGNAL_START_AUTOVAC_WORKER, /* start an autovacuum worker */
Noah Misch <noah@leadboat.com> writes:
On Tue, May 01, 2018 at 11:31:50AM -0400, Tom Lane wrote:
Well, at this point the only thing that's entirely clear is that none
of the ideas I had work. I think we are going to be forced to pursue
Noah's idea of doing an end-to-end retry. Somebody else will need to
take point on that; I lack a Windows environment and have already done
a lot more blind patch-pushing than I like in this effort.
Having tried this, I find a choice between performance and complexity. Both
of my designs use proc_exit(4) to indicate failure to reattach. The simpler,
slower design has WIN32 internal_forkexec() block until the child reports (via
SetEvent()) that it reattached to shared memory. This caused a fivefold
reduction in process creation performance[1].
Ouch.
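As a rough, portable illustration of the blocking design quoted above, the POSIX sketch below has the parent wait either for a one-byte report on a pipe or for the child's exit, and re-fork when the child exits with status 4. The pipe stands in for the inheritable auto-reset event and `SetEvent(win32ReAttach)` in the v1 patch. This is not the patch's Win32 code: `simulated_reattach()`, `fork_blocking_until_attached()`, `EXIT_REATTACH_FAILED` and `MAX_FORK_RETRIES` are hypothetical, and the simulated failure is deterministic so the retry path can be exercised.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names; the patch itself uses proc_exit(4) and a literal 100. */
#define EXIT_REATTACH_FAILED 4
#define MAX_FORK_RETRIES 100

/* Hypothetical stand-in for PGSharedMemoryReAttach(): fails on the first
 * `fail_first` attempts so the retry path is deterministic. */
static bool
simulated_reattach(int attempt, int fail_first)
{
    return attempt >= fail_first;
}

/*
 * Fork a child and block until it either reports a successful reattach
 * (one byte on a pipe) or exits.  A status-4 exit triggers another fork.
 * Returns the attached child's pid or -1; *attempts_out receives the
 * number of forks performed.
 */
static pid_t
fork_blocking_until_attached(int fail_first, int *attempts_out)
{
    assert(fail_first >= 0);
    *attempts_out = 0;

    for (int attempt = 0; attempt < MAX_FORK_RETRIES; attempt++)
    {
        int     fds[2];
        int     status;
        pid_t   pid;
        ssize_t n;
        char    byte;

        if (pipe(fds) != 0)
            return -1;
        pid = fork();
        if (pid < 0)
        {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
        if (pid == 0)
        {
            /* child */
            close(fds[0]);
            if (!simulated_reattach(attempt, fail_first))
                _exit(EXIT_REATTACH_FAILED);    /* like proc_exit(4) */
            if (write(fds[1], "x", 1) != 1)     /* like SetEvent() */
                _exit(1);
            close(fds[1]);
            _exit(0);               /* a real backend would keep running */
        }

        /* parent: block until the report arrives or the pipe reaches EOF */
        close(fds[1]);
        *attempts_out = attempt + 1;
        do
            n = read(fds[0], &byte, 1);
        while (n < 0 && errno == EINTR);
        close(fds[0]);
        if (n == 1)
            return pid;

        /* child exited without reporting; retry only on status 4 */
        if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
            WEXITSTATUS(status) != EXIT_REATTACH_FAILED)
            return -1;
    }
    return -1;
}
```

Because the parent cannot return until each child has finished this early phase of startup, child creation is serialized behind it; that is one plausible reading of why this design cost so much process-creation throughput.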
The less-simple, faster design
stashes the Port structure and retry count in the BackendList entry, which
reaper() uses to retry the fork upon seeing status 4. Notably, this requires
new code for regular backends, for bgworkers, and for others.
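A minimal POSIX sketch of the faster, non-blocking design described above: the retry state lives in the per-child entry, children are started without waiting, and the reaper re-forks an entry in place when it sees exit status 4, giving up after a fixed number of tries. `ChildEntry`, `start_child()` and `run_with_reaper()` are hypothetical and far simpler than the patch's `Backend` entry and `CleanupBackend()` path, which must also keep the `Port`, refresh `ShmemBackendArray`, and treat bgworkers and auxiliary processes separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define EXIT_REATTACH_FAILED 4      /* hypothetical name for status 4 */
#define MAX_FORK_RETRIES 100        /* hypothetical; the patch uses a literal */

/* Hypothetical, trimmed-down analogue of the patch's Backend entry. */
typedef struct
{
    pid_t   pid;            /* current child pid, 0 if none */
    int     retry_count;    /* status-4 exits seen so far */
    int     fail_first;     /* test knob: simulated failures before success */
    bool    gave_up;
    bool    exited_ok;
} ChildEntry;

/* Fork one child for entry c; the child simulates early reattach failure. */
static pid_t
start_child(const ChildEntry *c)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(c->retry_count < c->fail_first ? EXIT_REATTACH_FAILED : 0);
    return pid;
}

/*
 * Start all children without waiting on any of them, then reap exits.  A
 * status-4 exit re-forks the same entry in place; any other exit retires
 * the entry.  Returns the number of children that exited with status 0.
 */
static int
run_with_reaper(ChildEntry *children, int n)
{
    int live = 0;
    int ok = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
    {
        children[i].pid = start_child(&children[i]);
        if (children[i].pid > 0)
            live++;
        else
            children[i].pid = 0;
    }

    while (live > 0)
    {
        int     status;
        pid_t   pid = waitpid(-1, &status, 0);

        if (pid < 0)
            break;
        for (int i = 0; i < n; i++)
        {
            ChildEntry *c = &children[i];

            if (c->pid != pid)
                continue;
            if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_REATTACH_FAILED)
            {
                if (++c->retry_count >= MAX_FORK_RETRIES)
                    c->gave_up = true;      /* "giving up after too many tries" */
                else if ((c->pid = start_child(c)) > 0)
                    break;                  /* retried in place; still live */
            }
            else
                c->exited_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;

            /* entry is finished: success, crash, give-up, or fork failure */
            ok += c->exited_ok;
            c->pid = 0;
            live--;
            break;
        }
    }
    return ok;
}
```

The postmaster-side cost here is only an extra fork per failed attempt, but every kind of child needs its own retry path, which matches the complexity concern raised in the message.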
Messy as that is, I think actually the worse problem with it is:
In this proof of concept, the
postmaster does not close its copy of a backend socket until the backend
exits.
That seems unworkable because it would interfere with detection of client
connection drops. But since you say this is just a POC, maybe you
intended to fix that? It'd probably be all right for the postmaster to
hold onto the socket until the new backend reports successful attach,
using the same signaling mechanism you had in mind for the other way.
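The arrangement suggested here can be sketched as follows, under stated assumptions: the parent keeps its copy of the accepted socket while a retry is still possible, so a re-forked child can inherit the same connection, and closes that copy as soon as a child reports that it attached. A pipe stands in for whatever signal the real code would use (the non-blocking patch stubs in `PMSIGNAL_BACKEND_SHMAT`), and for brevity the parent waits synchronously for the report. `hand_off_connection()` and the two constants are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define EXIT_REATTACH_FAILED 4      /* hypothetical name for status 4 */
#define MAX_FORK_RETRIES 100        /* hypothetical retry limit */

/*
 * Hand conn_fd (the parent's copy of an accepted client socket) to a child.
 * Status-4 exits are retried with the same socket; the parent's copy is
 * closed once a child reports attachment, or on giving up.  Returns the
 * attached child's pid or -1.
 */
static pid_t
hand_off_connection(int conn_fd, int fail_first)
{
    assert(conn_fd >= 0);
    for (int attempt = 0; attempt < MAX_FORK_RETRIES; attempt++)
    {
        int     report[2];
        int     status;
        pid_t   pid;
        ssize_t n;
        char    byte;

        if (pipe(report) != 0)
            break;
        pid = fork();
        if (pid < 0)
        {
            close(report[0]);
            close(report[1]);
            break;
        }
        if (pid == 0)
        {
            close(report[0]);
            if (attempt < fail_first)
                _exit(EXIT_REATTACH_FAILED);    /* socket left untouched */
            if (write(report[1], "x", 1) != 1)  /* "I have attached" */
                _exit(1);
            close(report[1]);
            /* a real backend would run its session here */
            if (write(conn_fd, "hello", 5) != 5)
                _exit(1);
            _exit(0);
        }

        close(report[1]);
        do
            n = read(report[0], &byte, 1);
        while (n < 0 && errno == EINTR);
        close(report[0]);
        if (n == 1)
        {
            close(conn_fd);     /* attach confirmed: drop the parent's copy */
            return pid;
        }
        if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
            WEXITSTATUS(status) != EXIT_REATTACH_FAILED)
            break;
    }
    close(conn_fd);
    return -1;
}
```

Closing early matters for the drop-detection concern: once the parent and the exiting child have both released their copies, the client end sees EOF, whereas a copy lingering in the parent would keep the connection open after the backend is gone.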
Overall, I agree that neither of these approaches are exactly attractive.
We're paying a heck of a lot of performance or complexity to solve a
problem that shouldn't even be there, and that we don't understand well.
In particular, the theory that some privileged code is injecting a thread
into every new process doesn't square with my results at
/messages/by-id/15345.1525145612@sss.pgh.pa.us
I think our best course of action at this point is to do nothing until
we have a clearer understanding of what's actually happening on dory.
Perhaps such understanding will yield an idea for a less painful fix.
regards, tom lane
On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
In this proof of concept, the
postmaster does not close its copy of a backend socket until the backend
exits.
That seems unworkable because it would interfere with detection of client
connection drops. But since you say this is just a POC, maybe you
intended to fix that? It'd probably be all right for the postmaster to
hold onto the socket until the new backend reports successful attach,
using the same signaling mechanism you had in mind for the other way.
It wasn't relevant to the concept being proven, so I suspended decisions in that
area. Arranging for socket closure is a simple matter of programming.
Overall, I agree that neither of these approaches are exactly attractive.
We're paying a heck of a lot of performance or complexity to solve a
problem that shouldn't even be there, and that we don't understand well.
In particular, the theory that some privileged code is injecting a thread
into every new process doesn't square with my results at
/messages/by-id/15345.1525145612@sss.pgh.pa.us
I think our best course of action at this point is to do nothing until
we have a clearer understanding of what's actually happening on dory.
Perhaps such understanding will yield an idea for a less painful fix.
I see.
On Tue, Sep 25, 2018 at 08:05:12AM -0700, Noah Misch wrote:
On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote:
Overall, I agree that neither of these approaches are exactly attractive.
We're paying a heck of a lot of performance or complexity to solve a
problem that shouldn't even be there, and that we don't understand well.
In particular, the theory that some privileged code is injecting a thread
into every new process doesn't square with my results at
/messages/by-id/15345.1525145612@sss.pgh.pa.us
I think our best course of action at this point is to do nothing until
we have a clearer understanding of what's actually happening on dory.
Perhaps such understanding will yield an idea for a less painful fix.
I see.
Could one of you having a dory login use
https://live.sysinternals.com/Procmon.exe to capture process events during
backend startup? The ideal would be one capture where startup failed reattach
and another where it succeeded, but having the successful run alone would be a
good start. The procedure is roughly this:
- Install PostgreSQL w/ debug symbols.
- Start a postmaster.
- procmon /nomonitor
- procmon "Filter" menu -> Enable Advanced Output
- Ctrl-l, add filter for "Process Name" is "postgres.exe"
- Ctrl-e (starts collecting data)
- psql (leave it running)
- After ~60s, Ctrl-e again in procmon (stops collecting data)
- File -> Save -> PML
- File -> Save -> XML, include stack traces, resolve stack symbols
- Compress the PML and XML files, and mail them here
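Once the XML export exists, one quick way to compare a failing capture with a successful one is to count thread and image-load events per postgres.exe PID. What follows is a minimal Python sketch, not part of any PostgreSQL tooling; `summarize_procmon_xml` is a hypothetical helper. It assumes each `<event>` element in Procmon's XML export has `Process_Name`, `PID` and `Operation` children, and that operations are named "Thread Create", "Load Image" and so on. Check those tag and operation names against a real export before relying on the counts.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Operations relevant to the reattach question. These names are assumed to
# match what Procmon shows in its "Operation" column.
INTERESTING = {"Process Start", "Thread Create", "Thread Exit", "Load Image"}


def summarize_procmon_xml(xml_text, process_name="postgres.exe"):
    """Count interesting operations per PID in a Procmon XML export.

    Assumes every <event> element has <Process_Name>, <PID> and <Operation>
    children; adjust the tag names if the real export differs.
    """
    root = ET.fromstring(xml_text)
    per_pid = defaultdict(Counter)
    for event in root.iter("event"):
        # Skip events from other processes (psql.exe, services, etc.).
        if event.findtext("Process_Name") != process_name:
            continue
        op = event.findtext("Operation")
        if op in INTERESTING:
            per_pid[int(event.findtext("PID"))][op] += 1
    return per_pid
```

Reading the file in binary mode lets ElementTree honor the export's own encoding declaration:

```python
with open("capture.xml", "rb") as f:
    for pid, ops in summarize_procmon_xml(f.read()).items():
        print(pid, dict(ops))
```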
I'm attaching the data from a system that does not have the problem. On this system,
backend startup sees six thread creations:
1. main thread
2. thread created before postgres.exe has control
3. thread created before postgres.exe has control
4. thread created before postgres.exe has control
5. in pgwin32_signal_initialize()
6. in src\backend\port\win32\timer.c:setitimer()
Threads 2-4 exit exactly 30s after creation. If we fail to reattach to shared
memory, we'll exit before reaching code to start 5 or 6. It would be quite
interesting if dory makes a different number of threads or if threads 2-4 live
some duration other than 30s. It would also be interesting if dory has "Load
Image" events after postgres.exe code has started running. This unaffected
system loads mswsock.dll during read_inheritable_socket().
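The baseline above (six threads, with threads 2-4 exiting 30s after creation) can be turned into a mechanical check for dory's captures. This is a sketch under stated assumptions: `ThreadRecord` and `compare_to_baseline` are hypothetical names, the create/exit timestamps are assumed to have already been extracted from the capture as seconds since capture start, and the threads created before postgres.exe has control are assumed to be the 2nd through 4th by creation time.

```python
from dataclasses import dataclass
from typing import List, Optional

# Baseline from the unaffected system: six threads, with threads 2-4
# (created before postgres.exe has control) exiting ~30s after creation.
EXPECTED_THREADS = 6
HELPER_LIFETIME_S = 30.0


@dataclass
class ThreadRecord:
    tid: int
    created_s: float            # seconds since capture start
    exited_s: Optional[float]   # None if still alive at end of capture


def compare_to_baseline(threads: List[ThreadRecord],
                        tolerance_s: float = 1.0) -> List[str]:
    """Return human-readable differences from the unaffected-system baseline.

    Threads are ordered by creation time; positions 1-3 (0-based) are
    treated as the pre-main threads 2-4.
    """
    ordered = sorted(threads, key=lambda t: t.created_s)
    notes = []
    if len(ordered) != EXPECTED_THREADS:
        notes.append(f"saw {len(ordered)} threads, expected {EXPECTED_THREADS}")
    # Check the lifetimes of the three threads that appear before main().
    for pos, t in enumerate(ordered[1:4], start=2):
        if t.exited_s is None:
            notes.append(f"thread {pos} (tid {t.tid}) never exited")
            continue
        lifetime = t.exited_s - t.created_s
        if abs(lifetime - HELPER_LIFETIME_S) > tolerance_s:
            notes.append(f"thread {pos} (tid {t.tid}) lived {lifetime:.1f}s")
    return notes
```

A capture of a failed reattach is expected to report only four threads, since the backend exits before threads 5 and 6 are started; the more telling signals are the lifetimes of threads 2-4 and any change in how many threads exist before postgres.exe has control.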
Thanks,
nm
Attachments:
unaffected-gustnado.XML.xz (application/octet-stream)