Something fishy happening on frogmouth

Started by Tom Lane, over 12 years ago, 38 messages, hackers

tgl@sss.pgh.pa.us

over 12 years ago

The last two buildfarm runs on frogmouth have failed in initdb,
like this:

creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
child process exited with exit code 1

It shouldn't be failing like that, considering that we just finished
probing for acceptable max_connections and shared_buffers without hitting
any apparent limit. I suppose it's possible that the final shm segment
size is a bit larger than what was tested at the shared_buffer step,
but that doesn't seem very likely to be the explanation. What seems
considerably more probable is that the probe for a shared memory
implementation is screwing up the system state somehow. It may not be
unrelated that this machine was happy before commit d2aecae went in.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew@dunslane.net

over 12 years ago

In reply to: Tom Lane (#1)

Re: Something fishy happening on frogmouth

On 10/29/2013 03:12 PM, Tom Lane wrote:

The last two buildfarm runs on frogmouth have failed in initdb,
like this:

creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
child process exited with exit code 1

It shouldn't be failing like that, considering that we just finished
probing for acceptable max_connections and shared_buffers without hitting
any apparent limit. I suppose it's possible that the final shm segment
size is a bit larger than what was tested at the shared_buffer step,
but that doesn't seem very likely to be the explanation. What seems
considerably more probable is that the probe for a shared memory
implementation is screwing up the system state somehow. It may not be
unrelated that this machine was happy before commit d2aecae went in.

I'll try a run with that reverted just to see if that's it.

This is a 32 bit compiler on a 32 bit (virtual) machine, so the change
to Size is definitely more than cosmetic here.

cheers

andrew

Andrew Dunstan

andrew@dunslane.net

over 12 years ago

In reply to: Andrew Dunstan (#2)

Re: Something fishy happening on frogmouth

On 10/29/2013 03:47 PM, Andrew Dunstan wrote:

On 10/29/2013 03:12 PM, Tom Lane wrote:

It may not be
unrelated that this machine was happy before commit d2aecae went in.

I'll try a run with that reverted just to see if that's it.

This is a 32 bit compiler on a 32 bit (virtual) machine, so the change
to Size is definitely more than cosmetic here.

And with this reverted it's perfectly happy.

cheers

andrew

Amit Kapila

amit.kapila16@gmail.com

over 12 years ago

In reply to: Tom Lane (#1)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 12:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The last two buildfarm runs on frogmouth have failed in initdb,
like this:

creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
child process exited with exit code 1

In windows implementation of dynamic shared memory, Size calculation
for creating dynamic shared memory is assuming that requested size for
creation of dynamic shared memory segment is uint64, which is changed
by commit d2aecae, so we need to change that calculation as well.
Please find the attached patch to fix this problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Amit Kapila (#4)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 30, 2013 at 12:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The last two buildfarm runs on frogmouth have failed in initdb,
like this:

creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
child process exited with exit code 1

In windows implementation of dynamic shared memory, Size calculation
for creating dynamic shared memory is assuming that requested size for
creation of dynamic shared memory segment is uint64, which is changed
by commit d2aecae, so we need to change that calculation as well.
Please find the attached patch to fix this problem.

I find it hard to believe this is the right fix. I know we have
similar code in win32_shmem.c, but surely if size is a 32-bit unsigned
quantity then size >> 0 is simply 0 anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Robert Haas (#5)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Oct 30, 2013 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 30, 2013 at 12:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The last two buildfarm runs on frogmouth have failed in initdb,
like this:

creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
child process exited with exit code 1

In windows implementation of dynamic shared memory, Size calculation
for creating dynamic shared memory is assuming that requested size for
creation of dynamic shared memory segment is uint64, which is changed
by commit d2aecae, so we need to change that calculation as well.
Please find the attached patch to fix this problem.

I find it hard to believe this is the right fix. I know we have
similar code in win32_shmem.c, but surely if size is a 32-bit unsigned
quantity then size >> 0 is simply 0 anyway.

Err, rather, size >> 32 is simply 0 anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#1)

Re: Something fishy happening on frogmouth

On Tue, Oct 29, 2013 at 3:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The last two buildfarm runs on frogmouth have failed in initdb,
like this:

creating directory d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... windows
creating configuration files ... ok
creating template1 database in d:/mingw-bf/root/HEAD/pgsql.2492/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not open shared memory segment "Global/PostgreSQL.851401618": Not enough space
child process exited with exit code 1

It shouldn't be failing like that, considering that we just finished
probing for acceptable max_connections and shared_buffers without hitting
any apparent limit. I suppose it's possible that the final shm segment
size is a bit larger than what was tested at the shared_buffer step,
but that doesn't seem very likely to be the explanation. What seems
considerably more probable is that the probe for a shared memory
implementation is screwing up the system state somehow. It may not be
unrelated that this machine was happy before commit d2aecae went in.

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

dsm_control_handle = random();

One possibility that occurs to me is that if, for some reason, we're
using the same handle every time on Windows, and if Windows takes a
bit of time to reclaim the segment after the postmaster exits (which
is not hard to believe given some previous Windows behavior I've
seen), then running the postmaster lots of times in quick succession
(as initdb does) might fail. I dunno what that has to do with the
patch, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andres Freund

andres@anarazel.de

over 12 years ago

In reply to: Robert Haas (#7)

Re: Something fishy happening on frogmouth

On 2013-10-30 08:45:03 -0400, Robert Haas wrote:

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

dsm_control_handle = random();

One possibility that occurs to me is that if, for some reason, we're
using the same handle every time on Windows, and if Windows takes a
bit of time to reclaim the segment after the postmaster exits (which
is not hard to believe given some previous Windows behavior I've
seen), then running the postmaster lots of times in quick succession
(as initdb does) might fail. I dunno what that has to do with the
patch, though.

Could it be that we haven't primed the random number generator with the
time or something like that yet?
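
To illustrate the suspicion above, here is a minimal sketch of why an unseeded generator would hand out the same handle in every process. It uses the standard C rand()/srand() pair as a stand-in for random()/srandom(); first_value_for_seed is a hypothetical helper, not PostgreSQL code.

```c
#include <stdlib.h>

/*
 * Return the first value the generator yields for a given seed.
 * The C standard says that calling rand() before any srand() behaves as
 * if srand(1) had been called, so a process that never seeds the
 * generator gets the same first value on every run.
 */
static int
first_value_for_seed(unsigned int seed)
{
    srand(seed);
    return rand();
}
```

Two calls with the same seed return the same value, which is the behavior a fresh, unseeded process would show on every start.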

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Robert Haas (#5)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I find it hard to believe this is the right fix. I know we have
similar code in win32_shmem.c, but surely if size is a 32-bit unsigned
quantity then size >> 0 is simply 0 anyway.

Gosh, I stand corrected. According to
http://msdn.microsoft.com/en-us/library/336xbhcz.aspx --

"The result is undefined if the right operand of a shift expression is
negative or if the right operand is greater than or equal to the
number of bits in the (promoted) left operand. No shift operation is
performed if the right operand is zero (0)."
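
A minimal sketch of one way to avoid the undefined shift, assuming the goal is to produce the high and low 32-bit halves that an API such as CreateFileMapping takes. split_size is a hypothetical helper and is not the patch discussed in this thread; it widens the value to 64 bits first, so the shift is well defined whether size_t is 32 or 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split a size into high and low 32-bit halves.
 * Shifting a 32-bit operand by 32 is undefined in C, so the value is
 * widened to uint64_t before shifting. On a 32-bit build the high half
 * is then always 0, which is the intended result.
 */
static void
split_size(size_t size, uint32_t *high, uint32_t *low)
{
    uint64_t wide = (uint64_t) size;

    *high = (uint32_t) (wide >> 32);
    *low = (uint32_t) (wide & 0xFFFFFFFFu);
}
```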

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Andres Freund (#8)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 8:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-10-30 08:45:03 -0400, Robert Haas wrote:

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

dsm_control_handle = random();

One possibility that occurs to me is that if, for some reason, we're
using the same handle every time on Windows, and if Windows takes a
bit of time to reclaim the segment after the postmaster exits (which
is not hard to believe given some previous Windows behavior I've
seen), then running the postmaster lots of times in quick succession
(as initdb does) might fail. I dunno what that has to do with the
patch, though.

Could it be that we haven't primed the random number generator with the
time or something like that yet?

Yeah, I think that's probably what it is. There's PostmasterRandom()
to initialize the random-number generator on first use, but that
doesn't help if some other module calls random(). I wonder if we
ought to just get rid of PostmasterRandom() and instead have the
postmaster run that initialization code very early in startup. That'd
make the timing of the random number generator being initialized a bit
more predictable, perhaps, but if the dynamic shared memory code is
going to grab a random number during startup it's basically going to
be nailed to that event anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#7)

Re: Something fishy happening on frogmouth

Robert Haas <robertmhaas@gmail.com> writes:

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

dsm_control_handle = random();

Isn't this complaining about the main shm segment, not a DSM extension?

Also, why is the error "not enough space", rather than something about
a collision? And if this is the explanation, why didn't the previous
runs probing for allowable shmem size fail?

BTW, regardless of the specific properties of random(), surely you ought
to have code in there that would cope with a name collision.

regards, tom lane

#12

Andres Freund

andres@anarazel.de

over 12 years ago

In reply to: Tom Lane (#11)

Re: Something fishy happening on frogmouth

On 2013-10-30 09:26:42 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

dsm_control_handle = random();

Isn't this complaining about the main shm segment, not a DSM extension?

Don't think so, that has a ":" in the name. But I think this touches a
fair point, I think we need to make all the dsm error messages more
distinctive. The history since this has been committed makes it likely
that there will be more errors.

Also, why is the error "not enough space", rather than something about
a collision? And if this is the explanation, why didn't the previous
runs probing for allowable shmem size fail?

Yea, I don't think this explains the issue but something independent
that needs to be fixed.

BTW, regardless of the specific properties of random(), surely you ought
to have code in there that would cope with a name collision.

There actually is code that retries, but only for EEXIST.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Andres Freund (#12)

Re: Something fishy happening on frogmouth

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-10-30 09:26:42 -0400, Tom Lane wrote:

Isn't this complaining about the main shm segment, not a DSM extension?

Don't think so, that has a ":" in the name.

If it *isn't* about the main memory segment, what the hell are we doing
creating random addon segments during bootstrap? None of the DSM code
should even get control at this point, IMO.

regards, tom lane

#14

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#11)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 9:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

If I'm reading this correctly, the last three runs on frogmouth have
all failed, and all of them have failed with a complaint about,
specifically, Global/PostgreSQL.851401618. Now, that really shouldn't
be happening, because the code to choose that number looks like this:

dsm_control_handle = random();

Isn't this complaining about the main shm segment, not a DSM extension?

No. That's why the identifier being assigned to has "dsm" in it.
I'll respond to this in more detail in a separate post.

Also, why is the error "not enough space", rather than something about
a collision? And if this is the explanation, why didn't the previous
runs probing for allowable shmem size fail?

Good questions. I think that my previous theory was wrong, and that
the patch from Amit which I pushed a while ago should fix the
breakage.

BTW, regardless of the specific properties of random(), surely you ought
to have code in there that would cope with a name collision.

I do have code in there to cope with a name collision. However, that
doesn't mean it's good for it to choose the same name for the segment
by default every time. If we were going to do it that way I ought to
have just made it serial (PostgreSQL.0, 1, 2, 3, ...) instead of using
random numbers to name them. The reason I didn't do that is to
minimize the chances of collisions actually happening - and especially
to minimize the chances of a large number of collisions happening.
Especially for System V shared memory, the namespace is rather
constrained, so bouncing around randomly through the namespace makes
it unlikely that we'll hit a whole bunch of identifiers in a row that
are all already in use by some other postmaster or, indeed, a process
unrelated to PostgreSQL. A linear scan starting at any fixed value
wouldn't have that desirable property.
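
The idea of probing the namespace at random and retrying only on a name collision can be sketched as follows. This is a toy model, not the DSM code: try_create stands in for the operating system call, the namespace is an in-memory array that is deliberately tiny so collisions are common, and all names here are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NAMESPACE_SIZE 16       /* tiny on purpose, to force collisions */

static bool in_use[NAMESPACE_SIZE];

/* Stand-in for the OS call: fails with EEXIST if the name is taken. */
static int
try_create(uint32_t handle)
{
    if (in_use[handle])
    {
        errno = EEXIST;
        return -1;
    }
    in_use[handle] = true;
    return 0;
}

/*
 * Pick random handles until one can be created. Only EEXIST triggers a
 * retry; any other error is returned to the caller immediately.
 * Returns 0 and stores the handle on success, -1 on failure.
 */
static int
create_with_random_handle(uint32_t *handle_out, int max_attempts)
{
    for (int i = 0; i < max_attempts; i++)
    {
        uint32_t handle = (uint32_t) rand() % NAMESPACE_SIZE;

        if (try_create(handle) == 0)
        {
            *handle_out = handle;
            return 0;
        }
        if (errno != EEXIST)
            return -1;
    }
    errno = EEXIST;
    return -1;
}
```

Because each attempt lands on an unrelated slot, a run of adjacent names that are already taken costs one retry rather than one retry per occupied name.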

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#14)

Re: Something fishy happening on frogmouth

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Oct 30, 2013 at 9:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, why is the error "not enough space", rather than something about
a collision? And if this is the explanation, why didn't the previous
runs probing for allowable shmem size fail?

Good questions. I think that my previous theory was wrong, and that
the patch from Amit which I pushed a while ago should fix the
breakage.

Indeed, I see frogmouth just went green, so Amit nailed it.

I'm still wondering why we try to create a DSM segment in bootstrap.

regards, tom lane

#16

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#10)

Re: Something fishy happening on frogmouth

Robert Haas <robertmhaas@gmail.com> writes:

Yeah, I think that's probably what it is. There's PostmasterRandom()
to initialize the random-number generator on first use, but that
doesn't help if some other module calls random(). I wonder if we
ought to just get rid of PostmasterRandom() and instead have the
postmaster run that initialization code very early in startup.

You could do arbitrary rearrangement of the postmaster's code and not
succeed in affecting this behavior in the slightest, because the
postmaster isn't running during bootstrap. I continue to doubt that
there's a good reason to be creating DSM segment(s) here.

regards, tom lane

#17

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#13)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-10-30 09:26:42 -0400, Tom Lane wrote:

Isn't this complaining about the main shm segment, not a DSM extension?

Don't think so, that has a ":" in the name.

If it *isn't* about the main memory segment, what the hell are we doing
creating random addon segments during bootstrap? None of the DSM code
should even get control at this point, IMO.

Here's a short summary of what I posted back in August: at system
startup time, the postmaster creates one dynamic shared segment,
called the control segment. That segment sticks around for the
lifetime of the server and records the identity of any *other* dynamic
shared memory segments that are subsequently created. If the server
dies a horrible death (e.g. kill -9), the next postmaster will find
the previous control segment (whose ID is written to a file in the
data directory) and remove any leftover shared memory segments from
the previous run; without this, such segments would live until the
next server reboot unless manually removed by the user (which isn't
even practical on all platforms; e.g. there doesn't seem to be any way
to list all extant POSIX shared memory segments on MacOS X, so a user
wouldn't know which segments to remove).

For my previous posting on this topic, see the following link,
particularly the paragraph which begins "The actual implementation is
split up into two layers" and the following one.

/messages/by-id/CA+TgmoaDqDUgt=4Zs_QPOnBt=EstEaVNP+5t+m=FPNWshiPR3A@mail.gmail.com

Now, you might ask why not store this control information that we need
for cleanup purposes in the *main* shared memory segment rather than
in a dynamic shared memory segment. The basic problem is that I don't
know how to dig it out of there in any reasonable way. The dsm
control segment is small and has a very simple structure; when the
postmaster uses the previous postmaster's leftover control segment to
clean up orphaned shared memory segments, it will ignore that old
control segment unless it passes various sanity tests. But even if it
passes those sanity tests but is corrupted somehow otherwise, nothing
that happens as a result will cause a fatal error, let alone a server
crash. You're of course welcome to critique that logic, but I tried
my best to make it bulletproof. See
dsm_cleanup_using_control_segment().
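
As a rough illustration of the "ignore it unless it passes sanity tests" approach described above, here is a sketch with an assumed control-segment layout. The struct, the magic value, and the function names are hypothetical and are not taken from the PostgreSQL source; the point is only that a leftover segment is trusted no further than its header checks allow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONTROL_MAGIC 0x44534d43u   /* arbitrary value for this sketch */
#define MAX_ITEMS 64

/* Assumed layout: a small header followed by a fixed array of handles. */
typedef struct
{
    uint32_t magic;
    uint32_t nitems;
    uint32_t maxitems;
    uint32_t handles[MAX_ITEMS];
} control_segment;

/* Reject anything that does not look like a control segment we wrote. */
static bool
control_segment_sane(const control_segment *seg, size_t mapped_size)
{
    if (mapped_size < sizeof(control_segment))
        return false;
    if (seg->magic != CONTROL_MAGIC)
        return false;
    if (seg->maxitems > MAX_ITEMS)
        return false;
    if (seg->nitems > seg->maxitems)
        return false;
    return true;
}

/*
 * Copy the handles of leftover segments into out (room for MAX_ITEMS).
 * Returns the number of handles found, or 0 if the old control segment
 * fails the sanity tests and is therefore ignored.
 */
static uint32_t
collect_orphans(const control_segment *seg, size_t mapped_size,
                uint32_t *out)
{
    if (!control_segment_sane(seg, mapped_size))
        return 0;
    for (uint32_t i = 0; i < seg->nitems; i++)
        out[i] = seg->handles[i];
    return seg->nitems;
}
```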

The structure of the main shared memory segment is way more
complicated. If we latched onto an old main shared memory segment,
we'd presumably need to traverse ShmemIndex to even find that portion
of the shared memory segment where the DSM control information was
slated to be stored. And there's no way that's going to be robust in
the face of a possibly-corrupted shared memory segment left over from
a previous run. And that's actually making the assumption that we
could even do it that way, which we really can't: as of 9.3, things
like ShmemIndex are stored in the MAP_SHARED anonymous mapping, and
the System V shared memory segment is small and fixed-size. We could
try to refactor the code so that we merge the control segment data
into the residual System V segment, but I think it'd be ugly and I'm
not sure what it really buys us.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#16)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Yeah, I think that's probably what it is. There's PostmasterRandom()
to initialize the random-number generator on first use, but that
doesn't help if some other module calls random(). I wonder if we
ought to just get rid of PostmasterRandom() and instead have the
postmaster run that initialization code very early in startup.

You could do arbitrary rearrangement of the postmaster's code and not
succeed in affecting this behavior in the slightest, because the
postmaster isn't running during bootstrap.

Well, if you're telling me that it's not possible to find a way to
arrange things so that the random number is initialized before first
use, I'm gonna respectfully disagree. If you're just critiquing my
particular suggestion about where to put that code - fair enough.
Maybe it really ought to live in our src/port implementation of
random() or pg_lrand48().
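
A sketch of what seeding inside the wrapper itself might look like, so that the first caller gets a seeded generator no matter which module it is. seeded_rand is a hypothetical name, standard rand()/srand() stand in for random(), and the seed here is only the clock; a real implementation would likely mix in more entropy, such as the process ID, because processes started within the same second would otherwise share a seed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

static bool seeded = false;

/*
 * Return a pseudo-random number, seeding the generator on first use.
 * Callers never have to know whether some other module already
 * initialized the generator.
 */
static int
seeded_rand(void)
{
    if (!seeded)
    {
        srand((unsigned int) time(NULL));
        seeded = true;
    }
    return rand();
}
```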

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#17)

Re: Something fishy happening on frogmouth

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Oct 30, 2013 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If it *isn't* about the main memory segment, what the hell are we doing
creating random addon segments during bootstrap? None of the DSM code
should even get control at this point, IMO.

Here's a short summary of what I posted back in August: at system
startup time, the postmaster creates one dynamic shared segment,
called the control segment.

Well, as I've pointed out already in this thread, the postmaster does not
execute during bootstrap, which makes me think this code is getting called
from the wrong place. What possible reason is there to create add-on shm
segments in bootstrap mode? I'm even dubious that we should create them
in standalone backends, because there will be no other process to share
them with.

I'm inclined to think this initialization should be moved to the actual
postmaster (and I mean postmaster.c) from wherever it is now. That might
fix the not-so-random name choice in itself, but if it doesn't, then we
could consider where to move the random-seed-initialization step to.

regards, tom lane

#20

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#19)

Re: Something fishy happening on frogmouth

On Wed, Oct 30, 2013 at 9:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Oct 30, 2013 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If it *isn't* about the main memory segment, what the hell are we doing
creating random addon segments during bootstrap? None of the DSM code
should even get control at this point, IMO.

Here's a short summary of what I posted back in August: at system
startup time, the postmaster creates one dynamic shared segment,
called the control segment.

Well, as I've pointed out already in this thread, the postmaster does not
execute during bootstrap, which makes me think this code is getting called
from the wrong place. What possible reason is there to create add-on shm
segments in bootstrap mode? I'm even dubious that we should create them
in standalone backends, because there will be no other process to share
them with.

I'm inclined to think this initialization should be moved to the actual
postmaster (and I mean postmaster.c) from wherever it is now. That might
fix the not-so-random name choice in itself, but if it doesn't, then we
could consider where to move the random-seed-initialization step to.

The initialization code is currently called from
CreateSharedMemoryAndSemaphores(), like this:

    /* Initialize dynamic shared memory facilities. */
    if (!IsUnderPostmaster)
        dsm_postmaster_startup();

The reason I put it there is that if the postmaster does a
crash-and-restart cycle, we need to create a new control segment just as
we need to create a new main shared memory segment. (We also need to
make sure all dynamic shared memory segments left over from the
previous postmaster lifetime get nuked, but that happens earlier, as
part of the shmem_exit sequence.)
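
A toy model of why the guard quoted above also fires during bootstrap. It assumes that "under postmaster" is true only for child processes started by a postmaster, so the negated test is true for the postmaster itself and equally for bootstrap and standalone processes. The enum and function names are hypothetical; the second guard only illustrates the narrower condition Tom Lane argues for and is not a proposed patch.

```c
#include <stdbool.h>

typedef enum
{
    MODE_POSTMASTER,        /* the postmaster process itself */
    MODE_BACKEND_CHILD,     /* a child started by the postmaster */
    MODE_BOOTSTRAP,         /* bootstrap mode, as run by initdb */
    MODE_STANDALONE         /* single-user backend */
} process_mode;

/* Only a child of the postmaster counts as "under" a postmaster. */
static bool
is_under_postmaster(process_mode mode)
{
    return mode == MODE_BACKEND_CHILD;
}

/* Guard as quoted: run DSM startup whenever not under a postmaster. */
static bool
guard_as_quoted(process_mode mode)
{
    return !is_under_postmaster(mode);
}

/* Narrower guard: run DSM startup only in the postmaster itself. */
static bool
guard_postmaster_only(process_mode mode)
{
    return mode == MODE_POSTMASTER;
}
```

Under these assumptions the quoted guard is true for MODE_BOOTSTRAP and MODE_STANDALONE as well as MODE_POSTMASTER, which matches the control segment being created during initdb.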

There may be a good reason to move it elsewhere, but by and large I
have not had good luck deviating from the pattern laid down for the
main shared memory segment. My respect for that battle-tested code is
growing daily; every time I think I know better, I get burned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21