dynamically allocating chunks from shared memory
Hi,
for quite some time, I've been under the impression that there's still
one disadvantage left to using processes instead of threads: we can
only use statically sized chunks of shared memory. Every component that
wants to use shared memory needs to pre-allocate whatever it thinks is
sufficient. It can neither enlarge its share, nor can unused memory be
reallocated to other components.
Having written a very primitive kind of dynamic memory allocator for
imessages [1], I've always wanted a better alternative. So I've
investigated a bit, refactored step by step, and finally came up with
the attached, lock-based dynamic shared memory allocator. Its interface
is as simple as malloc() and free(). A restart of the postmaster should
truncate the whole area.
Since the allocator is itself a component that must pre-allocate its
area in shared memory in advance, you need to define a maximum size for
the pool of dynamically allocatable memory. That's currently defined in
shmem.h instead of a GUC.
This kind of feature has been requested at the Tokyo Clustering Meeting
(by myself) in 2009 and is listed on the Wiki [2].
I'm now using that allocator as the basis for a reworked imessages
patch, which I've attached as well. Both are tested as a basis for
Postgres-R.
While I think other components could use this dynamic memory allocator,
too, I didn't write any code for that. Imessages currently is the only
user available. (So please apply the dynshmem patch first, then
imessages).
Comments?
Greetings from Oxford, and thanks to Joachim Wieland for providing me
the required Internet connectivity ;-)
Markus Wanner
[1]: Postgres-R: internal messages
http://archives.postgresql.org/message-id/4886DB0B.1090508@bluegap.ch
[2]: Mentioned Cluster Feature
http://wiki.postgresql.org/wiki/ClusterFeatures#Dynamic_shared_memory_allocation
For git addicts: here's a git repository with both patches applied:
http://git.postgres-r.org/?p=imessages;a=summary
Excerpts from Markus Wanner's message of Fri Jul 02 19:44:46 -0400 2010:
Having written a very primitive kind of a dynamic memory allocator for
imessages [1], I've always wanted a better alternative. So I've
investigated a bit, refactored step-by-step, and finally came up with
the attached, lock based dynamic shared memory allocator. Its interface
is as simple as malloc() and free(). A restart of the postmaster should
truncate the whole area.
Interesting, thanks.
I gave it a skim and found that it badly needs a lot more code comments.
I'm also unconvinced that spinlocks are the best locking primitive here.
Why not lwlocks?
Being a component which needs to pre-allocate its area in shared memory
in advance, you need to define a maximum size for the pool of
dynamically allocatable memory. That's currently defined in shmem.h
instead of a GUC.
This should be an easy change; I agree that it needs to be configurable.
I'm not sure what kind of resistance you'll see to the idea of a
dynamically allocatable shmem area. Maybe we could use this in other
areas such as allocating space for heavyweight lock objects. Right now
the memory usage for them could grow due to a transitory increase in
lock traffic, leading to out-of-memory conditions later in other
modules. We've seen reports of that problem, so it'd be nice to be able
to fix that with this infrastructure.
I didn't look at the imessages patch (except to notice that I didn't
very much like the handling of out-of-memory, but you already knew that).
On Tue, Jul 20, 2010 at 1:50 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
I'm not sure what kind of resistance you'll see to the idea of a
dynamically allocatable shmem area. Maybe we could use this in other
areas such as allocating space for heavyweight lock objects. Right now
the memory usage for them could grow due to a transitory increase in
lock traffic, leading to out-of-memory conditions later in other
modules. We've seen reports of that problem, so it'd be nice to be able
to fix that with this infrastructure.
Well, you can't really fix that problem with this infrastructure,
because this infrastructure only allows shared memory to be
dynamically allocated from a pool set aside for such allocations in
advance. If a surge in demand can exhaust all the heavyweight lock
space in the system, it can also exhaust the shared pool from which
more heavyweight lock space can be allocated. The failure might
manifest itself in a totally different subsystem though, since the
allocation that failed wouldn't necessarily be a heavyweight lock
allocation, but some other allocation that failed as a result of space
used by the heavyweight locks.
It would be more interesting if you could expand (or contract) the
size of shared memory as a whole while the system is up and running.
Then, perhaps, max_locks_per_transaction and other, similar GUCs could
be made PGC_SIGHUP, which would give you a way out of such situations
that didn't involve taking down the entire cluster. I'm not too sure
how to do that, though.
With respect to imessages specifically, what is the motivation for
using shared memory rather than something like an SLRU? The new
LISTEN implementation uses an SLRU and handles variable-size messages,
so it seems like it might be well-suited to this task.
Incidentally, the link for the imessages patch on the CommitFest page
points to http://archives.postgresql.org/message-id/ab0cd52a64e788f4ecb4515d1e6e4691@localhost
- which is the dynamic shmem patch. So I'm not sure where to find the
latest imessages patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Hello Alvaro,
thank you for looking through this code.
On 07/20/2010 07:50 PM, Alvaro Herrera wrote:
Interesting, thanks.
I gave it a skim and found that it badly needs a lot more code comments.
Hm.. yeah, the dynshmem stuff could probably use more comments. (The
bgworker stuff is probably a better example).
I'm also unconvinced that spinlocks are the best locking primitive here.
Why not lwlocks?
It's derived from a completely lock-free algorithm, as proposed by Maged
M. Michael in "Scalable Lock-Free Dynamic Memory Allocator". I dropped
all of the CAS primitives with their surrounding retry loops and did
further simplifications. Spinlocks simply looked like the simplest thing
to fall back to. But yeah, splitting into read and write accesses and
using lwlocks might be a win. Or it might not. I honestly don't know.
And it's probably not the best-performing allocator ever. But it's
certainly better than nothing.
I did recently release the lock-free variant as well as a lock based
one, see http://www.bluegap.ch/projects/wamalloc/ for more information.
I'm not sure what kind of resistance you'll see to the idea of a
dynamically allocatable shmem area.
So far neither resistance nor applause. I'd love to hear more of an
echo. Even if it's resistance.
Maybe we could use this in other
areas
..which is why I've published this separately from Postgres-R.
such as allocating space for heavyweight lock objects. Right now
the memory usage for them could grow due to a transitory increase in
lock traffic, leading to out-of-memory conditions later in other
modules. We've seen reports of that problem, so it'd be nice to be able
to fix that with this infrastructure.
Maybe, yes. Sounds like a nice idea.
I didn't look at the imessages patch (except to notice that I didn't
very much like the handling of out-of-memory, but you already knew that).
Now that all of the allocation logic has been ripped out, the imessages
patch got quite a bit smaller; imsg.c now consists of only around 370
lines of code.
The handling of out-of-(shared)-memory situation could certainly be
improved, yes. Note that I've already separated out a
IMessageCreateInternal() method, which simply returns NULL in that case.
Is that the API you'd prefer?
Getting back to the dynshmem stuff: I don't mind much *which* allocator
to use. I also looked at jemalloc, but haven't been able to integrate it
into Postgres. So I've extended my experiment with wamalloc and turned
it into something usable for Postgres.
Regards
Markus Wanner
Hi,
On 07/20/2010 08:23 PM, Robert Haas wrote:
Well, you can't really fix that problem with this infrastructure,
No, but it would allow you to better use the existing amount of shared
memory, possibly avoiding the problem in certain scenarios.
The failure might
manifest itself in a totally different subsystem though, since the
allocation that failed wouldn't necessarily be a heavyweight lock
allocation, but some other allocation that failed as a result of space
used by the heavyweight locks.
Yeah, that's a valid concern. Maybe it could be addressed by keeping
track of usage of dynshmem per module, and somehow inform the user about
the usage pattern in case of OOM.
It would be more interesting
Sure, but then you'd definitely need a dynamic allocator, no?
With respect to imessages specifically, what is the motivation for
using shared memory rather than something like an SLRU? The new
LISTEN implementation uses an SLRU and handles variable-size messages,
so it seems like it might be well-suited to this task.
Well, imessages predates the new LISTEN implementation by some moons.
They are intended to replace (unix-ish) pipes between processes. I fail
to see the immediate link between (S)LRU and inter-process message
passing. It might be more useful for multiple LISTENers, but I bet it
has slightly different semantics than imessages.
But to be honest, I don't know too much about the new LISTEN
implementation. Do you think a loss-less
(single)-process-to-(single)-process message passing system could be
built on top of it?
Incidentally, the link for the imessages patch on the CommitFest page
points to http://archives.postgresql.org/message-id/ab0cd52a64e788f4ecb4515d1e6e4691@localhost
- which is the dynamic shmem patch. So I'm not sure where to find the
latest imessages patch.
The archive doesn't display attachments very well. But the imessages
patch is part of that mail. Maybe you still find it in your local mailbox?
In the archive view, it starts at the line that says:
*** src/backend/storage/ipc/imsg.c dc149eef487eafb43409a78b8a33c70e7d3c2bfa
(and, well, the dynshmem stuff ends just before that line. Those were
two .diff files attached, IIRC).
Regards
Markus Wanner
Excerpts from Markus Wanner's message of Tue Jul 20 14:36:55 -0400 2010:
I'm also unconvinced that spinlocks are the best locking primitive here.
Why not lwlocks?
It's derived from a completely lock-free algorithm, as proposed by Maged
M. Michael in: Scalable Lock-Free Dynamic Memory Allocator.
Hmm, deriving code from a paper published by IBM sounds like bad news --
who knows what patents they hold on the techniques there?
Hi,
On 07/20/2010 09:05 PM, Alvaro Herrera wrote:
Hmm, deriving code from a paper published by IBM sounds like bad news --
who knows what patents they hold on the techniques there?
Yeah, that might be an issue. Note, however, that the lock-based variant
differs substantially from what's been published. And I sort of doubt
their patents cover a lot of stuff that's not lock-free-ish.
But again, I'd also very much welcome any other allocator. In my
opinion, it's the most annoying drawback of the process-based design
compared to a threaded variant (from the perspective of a developer).
Regards
Markus Wanner
Excerpts from Markus Wanner's message of Tue Jul 20 14:54:42 -0400 2010:
With respect to imessages specifically, what is the motivation for
using shared memory rather than something like an SLRU? The new
LISTEN implementation uses an SLRU and handles variable-size messages,
so it seems like it might be well-suited to this task.
Well, imessages predates the new LISTEN implementation by some moons.
They are intended to replace (unix-ish) pipes between processes. I fail
to see the immediate link between (S)LRU and inter-process message
passing. It might be more useful for multiple LISTENers, but I bet it
has slightly different semantics than imessages.
I guess what Robert is saying is that you don't need shmem to pass
messages around. The new LISTEN implementation was just an example.
imessages aren't supposed to use it directly. Rather, the idea is to
store the messages in a new SLRU area. Thus you don't need to mess with
dynamically allocating shmem at all.
But to be honest, I don't know too much about the new LISTEN
implementation. Do you think a loss-less
(single)-process-to-(single)-process message passing system could be
built on top of it?
I don't think you should build on top of LISTEN but of slru.c. This is
probably more similar to multixact (see multixact.c) than to the new
LISTEN implementation.
I think it should be rather straightforward. There would be a unique
append-point; each process desiring to send a new message to another
backend would add a new message at that point. There would be one read
pointer per backend, and it would be advanced as messages are consumed.
Old segments could be trimmed as backends advance their read pointer,
similar to how the sinval queue is handled.
On Tue, Jul 20, 2010 at 5:46 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Markus Wanner's message of Tue Jul 20 14:54:42 -0400 2010:
With respect to imessages specifically, what is the motivation for
using shared memory rather than something like an SLRU? The new
LISTEN implementation uses an SLRU and handles variable-size messages,
so it seems like it might be well-suited to this task.
Well, imessages predates the new LISTEN implementation by some moons.
They are intended to replace (unix-ish) pipes between processes. I fail
to see the immediate link between (S)LRU and inter-process message
passing. It might be more useful for multiple LISTENers, but I bet it
has slightly different semantics than imessages.
I guess what Robert is saying is that you don't need shmem to pass
messages around. The new LISTEN implementation was just an example.
imessages aren't supposed to use it directly. Rather, the idea is to
store the messages in a new SLRU area. Thus you don't need to mess with
dynamically allocating shmem at all.
Right. I might be full of bull, but that's what I'm saying. :-)
But to be honest, I don't know too much about the new LISTEN
implementation. Do you think a loss-less
(single)-process-to-(single)-process message passing system could be
built on top of it?
I don't think you should build on top of LISTEN but of slru.c. This is
probably more similar to multixact (see multixact.c) than to the new
LISTEN implementation.
I think it should be rather straightforward. There would be a unique
append-point; each process desiring to send a new message to another
backend would add a new message at that point. There would be one read
pointer per backend, and it would be advanced as messages are consumed.
Old segments could be trimmed as backends advance their read pointer,
similar to how sinval queue is handled.
If the messages are mostly unicast, it might be nice to contrive a
method whereby backends didn't need to explicitly advance over
messages destined only for other backends. Like maybe allocate a
small, fixed amount of shared memory sufficient for two "pointers"
into the SLRU area per backend, and then use the SLRU to store each
message with a header indicating where the next message is to be
found. For each backend, you store one pointer to the first queued
message and one pointer to the last queued message. New messages can
be added by making the current last message point to a newly added
message and updating the last message pointer for that backend. You'd
need to think about the locking and reference counting carefully to
make sure you eventually freed up unused pages, but it seems like it
might be doable. Of course, if the messages are mostly multi/anycast,
or if the rate of messaging is low enough that the aforementioned
complexity is not worth bothering with, then, what you said.
One big advantage of attacking the problem with an SLRU is that
there's no fixed upper limit on the amount of data that can be
enqueued at any given time. You can spill to disk or whatever as
needed (although hopefully you won't normally do so, for performance
reasons).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 07/21/2010 01:52 AM, Robert Haas wrote:
On Tue, Jul 20, 2010 at 5:46 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
I guess what Robert is saying is that you don't need shmem to pass
messages around. The new LISTEN implementation was just an example.
imessages aren't supposed to use it directly. Rather, the idea is to
store the messages in a new SLRU area. Thus you don't need to mess with
dynamically allocating shmem at all.
Okay, so I just need to grok the SLRU stuff. Thanks for clarifying.
Note that I sort of /want/ to mess with shared memory. It's what I know
how to deal with. It's how threaded programs work as well. Ya know,
locks, condition variables, mutexes, all those nice things that allow
you to shoot your foot so terribly nicely... Oh, well...
I think it should be rather straightforward. There would be a unique
append-point;
Unique append-point? Sounds like what I had before. That'd be a step
backwards, compared to the per-backend queue and an allocator that
hopefully scales well with the amount of CPU cores.
each process desiring to send a new message to another
backend would add a new message at that point. There would be one read
pointer per backend, and it would be advanced as messages are consumed.
Old segments could be trimmed as backends advance their read pointer,
similar to how sinval queue is handled.
That leads to pretty nasty fragmentation. A dynamic allocator should do
much better in that regard. (Wamalloc certainly does).
If the messages are mostly unicast, it might be nice if to contrive a
method whereby backends didn't need to explicitly advance over
messages destined only for other backends. Like maybe allocate a
small, fixed amount of shared memory sufficient for two "pointers"
into the SLRU area per backend, and then use the SLRU to store each
message with a header indicating where the next message is to be
found.
That's pretty much how imessages currently work. A single list of
messages queued per backend.
For each backend, you store one pointer to the first queued
message and one pointer to the last queued message. New messages can
be added by making the current last message point to a newly added
message and updating the last message pointer for that backend. You'd
need to think about the locking and reference counting carefully to
make sure you eventually freed up unused pages, but it seems like it
might be doable.
I've just read through slru.c, but still don't have a clue how it could
replace a dynamic allocator.
At the moment, the creator of an imessage allocs memory, copies the
payload there and then activates the message by appending it to the
recipient's queue. Upon getting signaled, the recipient consumes the
message by removing it from the queue and is obliged to release the
memory the message occupies after having processed it. Simple and
straightforward, IMO.
The queue addition and removal is clear. But how would I do the
alloc/free part with SLRU? Its blocks are fixed size (BLCKSZ) and the
API with ReadPage and WritePage is rather unlike a pair of alloc() and
free().
One big advantage of attacking the problem with an SLRU is that
there's no fixed upper limit on the amount of data that can be
enqueued at any given time. You can spill to disk or whatever as
needed (although hopefully you won't normally do so, for performance
reasons).
Yes, imessages shouldn't ever be spilled to disk. There naturally must
be an upper limit for them, be it total available memory (as for
threaded things) or a given, size-constrained pool (as is the case for
dynshmem).
To me it rather sounds like SLRU is a candidate for using dynamically
allocated shared memory underneath, instead of allocating a fixed amount
of slots in advance. That would allow more efficient use of shared
memory. (Given SLRU's ability to spill to disk, it could even be used to
'balance' out anomalies to some extent).
Regards
Markus Wanner
On Wed, Jul 21, 2010 at 4:33 AM, Markus Wanner <markus@bluegap.ch> wrote:
Okay, so I just need to grok the SLRU stuff. Thanks for clarifying.
Note that I sort of /want/ to mess with shared memory. It's what I know how
to deal with. It's how threaded programs work as well. Ya know, locks,
conditional variables, mutexes, all those nice thing that allow you to shoot
your foot so terribly nicely... Oh, well...
For what it's worth, I feel your pain. I think the SLRU method is
*probably* better, but I feel your pain anyway.
For each backend, you store one pointer to the first queued
message and one pointer to the last queued message. New messages can
be added by making the current last message point to a newly added
message and updating the last message pointer for that backend. You'd
need to think about the locking and reference counting carefully to
make sure you eventually freed up unused pages, but it seems like it
might be doable.
I've just read through slru.c, but still don't have a clue how it could
replace a dynamic allocator.
At the moment, the creator of an imessage allocs memory, copies the payload
there and then activates the message by appending it to the recipient's
queue. Upon getting signaled, the recipient consumes the message by removing
it from the queue and is obliged to release the memory the messages occupies
after having processed it. Simple and straightforward, IMO.
The queue addition and removal is clear. But how would I do the alloc/free
part with SLRU? Its blocks are fixed size (BLCKSZ) and the API with ReadPage
and WritePage is rather unlike a pair of alloc() and free().
Given what you're trying to do, it does sound like you're going to
need some kind of an algorithm for space management; but you'll be
managing space within the SLRU rather than within shared_buffers. For
example, you might end up putting a header on each SLRU page or
segment and using that to track the available freespace within that
segment for messages to be read and written. It'll probably be a bit
more complex than the one for listen (see asyncQueueAddEntries).
One big advantage of attacking the problem with an SLRU is that
there's no fixed upper limit on the amount of data that can be
enqueued at any given time. You can spill to disk or whatever as
needed (although hopefully you won't normally do so, for performance
reasons).
Yes, imessages shouldn't ever be spilled to disk. There naturally must be an
upper limit for them. (Be it total available memory, as for threaded things
or a given and size-constrained pool, as is the case for dynshmem).
I guess experience has taught me to be wary of things that are wired
in memory. Under extreme memory pressure, something's got to give, or
the whole system will croak. Consider also the contrary situation,
where the imessages stuff is not in use (even for a short period of
time, like a few minutes). Then we'd really rather not still have
memory carved out for it.
To me it rather sounds like SLRU is a candidate for using dynamically
allocated shared memory underneath, instead of allocating a fixed amount of
slots in advance. That would allow more efficient use of shared memory.
(Given SLRU's ability to spill to disk, it could even be used to 'balance'
out anomalies to some extent).
I think what would be even better is to merge the SLRU pools with the
shared_buffer pool, so that the two can duke it out for who is in most
need of the limited amount of memory available.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Hi,
first of all, thanks for your feedback, I enjoy the discussion.
On 07/21/2010 07:25 PM, Robert Haas wrote:
Given what you're trying to do, it does sound like you're going to
need some kind of an algorithm for space management; but you'll be
managing space within the SLRU rather than within shared_buffers. For
example, you might end up putting a header on each SLRU page or
segment and using that to track the available freespace within that
segment for messages to be read and written. It'll probably be a bit
more complex than the one for listen (see asyncQueueAddEntries).
But what would that buy us? Also consider that pretty much all available
dynamic allocators use shared memory (either from the OS directly, or
via an mmap()'d area).
Yes, imessages shouldn't ever be spilled to disk. There naturally must be an
upper limit for them. (Be it total available memory, as for threaded things
or a given and size-constrained pool, as is the case for dynshmem).
I guess experience has taught me to be wary of things that are wired
in memory. Under extreme memory pressure, something's got to give, or
the whole system will croak.
I absolutely agree with that last sentence. However, experience has
taught /me/ to be wary of things that needlessly swap to disk for hours
before reporting any kind of error (AKA swap hell). I prefer systems
that adjust to the OOM condition, instead of just ignoring it and
falling back to disk (which doesn't provide infinite space either, so
that's just pushing the limits).
The solution for imessages certainly isn't spilling to disk, which would
consume even more resources. Instead the process(es) for which there are
pending imessages should be allowed to consume them.
That's why upon OOM, IMessageCreate currently simply blocks the process
that wants to create an imessage. And yes, that's not quite perfect
(that process should still consume messages for itself), and it might
not play well with other potential users of dynamically allocated
memory. But it certainly works better than spilling to disk (and yes, I
tested that behavior within Postgres-R).
Consider also the contrary situation,
where the imessages stuff is not in use (even for a short period of
time, like a few minutes). Then we'd really rather not still have
memory carved out for it.
Huh? That's exactly what dynamic allocation could give you: not having
memory carved out for stuff you currently don't need, but instead being
able to dynamically use memory where most needed. SLRU has memory (not
disk space) carved out for pretty much every sub-system separately, if
I'm reading that code correctly.
I think what would be even better is to merge the SLRU pools with the
shared_buffer pool, so that the two can duke it out for who is in most
need of the limited amount of memory available.
..well, just add the shared_buffer pool to the list of candidates that
could use dynamically allocated shared memory. It would need some
thinking about boundaries (i.e. when to spill to disk, for those modules
that /want/ to spill to disk) and dealing with OOM situations, but
that's about it.
Regards
Markus
On Wed, Jul 21, 2010 at 2:53 PM, Markus Wanner <markus@bluegap.ch> wrote:
Consider also the contrary situation,
where the imessages stuff is not in use (even for a short period of
time, like a few minutes). Then we'd really rather not still have
memory carved out for it.
Huh? That's exactly what dynamic allocation could give you: not having
memory carved out for stuff you currently don't need, but instead being able
to dynamically use memory where most needed. SLRU has memory (not disk
space) carved out for pretty much every sub-system separately, if I'm
reading that code correctly.
Yeah, I think you are right. :-(
I think what would be even better is to merge the SLRU pools with the
shared_buffer pool, so that the two can duke it out for who is in most
need of the limited amount of memory available.
..well, just add the shared_buffer pool to the list of candidates that could
use dynamically allocated shared memory. It would need some thinking about
boundaries (i.e. when to spill to disk, for those modules that /want/ to
spill to disk) and dealing with OOM situations, but that's about it.
I'm not sure why merging the SLRU pools with shared_buffers would
benefit from dynamically allocated shared memory.
I might be at (or possibly beyond) the limit of my ability to comment
intelligently on this without looking more at what you want to use
these imessages for, but I'm still pretty skeptical about the idea of
storing them directly in shared memory. It's possible, though, that I
am all wet.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Hi,
On 07/22/2010 12:11 AM, Robert Haas wrote:
I'm not sure why merging the SLRU pools with shared_buffers would
benefit from dynamically allocated shared memory.
Well, I'm not sure how you'd merge SLRU pools with shared_buffers. IMO
that inherently leads to the problem of allocating memory dynamically.
With such an allocator, I'd say you just port one module after another
to use that, instead of pre-allocated, fixed portions of shared memory.
I might be at (or possibly beyond) the limit of my ability to comment
intelligently on this without looking more at what you want to use
these imessages for, but I'm still pretty skeptical about the idea of
storing them directly in shared memory. It's possible, though, that I
am all wet.
Imessages are meant to be a replacement for unix pipes. (To my
knowledge, those don't spill to disk either, but are blocking as soon as
Linux considers the pipe to be 'full'. Whenever that is. Or am I wrong
here?)
The reasons for replacing them were: they consume lots of file
descriptors, they can only be established between the parent and its
child process (at least for anonymous pipes), and last but not least, I
got told they still aren't fully portable. Another nice thing about
imessages compared to unix pipes is that it's a zero-copy approach.
Hope that makes my opinions and decisions clearer. Thank you for sharing
your concerns and for explaining SLRU to me.
Regards
Markus Wanner
On Thu, Jul 22, 2010 at 3:01 AM, Markus Wanner <markus@bluegap.ch> wrote:
On 07/22/2010 12:11 AM, Robert Haas wrote:
I'm not sure why merging the SLRU pools with shared_buffers would
benefit from dynamically allocated shared memory.
Well, I'm not sure how you'd merge SLRU pools with shared_buffers. IMO that
inherently leads to the problem of allocating memory dynamically.
With such an allocator, I'd say you just port one module after another to
use that, instead of pre-allocated, fixed portions of shared memory.
Well, shared_buffers has to be allocated as one contiguous slab
because we index into it that way. So I don't really see how
dynamically allocating memory could help. What you'd need is a
different system for assigning buffer tags, so that a particular tag
could refer to a buffer with either kind of contents.
I might be at (or possibly beyond) the limit of my ability to comment
intelligently on this without looking more at what you want to use
these imessages for, but I'm still pretty skeptical about the idea of
storing them directly in shared memory. It's possible, though, that I
am all wet.
Imessages are meant to be a replacement for unix pipes. (To my knowledge,
those don't spill to disk either, but are blocking as soon as Linux
considers the pipe to be 'full'. Whenever that is. Or am I wrong here?)
I think you're right about that.
The reasons for replacing them were: they consume lots of file descriptors,
they can only be established between the parent and its child process (at
least for anonymous pipes that's the case) and last but not least, I got
told they still aren't fully portable. Another nice thing about imessages
compared to unix pipes is, that it's a zero-copy approach.
That's sort of approaching the question from the opposite end from
what I was concerned about - I was wondering why you need a unicast
message-passing system.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Hi,
On 07/22/2010 01:04 PM, Robert Haas wrote:
Well, shared_buffers has to be allocated as one contiguous slab
because we index into it that way. So I don't really see how
dynamically allocating memory could help. What you'd need is a
different system for assigning buffer tags, so that a particular tag
could refer to a buffer with either kind of contents.
Hm.. okay, then it might not be that easy. Thanks for pointing that out.
That's sort of approaching the question from the opposite end from
what I was concerned about - I was wondering why you need a unicast
message-passing system.
Well, the initial Postgres-R approach, being based on Postgres
6.4.something used unix pipes. I coded imessages as a replacement.
Postgres-R basically uses imessages to pass around change sets and other
information required to keep replicas in sync. The thinking in terms of
message passing seems to originate from the GCS, which in itself is a
message passing system (with some nice extras and varying delivery
guarantees).
In Postgres-R the coordinator process receives messages from the GCS,
does some minor controlling and book-keeping, but basically passes on
the data via imessages to a background worker.
Of course, as mentioned in the bgworker patch, this could be done
differently. Using solely shared memory, or maybe SLRU to store change
sets. However, I certainly like the abstraction and guarantees such a
message passing system provides. It makes things easier to reason about,
IMO.
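To make the shape of such a unicast message-passing interface concrete, here is a minimal single-threaded sketch in C. All names (IMessage, imsg_create, imsg_send, imsg_receive) are illustrative, not the actual Postgres-R imessages API; plain malloc() stands in for the dynamic shared-memory allocator, and in real shared memory a per-queue spinlock would protect the queue head and tail. The zero-copy property comes from the sender writing the payload directly into the allocated chunk and then merely linking that chunk into the recipient's queue:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One message, allocated in (stand-in) shared memory. */
typedef struct IMessage {
    struct IMessage *next;
    int              type;       /* e.g. change set, control message */
    size_t           len;
    char             payload[];  /* filled in by the sender: zero-copy */
} IMessage;

/* One receive queue per recipient process; in shared memory a
 * per-queue spinlock would guard head/tail. */
typedef struct MessageQueue {
    IMessage *head;
    IMessage *tail;
} MessageQueue;

/* Allocate a message buffer; malloc() stands in for the
 * dynamic shared-memory allocator here. */
IMessage *imsg_create(int type, size_t len)
{
    IMessage *m = malloc(sizeof(IMessage) + len);
    m->next = NULL;
    m->type = type;
    m->len  = len;
    return m;
}

/* "Activate" a fully written message: link it into the recipient's
 * queue. No copying of the payload takes place. */
void imsg_send(MessageQueue *q, IMessage *m)
{
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Detach and return the oldest pending message, or NULL if empty. */
IMessage *imsg_receive(MessageQueue *q)
{
    IMessage *m = q->head;
    if (m)
    {
        q->head = m->next;
        if (!q->head)
            q->tail = NULL;
    }
    return m;
}
```

Because every consumer owns its queue, sends to different recipients only ever contend on different locks, which is what allows the parallel sending described above.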
For another example, see the bgworker patches, steps 1 and 2, where I've
changed the current autovacuum infrastructure to use imessages (between
launcher and worker).
[ And I've heard it said that current multi-core CPU designs tend to
favor message passing systems. Not sure how much that applies to
imessages and/or how it's used in bgworkers or Postgres-R, though. ]
That much about why using a unicast message-passing system.
Regards
Markus Wanner
Markus Wanner wrote:
On 07/20/2010 09:05 PM, Alvaro Herrera wrote:
Hmm, deriving code from a paper published by IBM sounds like bad news --
who knows what patents they hold on the techniques there?

Yeah, that might be an issue. Note, however, that the lock-based
variant differs substantially from what's been published. And I sort
of doubt their patents cover a lot of stuff that's not lock-free-ish.
There's a fairly good mapping of what techniques are patented and which
were only mentioned in research in the Sun dynamic memory patent at
http://www.freepatentsonline.com/7328316.html ; that mentions an earlier
paper by the author of the technique Markus is using, but this was from
before that one was written. It looks like Sun has a large portion of
the patent portfolio in this area, which is particularly troublesome now.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Greg,
On 07/22/2010 03:59 PM, Greg Smith wrote:
There's a fairly good mapping of what techniques are patented and which
were only mentioned in research in the Sun dynamic memory patent at
http://www.freepatentsonline.com/7328316.html ; that mentions an earlier
paper by the author of the technique Markus is using, but this was from
before that one was written. It looks like Sun has a large portion of
the patent portfolio in this area, which is particularly troublesome now.
Thanks for the pointer, very helpful.
Anybody ever checked jemalloc, or any other OSS allocator out there
against these patents?
Remembering similar patent-discussions, it might be better to not bother
too much and just go with something widely used, based on the assumption
that such a thing is going to enjoy broad support in case of an attack
from a patent troll.
What do you think? What'd be your favorite allocator?
Regards
Markus Wanner
Excerpts from Markus Wanner's message of Thu Jul 22 08:49:29 -0400 2010:
Of course, as mentioned in the bgworker patch, this could be done
differently. Using solely shared memory, or maybe SLRU to store change
sets. However, I certainly like the abstraction and guarantees such a
message passing system provides. It makes things easier to reason about,
IMO.
FWIW I don't think you should be thinking of "replacing imessages with
SLRU". I rather think you should be thinking of how you can implement
the imessages API on top of SLRU. So as far as the coordinator and
background worker are concerned, there wouldn't be any difference --
they keep using the same API they are using today.
Also let me repeat my earlier comment about imessages being more similar
to multixact than to notify. The content of each multixact entry is
just an arbitrary amount of bytes. If imessages are numbered from a
monotonically increasing sequence, it should be possible to use a very
similar technique, and perhaps you should be able to reduce locking
requirements as well (write messages with only a shared lock, after
you've determined and reserved the area you're going to write).
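The reserve-then-write scheme suggested here can be sketched as follows. This is only an illustration of the idea, not multixact's actual code: a static buffer stands in for the shared area, and an atomic counter stands in for the briefly-held lock under which a writer determines and reserves its region. Once the region is claimed, filling it needs no exclusive lock, since no two writers ever hold overlapping regions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define AREA_SIZE 1024

char area[AREA_SIZE];            /* stands in for the shared area */
atomic_size_t next_free;         /* stands in for the reservation lock */

/* Serialized step: determine and reserve the area to write to.
 * Only this tiny step needs mutual exclusion among writers. */
size_t msg_reserve(size_t len)
{
    size_t off = atomic_fetch_add(&next_free, len);
    assert(off + len <= AREA_SIZE);  /* no wraparound in this sketch */
    return off;
}

/* Concurrent step: fill the reserved bytes. Other writers can do the
 * same in their own, non-overlapping regions at the same time, which
 * is why a shared lock suffices here. */
void msg_write(size_t off, const void *data, size_t len)
{
    memcpy(area + off, data, len);
}
```

A monotonically increasing message number falls out naturally: the reserved offset (or a counter bumped alongside it) orders the messages.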
Hi,
On 07/22/2010 08:31 PM, Alvaro Herrera wrote:
FWIW I don't think you should be thinking of "replacing imessages with
SLRU". I rather think you should be thinking of how you can implement
the imessages API on top of SLRU.
Well, I'm rather comparing SLRU with the dynamic allocator. So far I'm
unconvinced that SLRU would be a better base for imessages than a
dynamic allocator. (And I'm arguing that SLRU should use a dynamic
allocator underneath).
So as far as the coordinator and
background worker are concerned, there wouldn't be any difference --
they keep using the same API they are using today.
Agreed, the imessages API to the upper layer doesn't need to care about
the underlying stuff.
Also let me repeat my earlier comment about imessages being more similar
to multixact than to notify. The content of each multixact entry is
just an arbitrary amount of bytes. If imessages are numbered from a
monotonically increasing sequence,
Well, there's absolutely no need to serialize imessages, so they don't
currently carry any such number. And as opposed to multixact entries,
they are clearly directed at exactly one single consumer. Every
consumer has its own receive queue. Sending messages concurrently to
different recipients can happen completely in parallel, without any
(b)locking in between.
The dynamic allocator is the only part of the chain which might need to
do some locking to protect the shared resource (memory) against
concurrent access. Note, however, that wamalloc (like any modern
dynamic allocator) is parallelized to some extent, i.e. concurrent
malloc/free calls don't necessarily block each other.
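To illustrate how a lock-based allocator over a fixed shared region can still allow some concurrency, here is a deliberately simplified size-class sketch in C. This is not wamalloc and all names are invented; a static array stands in for the pre-allocated shared pool, and the demo is single-threaded, with comments marking where per-bin locks would go. The point is structural: because each size class has its own free list (and would have its own lock), concurrent malloc/free calls in different size classes never contend:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4096
#define NBINS 4

/* Size classes; requests are rounded up to the nearest bin. */
static const size_t bin_size[NBINS] = {32, 64, 128, 256};

typedef struct FreeBlock { struct FreeBlock *next; } FreeBlock;

char pool[POOL_SIZE];        /* stands in for the shared region */
size_t pool_used;            /* bump pointer for fresh blocks */
FreeBlock *bins[NBINS];      /* one free list -- and one lock -- per bin */

int bin_for(size_t size)
{
    for (int i = 0; i < NBINS; i++)
        if (size <= bin_size[i])
            return i;
    return -1;               /* too large for this sketch */
}

void *shmem_alloc(size_t size)
{
    int b = bin_for(size);
    if (b < 0)
        return NULL;
    /* In a concurrent version, lock only bins[b] here, so requests
     * in different size classes proceed fully in parallel. */
    if (bins[b])
    {
        FreeBlock *blk = bins[b];   /* reuse a freed block */
        bins[b] = blk->next;
        return blk;
    }
    if (pool_used + bin_size[b] > POOL_SIZE)
        return NULL;                /* pool exhausted */
    void *p = pool + pool_used;     /* carve a fresh block */
    pool_used += bin_size[b];
    return p;
}

void shmem_free(void *p, size_t size)
{
    int b = bin_for(size);
    /* Again, only bins[b] would need to be locked here. */
    FreeBlock *blk = p;
    blk->next = bins[b];
    bins[b] = blk;
}
```

Real allocators add coalescing, per-CPU caches and smarter bin selection, but the per-bin locking shape is what keeps concurrent callers from blocking each other most of the time.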
it should be possible to use a very
similar technique, and perhaps you should be able to reduce locking
requirements as well (write messages with only a shared lock, after
you've determined and reserved the area you're going to write).
Writing to the message is currently (i.e. imessages-on-dynshmem) done
without *any* kind of lock held. So that would rather increase locking
requirements and lower parallelism, I fear.
Regards
Markus