LISTEN/NOTIFY benchmarks?

Started by Noname over 22 years ago · 16 messages
#1 Noname
prashanth@jibenetworks.com

Hi,

I'm looking for information on the scalability of the LISTEN/NOTIFY
mechanism. How well does it scale with respect to:

- hundreds of clients registered for LISTENs

I guess this translates to hundreds of the corresponding backend
processes receiving SIGUSR2 signals. The efficiency of this is
probably OS-dependent. Would anyone be in a position to give me
signal delivery benchmarks for FreeBSD or other Unix systems?

- each client registered for thousands of LISTENs

From a look at backend/commands/async.c, it would seem that each
listening backend would get a signal for *every* LISTEN it
registered for, resulting in thousands of signals to the same
listening backend, instead of only one. Would it help if this was
optimized so that a signal was sent only once? Again, info on
relevant signal delivery benchmarks would be useful.
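(A hypothetical sketch of that once-per-backend optimization, not anything in PostgreSQL's actual async.c: collect the distinct backend pids behind the matched LISTENs, so the notifier would issue a single kill(pid, SIGUSR2) per backend instead of one per registration. The function name and shape are invented for illustration.)

```c
#include <stddef.h>

/* Hypothetical sketch, not PostgreSQL source: match_pids holds the pid
 * of the listening backend behind each matched LISTEN (pids repeat when
 * one backend holds many registrations). Collect each pid at most once
 * into out, so the caller signals each backend exactly once.
 * Returns the number of distinct pids written. */
size_t collect_backends_once(const int *match_pids, size_t n,
                             int *out, size_t cap)
{
    size_t nout = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < nout; j++) {
            if (out[j] == match_pids[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen && nout < cap)
            out[nout++] = match_pids[i];   /* first sighting of this pid */
    }
    return nout;
}
```

With thousands of LISTENs held by a handful of backends, this turns thousands of kill() calls into a few.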

I'm not an expert on signals, not even a novice, so I might be totally
off base, but it seems like the Async Notification implementation does
not scale. If it does not, does anyone have a solution for the
problem of signalling each event in a possibly very large set of
events to a large number of clients?

Thanks,

--prashanth

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: LISTEN/NOTIFY benchmarks?

prashanth@jibenetworks.com writes:

I'm not an expert on signals, not even a novice, so I might be totally
off base, but it seems like the Async Notification implementation does
not scale.

Very possibly. You didn't even mention the problems that would occur if
the pg_listener table didn't get vacuumed often enough.

The pghackers archives contain some discussion about reimplementing
listen/notify using a non-table-based infrastructure. But AFAIK no one
has picked up that task yet.

regards, tom lane

#3 Hannu Krosing
hannu@tm.ee
In reply to: Noname (#1)
Re: LISTEN/NOTIFY benchmarks?

prashanth@jibenetworks.com wrote on Tue, 29 Apr 2003 at 04:14:

Hi,

I'm looking for information on the scalability of the LISTEN/NOTIFY
mechanism. How well does it scale with respect to:

- hundreds of clients registered for LISTENs

I guess this translates to hundreds of the corresponding backend
processes receiving SIGUSR2 signals. The efficiency of this is
probably OS-dependent. Would anyone be in a position to give me
signal delivery benchmarks for FreeBSD or other Unix systems?

- each client registered for thousands of LISTENs

From a look at backend/commands/async.c, it would seem that each
listening backend would get a signal for *every* LISTEN it
registered for, resulting in thousands of signals to the same
listening backend, instead of only one.

But as the signals are usually generated async, you have no way to know
if a particular backend has already received a signal.

Or do you mean some mechanism that remembers "signals sent" in some
shared structure that the receiving backend can then clear when it
actually receives the signal ?

That could mean lock contention on that shared structure, unless we
decide that it is cheaper to just consult it without locking it and
accept an occasional delivery of unneeded signals.
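One unlocked variant of that idea could look roughly like this (a toy sketch, not actual PostgreSQL code; the struct and function names are invented): the notifier checks and sets a per-backend "pending" flag without taking a lock, and a race can at worst make two notifiers both signal the same backend, which the design explicitly accepts as a harmless extra signal.

```c
/* Toy sketch of the unlocked shared-flag idea. One slot per backend
 * would live in shared memory. No locking: a concurrent race may let
 * two notifiers both see pending == 0 and both signal, which just
 * costs one unneeded (but harmless) signal. */
typedef struct {
    volatile int pending;   /* set by notifiers, cleared by the listener */
} BackendSlot;

/* Notifier side: returns 1 if the caller should send the signal. */
int mark_pending(BackendSlot *slot)
{
    if (slot->pending)
        return 0;           /* someone already signalled this backend */
    slot->pending = 1;
    return 1;
}

/* Listener side, in its signal-check path: clears the flag and
 * reports whether anything was pending. */
int consume_pending(BackendSlot *slot)
{
    int was = slot->pending;
    slot->pending = 0;
    return was;
}
```

A real multi-process version would want an atomic test-and-set rather than a plain volatile int, but the tolerance for spurious signals is what lets it avoid a lock.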

Would it help if this was
optimized so that a signal was sent only once? Again, info on
relevant signal delivery benchmarks would be useful.

I still suspect that removing the pg_listener table from the mechanism
would give gains faster. Of course, we could rework the signal mechanism
as well while doing it.

I'm not an expert on signals, not even a novice, so I might be totally
off base, but it seems like the Async Notification implementation does
not scale. If it does not, does anyone have a solution for the
problem of signalling each event in a possibly very large set of
events to a large number of clients?

-----------------
Hannu

#4 Noname
prashanth@jibenetworks.com
In reply to: Tom Lane (#2)
Re: LISTEN/NOTIFY benchmarks?

On Mon, Apr 28, 2003 at 10:19:16PM -0400, Tom Lane wrote:

prashanth@jibenetworks.com writes:

I'm not an expert on signals, not even a novice, so I might be totally
off base, but it seems like the Async Notification implementation does
not scale.

Very possibly. You didn't even mention the problems that would occur if
the pg_listener table didn't get vacuumed often enough.

The pghackers archives contain some discussion about reimplementing
listen/notify using a non-table-based infrastructure. But AFAIK no one
has picked up that task yet.

I found some messages in 03/2002 that also brought up the performance
issue. You had suggested the use of shared-memory, and made reference
to a "SI model". I did not see any alternative non-table-based
suggestions. What is the "SI model"?

Thanks,

--prashanth

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#4)
Re: LISTEN/NOTIFY benchmarks?

prashanth@jibenetworks.com writes:

I found some messages in 03/2002 that also brought up the performance
issue. You had suggested the use of shared-memory, and made reference
to a "SI model". I did not see any alternative non-table-based
suggestions. What is the "SI model"?

I meant following the example of the existing shared-cache-invalidation
signaling mechanism --- see
src/backend/storage/ipc/sinvaladt.c
src/backend/storage/ipc/sinval.c
src/include/storage/sinvaladt.h
src/include/storage/sinval.h
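For readers of the archives: the SI (shared-cache-invalidation) mechanism is, roughly, a fixed-size circular message buffer in shared memory in which every backend keeps its own read position. A toy illustration of that shape (deliberately not the real sinvaladt.c code, which handles locking and overrun recovery) might be:

```c
#include <string.h>

/* Toy illustration of the SI-style design: one writer-shared circular
 * buffer, consumed independently by each reader at its own position. */
#define SI_CAP      16
#define MAX_READERS 4

typedef struct {
    int      msgs[SI_CAP];
    unsigned max_msg;                /* total messages ever written */
    unsigned read_pos[MAX_READERS];  /* per-backend read position   */
} SiBuf;

void si_init(SiBuf *buf)
{
    memset(buf, 0, sizeof(*buf));
}

/* Returns 0 if the slowest reader would be overrun; the real code
 * instead resets such a backend and makes it rebuild its caches. */
int si_send(SiBuf *buf, int msg)
{
    for (int r = 0; r < MAX_READERS; r++)
        if (buf->max_msg - buf->read_pos[r] >= SI_CAP)
            return 0;
    buf->msgs[buf->max_msg % SI_CAP] = msg;
    buf->max_msg++;
    return 1;
}

/* Returns 0 when reader r is caught up, else stores the next message. */
int si_read(SiBuf *buf, int r, int *msg)
{
    if (buf->read_pos[r] == buf->max_msg)
        return 0;
    *msg = buf->msgs[buf->read_pos[r] % SI_CAP];
    buf->read_pos[r]++;
    return 1;
}
```

The key property for LISTEN/NOTIFY is that one written message reaches every reader without any per-reader send, and nothing needs vacuuming.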

regards, tom lane

#6 Noname
prashanth@jibenetworks.com
In reply to: Hannu Krosing (#3)
Re: LISTEN/NOTIFY benchmarks?

On Tue, Apr 29, 2003 at 10:10:47AM +0300, Hannu Krosing wrote:

prashanth@jibenetworks.com kirjutas T, 29.04.2003 kell 04:14:

- each client registered for thousands of LISTENs

From a look at backend/commands/async.c, it would seem that each
listening backend would get a signal for *every* LISTEN it
registered for, resulting in thousands of signals to the same
listening backend, instead of only one.

But as the signals are usually generated async, you have no way to know
if a particular backend has already received a signal.

Or do you mean some mechanism that remembers "signals sent" in some
shared structure that the receiving backend can then clear when it
actually receives the signal ?

No, I meant that a listening backend process would be sent multiple
signals from a notifying process, *in the inner loop* of
backend/commands/async.c:AtCommit_Notify().

If the listening backend had registered tens of thousands of LISTENs,
it would be sent an equivalent number of signals during a single run
of AtCommit_Notify(). I'm not sure what the cost of this is, since
I don't know how signal delivery works, but tens of thousands of
system calls cannot be very cheap.

--prashanth

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#6)
Re: LISTEN/NOTIFY benchmarks?

prashanth@jibenetworks.com writes:

If the listening backend had registered tens of thousands of LISTENs,
it would be sent an equivalent number of signals during a single run
of AtCommit_Notify().

Not unless the notifier had notified all tens of thousands of condition
names in a single transaction.

regards, tom lane

#8 Noname
prashanth@jibenetworks.com
In reply to: Tom Lane (#7)
Re: LISTEN/NOTIFY benchmarks?

On Tue, Apr 29, 2003 at 06:21:15PM -0400, Tom Lane wrote:

prashanth@jibenetworks.com writes:

If the listening backend had registered tens of thousands of LISTENs,
it would be sent an equivalent number of signals during a single run
of AtCommit_Notify().

Not unless the notifier had notified all tens of thousands of condition
names in a single transaction.

Unfortunately, that is a possibility in our application. We are now
working around this non-scalability.

Regardless, it would seem redundant to send more than one SIGUSR2 to the
recipient backend in that loop.

-- prashanth

#9 Sean Chittenden
sean@chittenden.org
In reply to: Noname (#1)
Re: LISTEN/NOTIFY benchmarks?

I'm not an expert on signals, not even a novice, so I might be
totally off base, but it seems like the Async Notification
implementation does not scale. If it does not, does anyone have a
solution for the problem of signalling each event in a possibly
very large set of events to a large number of clients?

<brainfart_for_the_archives> Hrm.... I should see about porting
kqueue/kevent as a messaging bus for the listen/notify bits to
postgresql... that does scale and it scales well to tens of thousands
of connections a second (easily over 60K, likely closer to 1M is the
limit).... </brainfart_for_the_archives>

--
Sean Chittenden

#10 Gavin Sherry
swm@linuxworld.com.au
In reply to: Sean Chittenden (#9)
Re: LISTEN/NOTIFY benchmarks?

On Tue, 29 Apr 2003, Sean Chittenden wrote:

I'm not an expert on signals, not even a novice, so I might be
totally off base, but it seems like the Async Notification
implementation does not scale. If it does not, does anyone have a
solution for the problem of signalling each event in a possibly
very large set of events to a large number of clients?

<brainfart_for_the_archives> Hrm.... I should see about porting
kqueue/kevent as a messaging bus for the listen/notify bits to
postgresql... that does scale and it scales well to tens of thousands
of connections a second (easily over 60K, likely closer to 1M is the
limit).... </brainfart_for_the_archives>

Except that it is FreeBSD specific -- being system calls and all -- if I
remember correctly. If you're going to move to a system like that, which
is a good idea, best move to a portable system.

Thanks,

Gavin

#11 Sean Chittenden
sean@chittenden.org
In reply to: Gavin Sherry (#10)
Re: LISTEN/NOTIFY benchmarks?

I'm not an expert on signals, not even a novice, so I might be
totally off base, but it seems like the Async Notification
implementation does not scale. If it does not, does anyone have
a solution for the problem of signalling each event in a
possibly very large set of events to a large number of clients?

<brainfart_for_the_archives> Hrm.... I should see about porting
kqueue/kevent as a messaging bus for the listen/notify bits to
postgresql... that does scale and it scales well to tens of
thousands of connections a second (easily over 60K, likely closer
to 1M is the limit).... </brainfart_for_the_archives>

Except that it is FreeBSD specific -- being system calls and all --
if I remember correctly. If you're going to move to a system like
that, which is a good idea, best move to a portable system.

You can #ifdef abstract things so that select() and poll() work if
available. Though now that I think about it, a queue that existed
completely in userland would be better... an shm implementation that's
abstracted would be ideal, but shm is a precious resource and can't
scale all that big. A shared mmap() region, however, is much less
scarce and can scale much higher. mmap() + semaphore as a gate to a
queue would be ideal, IMHO.

I shouldn't be posti^H^H^H^H^Hrambling though, haven't slept in 72hrs.
:-/ *stops reading email* -sc

--
Sean Chittenden

#12 Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Sean Chittenden (#11)
Re: LISTEN/NOTIFY benchmarks?

Sorry for the late response to this, but I've been caught up in
merging TCQ to the 7.3.2 code base.

BTW, an announcement for those interested. We'll be doing a
demonstration of TelegraphCQ during the ACM SIGMOD Conference in
June. This year's SIGMOD is held in San Diego as part of the ACM FCRC
(Federated Computer Research Conf) - visit http://www.sigmod.org for
more details. SIGMOD runs from June 8-12 2003.

All pgsql hackers (and others) are cordially invited :-)

Do drop us an email if you're planning to show up.

"Sean" == Sean Chittenden <sean@chittenden.org> writes:

Sean> You can #ifdef abstract things so that select() and poll()
Sean> work if available. Though now that I think about it, a
Sean> queue that existed completely in userland would be
Sean> better... an shm implementation that's abstracted would be
Sean> ideal, but shm is a precious resource and can't scale all
Sean> that big. A shared mmap() region, however, is much less
Sean> scarce and can scale much higher. mmap() + semaphore as a
Sean> gate to a queue would be ideal, IMHO.

As part of our TelegraphCQ work, we've implemented a generic userland
queue. We support blocking/non-blocking operation at both
enqueue/dequeue time as well as different forms of latching.

The queue can also live in shared memory, for which we use a new
Shared Memory MemoryContext. This is implemented using libmm - a
memory management library that came out of the Apache project.

Our current released version is based on the 7.2.1 source
base. However, our internal CVS tip is based on 7.3.2 - we had to make
a few changes to the shm allocator - one more function that's part of
a MemoryContext.

(We can afford to be slightly more profligate in our use of shared
memory as we process all concurrently executing streaming queries in a
single monster query plan. New queries are dynamically folded into a
running query plan on the fly. Since streams represent append-only
data we play fast and loose with transaction isolation ...)

The current version of the code is available at:

http://telegraph.cs.berkeley.edu/telegraphcq

If there is interest, we would love to contribute our queue
infrastructure to PostgreSQL. In fact, we'd love to contribute any of
our stuff that the pgsql folks find interesting/useful.

Our motivations are two-fold:

(1) We'd like to give back to the pgsql community.

(2) It's in our interest for things like the Queue/ShMem stuff to be
part of pgsql, as it means one less merge hassle in the future.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#13 Sean Chittenden
sean@chittenden.org
In reply to: Sailesh Krishnamurthy (#12)
Re: LISTEN/NOTIFY benchmarks?

(2) It's in our interest for things like the Queue/ShMem stuff to be
part of pgsql, as it means one less merge hassle in the future.

I'd be quite interested in the work as it would remove my dependence
on jabberd as a distributed event/message bus and I could keep
everything inside of PostgreSQL, which is always a good thing. :) -sc

--
Sean Chittenden

#14 Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Sean Chittenden (#13)
Re: LISTEN/NOTIFY benchmarks?

"Sean" == Sean Chittenden <sean@chittenden.org> writes:

(2) It's in our interest for things like the Queue/ShMem stuff
to be part of pgsql, as it means one less merge hassle in the
future.

Sean> I'd be quite interested in the work as it would remove my
Sean> dependence on jabberd as a distributed event/message bus and
Sean> I could keep everything inside of PostgreSQL, which is
Sean> always a good thing. :) -sc

Sounds great! Would it make more sense for us to correspond privately,
see if you can use our code, and then submit a patch?

Or is it better to have the discussion on HACKERS itself, which lends
itself to further googling?

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#15 Sean Chittenden
sean@chittenden.org
In reply to: Sailesh Krishnamurthy (#14)
Re: LISTEN/NOTIFY benchmarks?

(2) It's in our interest for things like the Queue/ShMem stuff
to be part of pgsql, as it means one less merge hassle in the
future.

Sean> I'd be quite interested in the work as it would remove my
Sean> dependence on jabberd as a distributed event/message bus and
Sean> I could keep everything inside of PostgreSQL, which is
Sean> always a good thing. :) -sc

Sounds great! Would it make more sense for us to correspond privately,
see if you can use our code, and then submit a patch?

Or is it better to have the discussion on HACKERS itself, which lends
itself to further googling?

Do you have a URL for the patch? If not, send it to me privately. I
can take any non-critical issues off line but I bet others have an
interest in this code as well.

I'm particularly interested in the API atm to see how hard it would be
to integrate. -sc

--
Sean Chittenden

#16 Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Sean Chittenden (#15)
Re: LISTEN/NOTIFY benchmarks?

"Sean" == Sean Chittenden <sean@chittenden.org> writes:

Sean> Do you have a URL for the patch? If not, send it to me
Sean> privately. I can take any non-critical issues off line but
Sean> I bet others have an interest in this code as well.

TCQ website: http://telegraph.cs.berkeley.edu/telegraphcq

The code we have on the web is a source distribution based on 7.2,
not a patch.

I think I can produce a patch off of 7.3.2 - it's just a bunch of new
modules, although we had to add a few functions to the changed
semaphore abstractions.

Sean> I'm particularly interested in the API atm to see how hard
Sean> it would be to integrate. -sc

Since the API hasn't changed significantly internally, maybe the best
bet is for you to download the src distribution at the link above and
look at the directories src/backend/rqueue and src/include/rqueue.

If things look promising, I can rustle up code that fits the 7.3.x
codebase.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh