Optimize LISTEN/NOTIFY

Started by Joel Jacobson, 8 months ago. 125 messages.
#1 Joel Jacobson
joel@compiler.org

Hi hackers,

The current LISTEN/NOTIFY implementation is well-suited for use-cases like
cache invalidation where many backends listen on the same channel. However,
its scalability is limited when many backends listen on distinct
channels. The root of the problem is that Async_Notify must signal every
listening backend in the database, as it lacks central knowledge of which
backend is interested in which channel. This results in an O(N) number of
kill(pid, SIGUSR1) syscalls as the listener count grows.

The attached proof-of-concept patch proposes a straightforward
optimization for the single-listener case. It introduces a shared-memory
hash table mapping (dboid, channelname) to the ProcNumber of a single
listener. When NOTIFY is issued, we first check this table. If a single
listener is found, we signal only that backend. Otherwise, we fall back to
the existing broadcast behavior.

The performance impact for this pattern is significant. A benchmark [1]
measuring a NOTIFY "ping-pong" between two connections, while adding a
variable number of idle listeners, shows the following:

master (8893c3a):
0 extra listeners: 9126 TPS
10 extra listeners: 6233 TPS
100 extra listeners: 2020 TPS
1000 extra listeners: 238 TPS

0001-Optimize-LISTEN-NOTIFY-signaling-for-single-listener.patch:
0 extra listeners: 9152 TPS
10 extra listeners: 9352 TPS
100 extra listeners: 9320 TPS
1000 extra listeners: 8937 TPS

As you can see, the patched version's performance is near O(1) with respect
to the number of idle listeners, while the current implementation shows the
expected O(N) degradation.

This patch is a first step. It uses a simple boolean has_multiple_listeners
flag in the hash entry. Once a channel gets a second listener, this flag is
set and, crucially, never cleared. The entry will then permanently indicate
"multiple listeners", even after all backends on that channel disconnect.

A more complete solution would likely use reference counting for each
channel's listeners. This would solve the "stuck entry" problem and could
also enable a further optimization: targeted signaling to all listeners of a
multi-user channel, avoiding the database-wide broadcast entirely.

The patch also includes a "wake only tail" optimization (contributed by
Marko Tikkaja) to help prevent backends from falling too far behind.
Instead of waking all lagging backends at once and creating a "thundering
herd", this logic signals only the single backend that is currently at the
queue tail. This ensures the global queue tail can always advance, relying
on a chain reaction to get backends caught up efficiently. This seems like
a sensible improvement in its own right.

Thoughts?

/Joel

[1]: Benchmark tool and full results: https://github.com/joelonsql/pg-bench-listen-notify

Attachments:

0001-Optimize-LISTEN-NOTIFY-signaling-for-single-listener.patch (application/octet-stream, +537 -36)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#1)
Re: Optimize LISTEN/NOTIFY

"Joel Jacobson" <joel@compiler.org> writes:

> The attached proof-of-concept patch proposes a straightforward
> optimization for the single-listener case. It introduces a shared-memory
> hash table mapping (dboid, channelname) to the ProcNumber of a single
> listener.

What does that do to the cost and parallelizability of LISTEN/UNLISTEN?

> The patch also includes a "wake only tail" optimization (contributed by
> Marko Tikkaja) to help prevent backends from falling too far behind.

Coulda sworn we dealt with that case some years ago. In any case,
if it's independent of the other idea it should probably get its
own thread.

regards, tom lane

#3 Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#2)
Re: Optimize LISTEN/NOTIFY

On Sun, Jul 13, 2025, at 01:18, Tom Lane wrote:

> "Joel Jacobson" <joel@compiler.org> writes:
>
>> The attached proof-of-concept patch proposes a straightforward
>> optimization for the single-listener case. It introduces a shared-memory
>> hash table mapping (dboid, channelname) to the ProcNumber of a single
>> listener.
>
> What does that do to the cost and parallelizability of LISTEN/UNLISTEN?

Good point. The previous patch would effectively force all LISTEN/UNLISTEN
to be serialized, which would at least hurt parallelizability.

New benchmarks confirm this hypothesis.

New patch attached that combines two complementary approaches that together
seem to scale well for both common-channel and unique-channel scenarios:

1. Partitioned Hash Locking

The Channel Hash now uses HASH_PARTITION, with an array of NUM_NOTIFY_PARTITIONS
lightweight locks. A given channel is mapped to a partition lock using
a custom hash function on (dboid, channelname).

This allows LISTEN/UNLISTEN operations on different channels to proceed
concurrently without fighting over a single global lock, addressing the
"many distinct channels" use-case.

2. Optimistic Read-Locking

For the "many backends on one channel" use-case, lock acquisition now follows
a read-then-upgrade pattern. We first acquire a LW_SHARED lock, to check the
channel's state. If the channel is already marked as has_multiple_listeners,
we can return immediately without any need for a write.

Only if we are the first or second listener on a channel do we release
the shared lock and acquire an LW_EXCLUSIVE lock to modify the hash entry.
After getting the exclusive lock, we re-verify the state to guard against
race conditions. This avoids serializing the third and all subsequent
listeners for a popular channel.

BENCHMARK

https://raw.githubusercontent.com/joelonsql/pg-bench-listen-notify/refs/heads/master/performance_overview_connections_equal_jobs.png

https://raw.githubusercontent.com/joelonsql/pg-bench-listen-notify/refs/heads/master/performance_overview_fixed_connections.png

I didn't want to attach the images to this email because they are quite large,
due to all the detail in the images.

However, since it's important that this mailing list contains all relevant data
discussed, I've also included all data from the graphs, formatted as ASCII/Markdown:

performance_overview.md

I've also included the raw parsed data from the pgbench output,
which has been used as input to create performance_overview.md
as well as the images:

pgbench_results_combined.csv

I've benchmarked five times per measurement, in random order.
All raw measurements have been included in the Markdown document
within { curly braces } sorted, next to the average values, to get an idea
of the variance. Stddev felt potentially misleading since, this being
benchmarking data, I'm not sure the data points are normally distributed.

I've run the benchmarks on my MacBook Pro Apple M3 Max,
using `caffeinate -dims pgbench ...`.

>> The patch also includes a "wake only tail" optimization (contributed by
>> Marko Tikkaja) to help prevent backends from falling too far behind.
>
> Coulda sworn we dealt with that case some years ago. In any case,
> if it's independent of the other idea it should probably get its
> own thread.

Maybe it's been dealt with by some other part of the system, but I can't
find any such code anywhere, it's only async.c that currently sends
PROCSIG_NOTIFY_INTERRUPT.

The wake only tail mechanism seems almost perfect, but I can think of at least
one edge case that could still cause problems:

With lots of idle backends, the rate of this one-by-one catch-up may not be fast
enough to outpace the queue's advancement, causing other idle backends
to eventually lag by more than the QUEUE_CLEANUP_DELAY threshold.

To ensure all backends are eventually processed without re-introducing
the thundering herd problem, an additional mechanism seems necessary:

I see two main options:

1. Extend the chain reaction
Once woken, a backend could signal the next backend at the queue tail,
propagating the catch-up process. This would need to be managed carefully,
perhaps with some kind of global advisory lock, to prevent multiple
cascades from running at once.

2. Centralize the work
We already have the autovacuum daemon, maybe it could also be made responsible
for kicking lagging backends?

Other ideas?

/Joel

Attached:

* pgbench-scripts.tar.gz
pgbench scripts to reproduce the results, report and images.

* performance_overview.md
Same results as in the images, but in ASCII/Markdown format.

* pgbench_results_combined.csv
Parsed output from pgbench runs, used to create performance_overview.md as well as the linked images.

* 0001-Optimize-LISTEN-NOTIFY-signaling-for-single-listener-v2.patch
Old patch just renamed to -v2

* 0002-Partition-channel-hash-to-improve-LISTEN-UNLISTEN-v2.patch
New patch with the approach explained above.

Attachments:

pgbench_results_combined.csv (text/csv)
pgbench-scripts.tar.gz (application/x-gzip)
performance_overview.md (application/octet-stream)
0001-Optimize-LISTEN-NOTIFY-signaling-for-single-listener-v2.patch (application/octet-stream, +537 -36)
0002-Partition-channel-hash-to-improve-LISTEN-UNLISTEN-v2.patch (application/octet-stream, +241 -158)
#4 Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#3)
Re: Optimize LISTEN/NOTIFY

On Tue, Jul 15, 2025, at 09:20, Joel Jacobson wrote:

> On Sun, Jul 13, 2025, at 01:18, Tom Lane wrote:
>
>> "Joel Jacobson" <joel@compiler.org> writes:
>>
>>> The attached proof-of-concept patch proposes a straightforward
>>> optimization for the single-listener case. It introduces a shared-memory
>>> hash table mapping (dboid, channelname) to the ProcNumber of a single
>>> listener.
>>
>> What does that do to the cost and parallelizability of LISTEN/UNLISTEN?
>
> Good point. The previous patch would effectively force all LISTEN/UNLISTEN
> to be serialized, which would at least hurt parallelizability.
>
> New benchmarks confirm this hypothesis.
>
> New patch attached that combines two complementary approaches that together
> seem to scale well for both common-channel and unique-channel scenarios:

Thanks to the FreeBSD animal failing, I see I made a shared memory blunder.
New squashed patch attached.

/Joel

Attachments:

0001-Subject-Optimize-LISTEN-NOTIFY-signaling-for-scalabi-v3.patch (application/octet-stream, +641 -34)
#5 Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#4)
Re: Optimize LISTEN/NOTIFY

On Tue, Jul 15, 2025, at 22:56, Joel Jacobson wrote:

> On Tue, Jul 15, 2025, at 09:20, Joel Jacobson wrote:
>
>> On Sun, Jul 13, 2025, at 01:18, Tom Lane wrote:
>>
>>> "Joel Jacobson" <joel@compiler.org> writes:
>>>
>>>> The attached proof-of-concept patch proposes a straightforward
>>>> optimization for the single-listener case. It introduces a shared-memory
>>>> hash table mapping (dboid, channelname) to the ProcNumber of a single
>>>> listener.
>>>
>>> What does that do to the cost and parallelizability of LISTEN/UNLISTEN?
>>
>> Good point. The previous patch would effectively force all LISTEN/UNLISTEN
>> to be serialized, which would at least hurt parallelizability.
>>
>> New benchmarks confirm this hypothesis.
>>
>> New patch attached that combines two complementary approaches that together
>> seem to scale well for both common-channel and unique-channel scenarios:
>
> Thanks to the FreeBSD animal failing, I see I made a shared memory blunder.
> New squashed patch attached.
>
> /Joel
>
> Attachments:
> * 0001-Subject-Optimize-LISTEN-NOTIFY-signaling-for-scalabi-v3.patch

(cfbot is not picking up my patch; I wonder if some filename length limit is
exceeded. Trying a shorter filename; apologies for spamming.)

/Joel

Attachments:

0001-optimize_listen_notify-v3.patch (application/octet-stream, +641 -34)
#6 Rishu Bagga
rishu.postgres@gmail.com
In reply to: Joel Jacobson (#5)
Re: Optimize LISTEN/NOTIFY

Hi Joel,

Thanks for sharing the patch.
I have a few questions based on a cursory first look.

> If a single listener is found, we signal only that backend.
> Otherwise, we fall back to the existing broadcast behavior.

The idea of not wanting to wake up all backends makes sense to me,
but I don’t understand why we want this optimization only for the case
where there is a single backend listening on a channel.

Is there a pattern of usage in LISTEN/NOTIFY where users typically
have either just one or several backends listening on a channel?

If we are doing this optimization, why not maintain a list of backends
for each channel, and only wake up those channels?

Thanks,
Rishu

#7 Joel Jacobson
joel@compiler.org
In reply to: Rishu Bagga (#6)
Re: Optimize LISTEN/NOTIFY

On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:

> Hi Joel,
>
> Thanks for sharing the patch.
> I have a few questions based on a cursory first look.
>
>> If a single listener is found, we signal only that backend.
>> Otherwise, we fall back to the existing broadcast behavior.
>
> The idea of not wanting to wake up all backends makes sense to me,
> but I don’t understand why we want this optimization only for the case
> where there is a single backend listening on a channel.
>
> Is there a pattern of usage in LISTEN/NOTIFY where users typically
> have either just one or several backends listening on a channel?
>
> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for the thoughtful question. You've hit on the central design trade-off
in this optimization: how to provide targeted signaling for some workloads
without degrading performance for others.

While we don't have telemetry on real-world usage patterns of LISTEN/NOTIFY,
it seems likely that most applications fall into one of three categories,
which I've been thinking of in networking terms:

1. Broadcast-style ("hub mode")

Many backends listening on the *same* channel (e.g., for cache invalidation).
The current implementation is already well-optimized for this, behaving like
an Ethernet hub that broadcasts to all ports. Waking all listeners is efficient
because they all need the message.

2. Targeted notifications ("switch mode")

Each backend listens on its own private channel (e.g., for session events or
worker queues). This is where the current implementation scales poorly, as every
NOTIFY wakes up all listeners regardless of relevance. My patch is designed
to make this behave like an efficient Ethernet switch.

3. Selective multicast-style ("group mode")

A subset of backends shares a channel, but not all. This is the tricky middle
ground. Your question, "why not maintain a list of backends for each channel,
and only wake up those channels?" is exactly the right one to ask.
A full listener list seems like the obvious path to optimizing for *all* cases.
However, the devil is in the details of concurrency and performance. Managing
such a list would require heavier locking, which would create a new bottleneck
and degrade the scalability of LISTEN/UNLISTEN operations—especially for
the "hub mode" case where many backends rapidly subscribe to the same popular
channel.

This patch makes a deliberate architectural choice:
Prioritize a massive, low-risk win for "switch mode" while rigorously protecting
the performance of "hub mode".

It introduces a targeted fast path for single-listener channels and cleanly
falls back to the existing, well-performing broadcast model for everything else.

This brings us back to "group mode", which remains an open optimization problem.
A possible approach could be to track listeners up to a small threshold *K*
(e.g., store up to four ProcNumbers in the hash entry). If the count exceeds *K*,
we would flip a "broadcast" flag and revert to hub-mode behavior.

However, this path has a critical drawback:

1. Performance Penalty for Hub Mode

With the current patch, after the second listener joins a channel,
the has_multiple_listeners flag is set. Every subsequent listener can acquire
a shared lock, see the flag is true, and immediately continue. This is
a highly concurrent, read-only operation that does not require mutating shared
state.

In contrast, the K-listener approach would force every new listener (from the
third up to the K-th) to acquire an exclusive lock to mutate the shared
listener array**. This would serialize LISTEN operations on popular channels,
creating the very contention point this patch successfully avoids and directly
harming the hub-mode use case that currently works well.

2. Uncertainty

Compounding this, without clear data on typical "group" sizes, choosing a value
for *K* is a shot in the dark. A small *K* might not help much, while
a large *K* would increase the shared memory footprint and worsen the
serialization penalty.

For these reasons, attempting to build a switch that also optimizes for
multicast risks undermining the architectural clarity and performance of
both the switch and hub models.

This patch, therefore, draws a clean line. It provides a precise,
low-cost path for switch-mode workloads and preserves the existing,
well-performing path for hub-mode workloads. While this leaves "group mode"
unoptimized for now, it ensures we make two common use cases better without
making any use case worse. The new infrastructure is flexible, leaving
the door open should a better approach for "group mode" emerge in
the future—one that doesn't compromise the other two.

Benchmarks updated showing master vs 0001-optimize_listen_notify-v3.patch:
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections.png

I've not included the benchmark CSV data in this mail, since it's quite heavy,
160kB, and I couldn't see any significant performance changes since v2.

/Joel

#8 Joel Jacobson
joel@compiler.org
In reply to: Rishu Bagga (#6)
Re: Optimize LISTEN/NOTIFY

On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:

> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for contributing a great idea; it actually turned out to work
really well in practice!

The attached new v4 of the patch implements your multicast idea:

---

Improve NOTIFY scalability with multicast signaling

Previously, NOTIFY would signal all listening backends in a database for
any channel with more than one listener. This broadcast approach scales
poorly for workloads that rely on targeted notifications to small groups
of backends, as every NOTIFY could wake up many unrelated processes.

This commit introduces a multicast signaling optimization to improve
scalability for such use-cases. A new GUC, `notify_multicast_threshold`,
is added to control the maximum number of listeners to track per
channel. When a NOTIFY is issued, if the number of listeners is at or
below this threshold, only those specific backends are signaled. If the
limit is exceeded, the system falls back to the original broadcast
behavior.

The default for this threshold is set to 16. Benchmarks show this
provides a good balance, with significant performance gains for small to
medium-sized listener groups and diminishing returns for higher values.
Setting the threshold to 0 disables multicast signaling, forcing a
fallback to the broadcast path for all notifications.

To implement this, a new partitioned hash table is introduced in shared
memory to track listeners. Locking is managed with an optimistic
read-then-upgrade pattern. This allows concurrent LISTEN/UNLISTEN
operations on *different* channels to proceed in parallel, as they will
only acquire locks on their respective partitions.

For correctness and to prevent deadlocks, a strict lock ordering
hierarchy (NotifyQueueLock before any partition lock) is observed. The
signaling path in NOTIFY must acquire the global NotifyQueueLock first
before consulting the partitioned hash table, which serializes
concurrent NOTIFYs. The primary concurrency win is for LISTEN/UNLISTEN
operations, which are now much more scalable.

The "wake only tail" optimization, which signals backends that are far
behind in the queue, is also included to ensure the global queue tail
can always advance.

Thanks to Rishu Bagga for the multicast idea.

---

BENCHMARK

To find the optimal default notify_multicast_threshold value,
I created a new benchmark tool that spawns one "ping" worker that sends
notifications to a channel, and multiple "pong" workers that listen on channels
and all immediately reply back to the "ping" worker, and when all replies
have been received, the cycle repeats.

By measuring how many complete round-trips can be performed per second,
it evaluates the impact of different multicast threshold settings.

The results below show the effect of setting notify_multicast_threshold
just below, or exactly at, the number N of backends per channel, to compare
broadcast vs multicast for different sizes of multicast groups (where 1
corresponds to the old targeted mode that earlier patches specifically
optimized for).

K = notify_multicast_threshold

With 2 backends per channel (32 channels total):
patch-v4 (K=1): 8,477 TPS
patch-v4 (K=2): 27,748 TPS (3.3x improvement)

With 4 backends per channel (16 channels total):
patch-v4 (K=1): 7,367 TPS
patch-v4 (K=4): 18,777 TPS (2.6x improvement)

With 8 backends per channel (8 channels total):
patch-v4 (K=1): 5,892 TPS
patch-v4 (K=8): 8,620 TPS (1.5x improvement)

With 16 backends per channel (4 channels total):
patch-v4 (K=1): 4,202 TPS
patch-v4 (K=16): 4,750 TPS (1.1x improvement)

I also reran the old ping-pong as well as the pgbench benchmarks,
and I couldn't detect any negative impact, testing with
notify_multicast_threshold {1, 8, 16}.

Ping-pong benchmark:

Extra Connections: 0
--------------------------------------------------------------------------------
Version Max TPS vs Master All Values (sorted)
-------------------------------------------------------------------------------------
master 9119 baseline {9088, 9095, 9119}
patch-v4 (t=1) 9116 -0.0% {9082, 9090, 9116}
patch-v4 (t=8) 9106 -0.2% {9086, 9102, 9106}
patch-v4 (t=16) 9134 +0.2% {9082, 9116, 9134}

Extra Connections: 10
--------------------------------------------------------------------------------
Version Max TPS vs Master All Values (sorted)
-------------------------------------------------------------------------------------
master 6237 baseline {6224, 6227, 6237}
patch-v4 (t=1) 9358 +50.0% {9302, 9345, 9358}
patch-v4 (t=8) 9348 +49.9% {9266, 9312, 9348}
patch-v4 (t=16) 9408 +50.8% {9339, 9407, 9408}

Extra Connections: 100
--------------------------------------------------------------------------------
Version Max TPS vs Master All Values (sorted)
-------------------------------------------------------------------------------------
master 2028 baseline {2026, 2027, 2028}
patch-v4 (t=1) 9278 +357.3% {9222, 9235, 9278}
patch-v4 (t=8) 9227 +354.8% {9184, 9207, 9227}
patch-v4 (t=16) 9250 +355.9% {9180, 9243, 9250}

Extra Connections: 1000
--------------------------------------------------------------------------------
Version Max TPS vs Master All Values (sorted)
-------------------------------------------------------------------------------------
master 239 baseline {239, 239, 239}
patch-v4 (t=1) 8841 +3594.1% {8819, 8840, 8841}
patch-v4 (t=8) 8835 +3591.7% {8802, 8826, 8835}
patch-v4 (t=16) 8855 +3599.8% {8787, 8843, 8855}

Among my pgbench benchmarks, results seem unaffected in these cases:
listen_unique.sql
listen_common.sql
listen_unlisten_unique.sql
listen_unlisten_common.sql

The listen_notify_unique.sql benchmark shows similar improvements for all
notify_multicast_threshold values tested. This is expected: the benchmark
uses unique channels, so a higher notify_multicast_threshold shouldn't
affect the results, and indeed it didn't:

# TEST `listen_notify_unique.sql`

```sql
LISTEN channel_:client_id;
NOTIFY channel_:client_id;
```

## 1 Connection, 1 Job

- **master**: 63696 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 63377 TPS (-0.5%)
- **optimize_listen_notify_v4 (t=8.0)**: 62890 TPS (-1.3%)
- **optimize_listen_notify_v4 (t=16.0)**: 63114 TPS (-0.9%)

## 2 Connections, 2 Jobs

- **master**: 90967 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 109423 TPS (+20.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 109107 TPS (+19.9%)
- **optimize_listen_notify_v4 (t=16.0)**: 109608 TPS (+20.5%)

## 4 Connections, 4 Jobs

- **master**: 114333 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 140986 TPS (+23.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 141263 TPS (+23.6%)
- **optimize_listen_notify_v4 (t=16.0)**: 141327 TPS (+23.6%)

## 8 Connections, 8 Jobs

- **master**: 64429 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 93787 TPS (+45.6%)
- **optimize_listen_notify_v4 (t=8.0)**: 93828 TPS (+45.6%)
- **optimize_listen_notify_v4 (t=16.0)**: 93875 TPS (+45.7%)

## 16 Connections, 16 Jobs

- **master**: 41704 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 84791 TPS (+103.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 88330 TPS (+111.8%)
- **optimize_listen_notify_v4 (t=16.0)**: 84827 TPS (+103.4%)

## 32 Connections, 32 Jobs

- **master**: 25988 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 83197 TPS (+220.1%)
- **optimize_listen_notify_v4 (t=8.0)**: 83453 TPS (+221.1%)
- **optimize_listen_notify_v4 (t=16.0)**: 83576 TPS (+221.6%)

## 1000 Connections, 1 Job

- **master**: 105 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3097 TPS (+2852.1%)
- **optimize_listen_notify_v4 (t=8.0)**: 3079 TPS (+2835.1%)
- **optimize_listen_notify_v4 (t=16.0)**: 3080 TPS (+2835.9%)

## 1000 Connections, 2 Jobs

- **master**: 108 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2981 TPS (+2671.7%)
- **optimize_listen_notify_v4 (t=8.0)**: 3091 TPS (+2774.4%)
- **optimize_listen_notify_v4 (t=16.0)**: 3097 TPS (+2779.6%)

## 1000 Connections, 4 Jobs

- **master**: 105 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2947 TPS (+2705.5%)
- **optimize_listen_notify_v4 (t=8.0)**: 2994 TPS (+2751.0%)
- **optimize_listen_notify_v4 (t=16.0)**: 2992 TPS (+2748.7%)

## 1000 Connections, 8 Jobs

- **master**: 107 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3064 TPS (+2777.0%)
- **optimize_listen_notify_v4 (t=8.0)**: 2981 TPS (+2698.5%)
- **optimize_listen_notify_v4 (t=16.0)**: 2979 TPS (+2696.8%)

## 1000 Connections, 16 Jobs

- **master**: 101 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3068 TPS (+2923.2%)
- **optimize_listen_notify_v4 (t=8.0)**: 2950 TPS (+2806.4%)
- **optimize_listen_notify_v4 (t=16.0)**: 2940 TPS (+2796.8%)

## 1000 Connections, 32 Jobs

- **master**: 102 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2980 TPS (+2815.0%)
- **optimize_listen_notify_v4 (t=8.0)**: 3034 TPS (+2867.9%)
- **optimize_listen_notify_v4 (t=16.0)**: 2962 TPS (+2798.0%)

Here are some plots that includes the above results:

https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot-v4.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs-v4.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections-v4.png

/Joel

Attachments:

0001-optimize_listen_notify-v4.patch (application/octet-stream, +808 -34)
#9 Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#8)
Re: Optimize LISTEN/NOTIFY

On Thu, Jul 17, 2025, at 09:43, Joel Jacobson wrote:

> On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
>
>> If we are doing this optimization, why not maintain a list of backends
>> for each channel, and only wake up those channels?
>
> Thanks for contributing a great idea; it actually turned out to work
> really well in practice!
>
> The attached new v4 of the patch implements your multicast idea:

Hi hackers,

While my previous attempts at $subject have focused only on optimizing
the multi-channel scenario, I thought it would be really nice if
LISTEN/NOTIFY could be optimized in the general case, benefiting all
users, including those who just listen on a single channel.

To my surprise, this was not only possible, but actually quite simple.

The main idea in this patch is to introduce an atomic state machine
with three states, IDLE, SIGNALLED, and PROCESSING, so that we don't
interrupt backends that are already in the process of catching up.

Thanks to Thomas Munro for making me aware of his, Heikki Linnakangas's,
and others' work in the "Interrupts vs signals" [1] thread.

Maybe my patch is redundant given their patch set; I'm not really sure.

Their patch seems to refactor the underlying wakeup mechanism. It
replaces the old, complex chain of events (SIGUSR1 signal -> handler ->
flag -> latch) with a single, direct function call: SendInterrupt(). For
async.c, this seems to be a low-level plumbing change that simplifies
how a notification wakeup is delivered.

My patch optimizes the high-level notification protocol. It introduces a
state machine (IDLE, SIGNALLED, PROCESSING) to only signal backends when
needed.

In their patch, in async.c's SignalBackends(), they do
SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of
SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't
seem to check if the backend is already signalled or not, but maybe
SendInterrupt() has signal coalescing built-in so it would be a noop
with almost no cost?

I'm happy to rebase my LISTEN/NOTIFY work on top of [1], but I could
also see benefits of doing the opposite.

I'm also happy to help with benchmarking of your work in [1].

Note that this patch doesn't contain the hash table to keep track of
listeners per backend, as proposed in earlier patches. I will propose
such a patch again later, but first we need to figure out if I should
rebase onto [1] or master (HEAD).

--- PATCH ---

Optimize NOTIFY signaling to avoid redundant backend signals

Previously, a NOTIFY would send SIGUSR1 to all listening backends, which
could lead to a "thundering herd" of redundant signals under high
traffic. To address this inefficiency, this patch replaces the simple
volatile notifyInterruptPending flag with a per-backend atomic state
machine, stored in asyncQueueControl->backend[i].state. This state
variable can be in one of three states: IDLE (awaiting signal),
SIGNALLED (signal received, work pending), or PROCESSING (actively
reading the queue).

From the notifier's perspective, SignalBackends now uses an atomic
compare-and-swap (CAS) to transition a listener from IDLE to SIGNALLED.
Only on a successful transition is a signal sent. If the listener is
already SIGNALLED or another notifier wins the race, no redundant signal
is sent. If the listener is in the PROCESSING state, the notifier will
also transition it to SIGNALLED to ensure the listener re-scans the
queue after its current work is done.

On the listener side, ProcessIncomingNotify first transitions its state
from SIGNALLED to PROCESSING. After reading notifications, it attempts
to transition from PROCESSING back to IDLE. If this CAS fails, it means
a new notification arrived during processing and a notifier has already
set the state back to SIGNALLED. The listener then simply re-latches
itself to process the new notifications, avoiding a tight loop.

The primary benefit is a significant reduction in syscall overhead and
unnecessary kernel wakeups in high-traffic scenarios. This dramatically
improves performance for workloads with many concurrent notifiers.
Benchmarks show a substantial increase in NOTIFY-only transaction
throughput, with gains exceeding 200% at higher
concurrency levels.

src/backend/commands/async.c | 209 +++++++++++++++++++++++++++++--------
src/backend/tcop/postgres.c | 4 ++--
src/include/commands/async.h | 4 +++-
3 files changed, 185 insertions(+), 32 deletions(-)

--- BENCHMARK ---

The attached benchmark script does LISTEN on one connection,
and then uses pgbench to send NOTIFY on a varying number of
connections and jobs, to cause a high procsignal load.

I've run the benchmark on my MacBook Pro M3 Max,
10 seconds per run, 3 runs.

(I reused the same benchmark script as in the other thread, "Optimize ProcSignal to avoid redundant SIGUSR1 signals")

Connections=Jobs | TPS (master) | TPS (patch) | Relative Diff (%) | StdDev (master) | StdDev (patch)
------------------+--------------+-------------+-------------------+-----------------+----------------
1 | 118833 | 151510 | 27.50% | 484 | 923
2 | 156005 | 239051 | 53.23% | 3145 | 1596
4 | 177351 | 250910 | 41.48% | 4305 | 4891
8 | 116597 | 171944 | 47.47% | 1549 | 2752
16 | 40835 | 165482 | 305.25% | 2695 | 2825
32 | 37940 | 145150 | 282.58% | 2533 | 1566
64 | 35495 | 131836 | 271.42% | 1837 | 573
128 | 40193 | 121333 | 201.88% | 2254 | 874
(8 rows)

/Joel

[1]: /messages/by-id/CA+hUKG+3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ@mail.gmail.com

Attachments:

0001-Optimize-NOTIFY-signaling-to-avoid-redundant-backend.patch
#10Thomas Munro
thomas.munro@gmail.com
In reply to: Joel Jacobson (#9)
Re: Optimize LISTEN/NOTIFY

On Wed, Jul 23, 2025 at 1:39 PM Joel Jacobson <joel@compiler.org> wrote:

In their patch, in async.c's SignalBackends(), they do
SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of
SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't
seem to check if the backend is already signalled or not, but maybe
SendInterrupt() has signal coalescing built-in so it would be a noop
with almost no cost?

Yeah:

+ old_pending = pg_atomic_fetch_or_u32(&proc->pendingInterrupts, interruptMask);
+
+ /*
+ * If the process is currently blocked waiting for an interrupt to arrive,
+ * and the interrupt wasn't already pending, wake it up.
+ */
+ if ((old_pending & (interruptMask | SLEEPING_ON_INTERRUPTS)) ==
+     SLEEPING_ON_INTERRUPTS)
+     WakeupOtherProc(proc);
#11Joel Jacobson
joel@compiler.org
In reply to: Thomas Munro (#10)
Re: Optimize LISTEN/NOTIFY

On Wed, Jul 23, 2025, at 04:44, Thomas Munro wrote:

On Wed, Jul 23, 2025 at 1:39 PM Joel Jacobson <joel@compiler.org> wrote:

In their patch, in async.c's SignalBackends(), they do
SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of
SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't
seem to check if the backend is already signalled or not, but maybe
SendInterrupt() has signal coalescing built-in so it would be a noop
with almost no cost?

Yeah:

+ old_pending = pg_atomic_fetch_or_u32(&proc->pendingInterrupts, interruptMask);
+
+ /*
+ * If the process is currently blocked waiting for an interrupt to arrive,
+ * and the interrupt wasn't already pending, wake it up.
+ */
+ if ((old_pending & (interruptMask | SLEEPING_ON_INTERRUPTS)) ==
+     SLEEPING_ON_INTERRUPTS)
+     WakeupOtherProc(proc);

Thanks for confirming the coalescing logic in SendInterrupt. That's a
great low-level optimization. It's clear we're both targeting the same
problem of redundant wake-ups under contention, but approaching it from
different architectural levels.

The core difference, as I see it, is *where* the state management
resides. The "Interrupts vs signals" patch set creates a unified
machinery where the 'pending' state for all subsystems is combined into
a single atomic bitmask. This is a valid approach.

However, I've been exploring an alternative pattern that decouples the
state management from the signaling machinery, allowing each subsystem
to manage its own state independently. I believe this leads to a
simpler, more modular migration path. I've developed a two-patch series
for `async.c` to demonstrate this concept.

1. The first patch introduces a lock-free, atomic finite state machine
(FSM) entirely within async.c. By using a subsystem-specific atomic
integer and CAS operations, async.c can now robustly manage its own
listener states (IDLE, SIGNALLED, PROCESSING). This solves the
redundant signal problem at the source, as notifiers can now observe
a listener's state and refrain from sending a wakeup if one is
already pending.

2. The second patch demonstrates that once state is managed locally, the
wakeup mechanism becomes trivial. The expensive `SendProcSignal`
call is replaced with a direct `SetLatch`. This leverages the
existing, highly-optimized `WaitEventSet` infrastructure as a simple,
efficient "poke."

This suggests a powerful, incremental migration pattern: first, fix a
subsystem's state management internally; second, replace its wakeup
mechanism. This vertical, module-by-module approach seems complementary
to the horizontal, layer-by-layer refactoring in the "Interrupts vs
signals" thread.

I'll post a more detailed follow-up in that thread to discuss the
broader architectural implications. Attached are the two patches,
reframed to better illustrate this two-step pattern.

/Joel

Attachments:

pgbench-script.txt
pgbench-results.txt
0001-Optimize-LISTEN-NOTIFY-signaling-with-a-lock-free-at.patch
0002-Optimize-LISTEN-NOTIFY-wakeup-by-replacing-signal-wi.patch
#12Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#11)
Re: Optimize LISTEN/NOTIFY

On Thu, Jul 24, 2025, at 23:03, Joel Jacobson wrote:

* 0001-Optimize-LISTEN-NOTIFY-signaling-with-a-lock-free-at.patch
* 0002-Optimize-LISTEN-NOTIFY-wakeup-by-replacing-signal-wi.patch

I'm withdrawing the latest patches, since they won't fix the scalability
problems, but only provide some performance improvements by eliminating
redundant IPC signalling. This could also be improved outside of
async.c, by optimizing ProcSignal [1] or by removing ProcSignal, as the
"Interrupts vs Signals" thread [2] is working on.

There seem to be two different scalability problems, which appear to be
orthogonal:

First, there is the thundering-herd problem that I initially tried to
solve in this thread, by introducing a shared-memory hash table that
keeps track of which backends listen on which channels, so that every
notification need not immediately wake up all listening backends.

Second, it's the heavyweight lock in PreCommit_Notify(), which prevents
parallelism of NOTIFY. Tom Lane has an idea [3] on how to improve this.

My perf+pgbench experiments indicate that which of these two scalability
problems is the bottleneck depends on the workload.

I think the idea of keeping track of channels per backend has merit,
but I want to take a step back and see what others think about the idea first.

I guess my main question is whether we should fix one problem first and
then the other, both at the same time, or only one of them?

I've attached some benchmarks using pgbench and running postgres under
perf, which I hope can provide some insights.

/Joel

[1]: /messages/by-id/a0b12a70-8200-4bd4-9e24-56796314bdce@app.fastmail.com
[2]: /messages/by-id/CA+hUKG+3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ@mail.gmail.com
[3]: /messages/by-id/1878165.1752858390@sss.pgh.pa.us

Attachments:

listen_notify_pgbench_perf.md
listen_notify_pgbench_perf.pdf
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#12)
Re: Optimize LISTEN/NOTIFY

[ getting back to this... ]

"Joel Jacobson" <joel@compiler.org> writes:

I'm withdrawing the latest patches, since they won't fix the scalability
problems, but only provide some performance improvements by eliminating
redundant IPC signalling. This could also be improved outside of
async.c, by optimizing ProcSignal [1] or removing ProcSignal as
"Interrupts vs Signals" [2] is working on.

There seem to be two different scalability problems, which appear to be
orthogonal:

First, there is the thundering-herd problem that I initially tried to
solve in this thread, by introducing a shared-memory hash table that
keeps track of which backends listen on which channels, so that every
notification need not immediately wake up all listening backends.

Second, it's the heavyweight lock in PreCommit_Notify(), which prevents
parallelism of NOTIFY. Tom Lane has an idea [3] on how to improve this.

I concur that these are orthogonal issues, but I don't understand
why you withdrew your patches --- don't they constitute a solution
to the first scalability bottleneck?

I guess my main question is if we think we should fix one problem first,
then the other, both at the same time, or only one or the other?

I imagine we'd eventually want to fix both, but it doesn't have to
be done in the same patch.

regards, tom lane

#14Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#13)
Re: Optimize LISTEN/NOTIFY

On Tue, Sep 23, 2025, at 18:27, Tom Lane wrote:

I concur that these are orthogonal issues, but I don't understand
why you withdrew your patches --- don't they constitute a solution
to the first scalability bottleneck?

Thanks for getting back to this thread. I was unhappy with not finding a
solution that would improve all use-cases; I had a feeling it would be
possible to find one, and I think I've now done so.

I guess my main question is if we think we should fix one problem first,
then the other, both at the same time, or only one or the other?

I imagine we'd eventually want to fix both, but it doesn't have to
be done in the same patch.

I've attached a new patch with a pragmatic approach that specifically
addresses the context-switching cost.

The patch is based upon the assumption that some extra LISTEN/NOTIFY
latency would be acceptable to most users, as a trade-off, in order to
improve throughput.

One nice thing with this approach is that it has the potential to
improve throughput both for users with just a single listening backend
and for users with many listening backends.

More details in the commit message of the patch.

Curious to hear thoughts on this approach.

/Joel

Attachments:

0001-LISTEN-NOTIFY-make-the-latency-throughput-trade-off-.patch
#15Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#14)
Re: Optimize LISTEN/NOTIFY

Hi Joel,

Thanks for the patch. After reviewing it, I have a few comments.

On Sep 25, 2025, at 04:34, Joel Jacobson <joel@compiler.org> wrote:

Curious to hear thoughts on this approach.

/Joel
<0001-LISTEN-NOTIFY-make-the-latency-throughput-trade-off-.patch>

1.
```
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -35,6 +35,7 @@ typedef enum TimeoutId
 	IDLE_SESSION_TIMEOUT,
 	IDLE_STATS_UPDATE_TIMEOUT,
 	CLIENT_CONNECTION_CHECK_TIMEOUT,
+	NOTIFY_DEFERRED_WAKEUP_TIMEOUT,
 	STARTUP_PROGRESS_TIMEOUT,
```

Can we define the new one after STARTUP_PROGRESS_TIMEOUT, to try to preserve the existing enum values?

2.
```
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -766,6 +766,7 @@ autovacuum_worker_slots = 16	# autovacuum worker slots to allocate
 #lock_timeout = 0				# in milliseconds, 0 is disabled
 #idle_in_transaction_session_timeout = 0	# in milliseconds, 0 is disabled
 #idle_session_timeout = 0			# in milliseconds, 0 is disabled
+#notify_latency_target = 0	# in milliseconds, 0 is disabled
 #bytea_output = 'hex'			# hex, escape
```

I think we should add one more tab so that the comment aligns with the previous line's comment.

3.
```
/* GUC parameters */
bool Trace_notify = false;
+int notify_latency_target;
```

I know the compiler will automatically initialize notify_latency_target to 0, but all the other global and static variables nearby are explicitly initialized, so it would look better to assign 0 to it, which keeps the coding style consistent.

4.
```
+	/*
+	 * Throttling check: if we were last active too recently, defer. This
+	 * check is safe without a lock because it's based on a backend-local
+	 * timestamp.
+	 */
+	if (notify_latency_target > 0 &&
+		!TimestampDifferenceExceeds(last_wakeup_start_time,
+									GetCurrentTimestamp(),
+									notify_latency_target))
+	{
+		/*
+		 * Too soon. We leave wakeup_pending_flag untouched (it must be true,
+		 * or we wouldn't have been signaled) to tell senders we are
+		 * intentionally delaying. Arm a timer to re-awaken and process the
+		 * backlog later.
+		 */
+		enable_timeout_after(NOTIFY_DEFERRED_WAKEUP_TIMEOUT,
+							 notify_latency_target);
+		return;
+	}
+
```

Should we avoid enabling a duplicate timeout? Currently, whenever a duplicate notification is avoided, a new timeout is enabled. I think we could add another variable to remember whether a timeout has already been enabled.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#16Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#15)
Re: Optimize LISTEN/NOTIFY

On Thu, Sep 25, 2025, at 10:25, Chao Li wrote:

Hi Joel,

Thanks for the patch. After reviewing it, I got a few comments.

Thanks for reviewing!

On Sep 25, 2025, at 04:34, Joel Jacobson <joel@compiler.org> wrote:

1.

...

Can we define the new one after STARTUP_PROGRESS_TIMEOUT to try to
preserve the existing enum value?

Fixed.

2.

...

I think we should add one more tab so that the comment aligns with the
previous line's comment.

Fixed.

3.

...

I know the compiler will automatically initialize notify_latency_target
to 0, but all the other global and static variables nearby are
explicitly initialized, so it would look better to assign 0 to it,
which keeps the coding style consistent.

Fixed.

4.

...

Should we avoid enabling a duplicate timeout? Currently, whenever a
duplicate notification is avoided, a new timeout is enabled. I think we
could add another variable to remember whether a timeout has already been enabled.

Hmm, I don't see how a duplicate timeout could happen?

Once we decide to defer the wakeup, wakeup_pending_flag remains set,
which avoids further signals from notifiers, so I don't see how we could
re-enter ProcessIncomingNotify(), since notifyInterruptPending is reset
when ProcessIncomingNotify() is called, and notifyInterruptPending is
only set when a signal is received (or set directly when in same
process).

New patch attached with 1-3 fixed.

/Joel

Attachments:

0001-LISTEN-NOTIFY-make-the-latency-throughput-trade-off-v2.patch
#17Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#16)
Re: Optimize LISTEN/NOTIFY

On Sep 26, 2025, at 05:13, Joel Jacobson <joel@compiler.org> wrote:

Hmm, I don't see how duplicate timeout could happen?

Once we decide to defer the wakeup, wakeup_pending_flag remains set,
which avoids further signals from notifiers, so I don't see how we could
re-enter ProcessIncomingNotify(), since notifyInterruptPending is reset
when ProcessIncomingNotify() is called, and notifyInterruptPending is
only set when a signal is received (or set directly when in same
process).

I think what you explained is partially correct.

Based on my understanding, any backend process may call SignalBackends(), which means that it’s possible that multiple backend processes may call SignalBackends() concurrently.

Looking at your code, between checking QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) and setting the flag to true, there is a block of code (the "if-else") to run. It is therefore possible that multiple backend processes pass the QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) check, in which case multiple signals will be sent to one process, leading to duplicate timeouts being enabled in the receiver process.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#18Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#17)
Re: Optimize LISTEN/NOTIFY

On Fri, Sep 26, 2025, at 04:26, Chao Li wrote:

I think what you explained is partially correct.

Based on my understanding, any backend process may call
SignalBackends(), which means that it’s possible that multiple backend
processes may call SignalBackends() concurrently.

Looking at your code, between checking
QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) and setting the flag to true,
there is a block of code (the "if-else") to run. It is therefore
possible that multiple backend processes pass the
QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) check, in which case multiple
signals will be sent to one process, leading to duplicate timeouts
being enabled in the receiver process.

I don't see how that can happen; we check wakeup_pending_flag while
holding an exclusive lock, so multiple backend processes can't be
inside the region where we check/set wakeup_pending_flag at the same
time.

/Joel

#19Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#18)
Re: Optimize LISTEN/NOTIFY

On Sep 26, 2025, at 17:32, Joel Jacobson <joel@compiler.org> wrote:

On Fri, Sep 26, 2025, at 04:26, Chao Li wrote:

I think what you explained is partially correct.

Based on my understanding, any backend process may call
SignalBackends(), which means that it’s possible that multiple backend
processes may call SignalBackends() concurrently.

Looking at your code, between checking
QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) and set the flag to true, there is
a block of code (the “if-else”) to run, so that it’s possible that
multiple backend processes have passed the
QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) check, then multiple signals will
be sent to a process, which will lead to duplicate timeout enabled in
the receiver process.

I don't see how that can happen; we're checking wakeup_pending_flag
while holding an exclusive lock, so I don't see how multiple backend
processes could be within the region where we check/set
wakeup_pending_flag, at the same time?

/Joel

I might have missed the fact that an exclusive lock is held. I will revisit that part.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#20Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#19)
Re: Optimize LISTEN/NOTIFY

On Fri, Sep 26, 2025, at 11:44, Chao Li wrote:

On Sep 26, 2025, at 17:32, Joel Jacobson <joel@compiler.org> wrote:

On Fri, Sep 26, 2025, at 04:26, Chao Li wrote:

I think what you explained is partially correct.

Based on my understanding, any backend process may call
SignalBackends(), which means that it’s possible that multiple backend
processes may call SignalBackends() concurrently.

Looking at your code, between checking
QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) and set the flag to true, there is
a block of code (the “if-else”) to run, so that it’s possible that
multiple backend processes have passed the
QUEUE_BACKEND_WAKEUP_PENDING_FLAG(i) check, then multiple signals will
be sent to a process, which will lead to duplicate timeout enabled in
the receiver process.

I don't see how that can happen; we're checking wakeup_pending_flag
while holding an exclusive lock, so I don't see how multiple backend
processes could be within the region where we check/set
wakeup_pending_flag, at the same time?

/Joel

I might have missed the fact that an exclusive lock is held. I will
revisit that part.

I've re-read this entire thread, and I actually think my original
approach is more promising, that is, the
0001-optimize_listen_notify-v4.patch, which does multicast targeted
signaling.

Therefore, please consider the latest patch merely as a PoC with some
possibly interesting ideas.

Before this patch, I had never used PostgreSQL's timeout mechanism, so
I didn't consider it when thinking about how to solve the remaining
problems with 0001-optimize_listen_notify-v4.patch, which currently
can't guarantee that all listening backends will eventually catch up,
since it just kicks one of the most lagging ones for each notification.
This could be a problem in practice if there is a long period with no
notifications coming in: some listening backends could end up never
being signaled and would stay behind, preventing the queue tail from
advancing.

I'm thinking maybe somehow we can use the timeout mechanism here, but
I'm not sure how yet. Any ideas?

/Joel

#21Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#20)
#22Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#21)
#23Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#22)
#25Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#24)
#26Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#25)
#27Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#27)
#29Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#28)
#30Matheus Alcantara
matheusssilv97@gmail.com
In reply to: Joel Jacobson (#25)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matheus Alcantara (#30)
#32Joel Jacobson
joel@compiler.org
In reply to: Matheus Alcantara (#30)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#32)
#34Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#34)
#36Matheus Alcantara
matheusssilv97@gmail.com
In reply to: Tom Lane (#31)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matheus Alcantara (#36)
#38Matheus Alcantara
matheusssilv97@gmail.com
In reply to: Tom Lane (#37)
#39Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#34)
#40Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#39)
#41Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#35)
#42Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#39)
#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#41)
#44Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#42)
#45Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#44)
#46Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#45)
#47Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#43)
#48Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#47)
#49Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#48)
#50Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#49)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#50)
#52Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#51)
#53Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#51)
#54Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Joel Jacobson (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arseniy Mukhin (#54)
#56Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#52)
#57Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Tom Lane (#55)
#58Joel Jacobson
joel@compiler.org
In reply to: Arseniy Mukhin (#57)
#59Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#55)
#60Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#59)
#61Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#56)
#62Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#55)
#63Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#61)
#64Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#63)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#64)
#66Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Tom Lane (#65)
#67Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#65)
#68Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#67)
#69Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#68)
#70Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Joel Jacobson (#67)
#71Chao Li
li.evan.chao@gmail.com
In reply to: Arseniy Mukhin (#70)
#72Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Chao Li (#71)
#73Chao Li
li.evan.chao@gmail.com
In reply to: Arseniy Mukhin (#72)
#74Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#73)
#75Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#74)
#76Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#74)
#77Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#76)
#78Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#77)
#79Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#78)
#80Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#79)
#81Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#80)
#82Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#81)
#83Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#82)
#84Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#83)
#85Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#84)
#86Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Chao Li (#85)
#87Joel Jacobson
joel@compiler.org
In reply to: Arseniy Mukhin (#86)
#88Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#87)
#89Chao Li
li.evan.chao@gmail.com
In reply to: Arseniy Mukhin (#86)
#90Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Chao Li (#89)
#91Chao Li
li.evan.chao@gmail.com
In reply to: Arseniy Mukhin (#90)
#92Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#91)
#93Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#92)
#94Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#93)
#95Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#94)
#96Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#95)
#97Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Joel Jacobson (#96)
#98Joel Jacobson
joel@compiler.org
In reply to: Arseniy Mukhin (#97)
#99Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#98)
#100Joel Jacobson
joel@compiler.org
In reply to: Arseniy Mukhin (#97)
#101Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Joel Jacobson (#100)
#102Joel Jacobson
joel@compiler.org
In reply to: Arseniy Mukhin (#101)
#103Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#102)
#104Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#103)
#105Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#104)
#106Chao Li
li.evan.chao@gmail.com
In reply to: Joel Jacobson (#104)
#107Joel Jacobson
joel@compiler.org
In reply to: Chao Li (#106)
#108Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#107)
#109Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#108)
#110Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#109)
#111Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#110)
#112Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#108)
#113Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#112)
#114Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#113)
#115Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#114)
#116Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#115)
#117Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#116)
#118Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#117)
#119Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#118)
#120Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#119)
#121Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#119)
#122Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#121)
#123Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#122)
#124Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#123)
#125Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#124)