proposal: LISTEN *
Hi,
I've in the past wanted to listen on all notification channels in the
current database for debugging purposes. But recently I came across
another use case. Since having multiple postgres backends listening for
notifications is very inefficient (one Thursday I found out 30% of our
CPU time was spent spinning on s_locks around the notification code), it
makes sense to implement a notification server of sorts which then
passes on notifications from postgres to interested clients. A server
like this would be a lot more simple to implement if there was a way to
announce that the client wants to see all notifications, regardless of
the name of the channel.
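To make the simplification concrete, here is a rough sketch (not part of the
attached patch, and not our actual code) of the upstream bookkeeping such a
relay needs today, written against Go's lib/pq driver; the DSN handling and
the channel name are made up. With LISTEN *, the per-channel refcounting and
the mirrored LISTEN/UNLISTEN below would collapse into a single statement at
startup.

// Sketch only: the subscription mirroring a relay needs without LISTEN *.
package main

import (
    "log"
    "os"
    "sync"
    "time"

    "github.com/lib/pq"
)

// Relay mirrors downstream subscriptions onto one upstream connection.
type Relay struct {
    mu       sync.Mutex
    upstream *pq.Listener
    refs     map[string]int // channel name -> number of downstream subscribers
}

func NewRelay(dsn string) *Relay {
    l := pq.NewListener(dsn, 10*time.Second, time.Minute, nil)
    return &Relay{upstream: l, refs: make(map[string]int)}
}

// Subscribe is called when a downstream client wants channel ch; only the
// first subscriber triggers an upstream LISTEN.
func (r *Relay) Subscribe(ch string) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.refs[ch]++
    if r.refs[ch] == 1 {
        return r.upstream.Listen(ch)
    }
    return nil
}

// Unsubscribe is the mirror image; the last subscriber triggers UNLISTEN.
func (r *Relay) Unsubscribe(ch string) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    if r.refs[ch] == 0 {
        return nil
    }
    r.refs[ch]--
    if r.refs[ch] == 0 {
        return r.upstream.Unlisten(ch)
    }
    return nil
}

func main() {
    r := NewRelay(os.Getenv("RELAY_DSN")) // illustrative DSN source
    if err := r.Subscribe("jobs"); err != nil {
        log.Fatal(err)
    }
    // ... forward notifications from r.upstream.Notify to downstream clients
}

With LISTEN *, the whole Relay type would reduce to forwarding whatever
arrives on upstream.Notify.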
Attached is a very crude proof-of-concept patch implementing this. Any
thoughts on the idea?
.m
Attachment: listen_star_poc.patch
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
***************
*** 294,299 **** static SlruCtlData AsyncCtlData;
--- 294,304 ----
static List *listenChannels = NIL; /* list of C strings */
/*
+ * If listenWildcard is set, we're listening on all channels.
+ */
+ static bool listenWildcard = false;
+
+ /*
* State for pending LISTEN/UNLISTEN actions consists of an ordered list of
* all actions requested in the current transaction. As explained above,
* we don't actually change listenChannels until we reach transaction commit.
***************
*** 306,311 **** static List *listenChannels = NIL; /* list of C strings */
--- 311,317 ----
typedef enum
{
LISTEN_LISTEN,
+ LISTEN_LISTEN_ALL,
LISTEN_UNLISTEN,
LISTEN_UNLISTEN_ALL
} ListenActionKind;
***************
*** 373,378 **** static void queue_listen(ListenActionKind action, const char *channel);
--- 379,385 ----
static void Async_UnlistenOnExit(int code, Datum arg);
static void Exec_ListenPreCommit(void);
static void Exec_ListenCommit(const char *channel);
+ static void Exec_ListenAllCommit(void);
static void Exec_UnlistenCommit(const char *channel);
static void Exec_UnlistenAllCommit(void);
static bool IsListeningOn(const char *channel);
***************
*** 598,604 **** Async_Notify(const char *channel, const char *payload)
/*
* queue_listen
! * Common code for listen, unlisten, unlisten all commands.
*
* Adds the request to the list of pending actions.
* Actual update of the listenChannels list happens during transaction
--- 605,611 ----
/*
* queue_listen
! * Common code for listen, listen all, unlisten, unlisten all commands.
*
* Adds the request to the list of pending actions.
* Actual update of the listenChannels list happens during transaction
***************
*** 613,620 **** queue_listen(ListenActionKind action, const char *channel)
/*
* Unlike Async_Notify, we don't try to collapse out duplicates. It would
* be too complicated to ensure we get the right interactions of
! * conflicting LISTEN/UNLISTEN/UNLISTEN_ALL, and it's unlikely that there
! * would be any performance benefit anyway in sane applications.
*/
oldcontext = MemoryContextSwitchTo(CurTransactionContext);
--- 620,627 ----
/*
* Unlike Async_Notify, we don't try to collapse out duplicates. It would
* be too complicated to ensure we get the right interactions of
! * conflicting LISTEN/LISTEN_ALL/UNLISTEN/UNLISTEN_ALL, and it's unlikely
! * that there would be any performance benefit anyway in sane applications.
*/
oldcontext = MemoryContextSwitchTo(CurTransactionContext);
***************
*** 644,649 **** Async_Listen(const char *channel)
--- 651,671 ----
}
/*
+ * Async_ListenAll
+ *
+ * This is executed by the LISTEN * command.
+ */
+ void
+ Async_ListenAll(void)
+ {
+ if (Trace_notify)
+ elog(DEBUG1, "Async_ListenAll(%d)", MyProcPid);
+
+ queue_listen(LISTEN_LISTEN_ALL, "");
+ }
+
+
+ /*
* Async_Unlisten
*
* This is executed by the SQL unlisten command.
***************
*** 790,795 **** PreCommit_Notify(void)
--- 812,818 ----
switch (actrec->action)
{
case LISTEN_LISTEN:
+ case LISTEN_LISTEN_ALL:
Exec_ListenPreCommit();
break;
case LISTEN_UNLISTEN:
***************
*** 895,900 **** AtCommit_Notify(void)
--- 918,926 ----
case LISTEN_LISTEN:
Exec_ListenCommit(actrec->channel);
break;
+ case LISTEN_LISTEN_ALL:
+ Exec_ListenAllCommit();
+ break;
case LISTEN_UNLISTEN:
Exec_UnlistenCommit(actrec->channel);
break;
***************
*** 905,911 **** AtCommit_Notify(void)
}
/* If no longer listening to anything, get out of listener array */
! if (amRegisteredListener && listenChannels == NIL)
asyncQueueUnregister();
/* And clean up */
--- 931,937 ----
}
/* If no longer listening to anything, get out of listener array */
! if (amRegisteredListener && listenChannels == NIL && !listenWildcard)
asyncQueueUnregister();
/* And clean up */
***************
*** 1026,1031 **** Exec_ListenCommit(const char *channel)
--- 1052,1076 ----
}
/*
+ * Exec_ListenAllCommit --- subroutine for AtCommit_Notify
+ *
+ * Start listening on all notification channels.
+ */
+ static void
+ Exec_ListenAllCommit(void)
+ {
+ listenWildcard = true;
+
+ /*
+ * We can forget all notification channels right away. The only way to
+ * undo a "LISTEN *" is to "UNLISTEN *", and in that case we'd just release
+ * the list of channels anyway.
+ */
+ list_free_deep(listenChannels);
+ listenChannels = NIL;
+ }
+
+ /*
* Exec_UnlistenCommit --- subroutine for AtCommit_Notify
*
* Remove the specified channel name from listenChannels.
***************
*** 1072,1077 **** Exec_UnlistenAllCommit(void)
--- 1117,1123 ----
list_free_deep(listenChannels);
listenChannels = NIL;
+ listenWildcard = false;
}
/*
***************
*** 1130,1136 **** ProcessCompletedNotifies(void)
/* Send signals to other backends */
signalled = SignalBackends();
! if (listenChannels != NIL)
{
/* Read the queue ourselves, and send relevant stuff to the frontend */
asyncQueueReadAllNotifications();
--- 1176,1182 ----
/* Send signals to other backends */
signalled = SignalBackends();
! if (listenChannels != NIL || listenWildcard)
{
/* Read the queue ourselves, and send relevant stuff to the frontend */
asyncQueueReadAllNotifications();
***************
*** 1956,1962 **** asyncQueueProcessPageEntries(volatile QueuePosition *current,
/* qe->data is the null-terminated channel name */
char *channel = qe->data;
! if (IsListeningOn(channel))
{
/* payload follows channel name */
char *payload = qe->data + strlen(channel) + 1;
--- 2002,2008 ----
/* qe->data is the null-terminated channel name */
char *channel = qe->data;
! if (listenWildcard || IsListeningOn(channel))
{
/* payload follows channel name */
char *payload = qe->data + strlen(channel) + 1;
***************
*** 2044,2050 **** ProcessIncomingNotify(void)
notifyInterruptPending = false;
/* Do nothing else if we aren't actively listening */
! if (listenChannels == NIL)
return;
if (Trace_notify)
--- 2090,2096 ----
notifyInterruptPending = false;
/* Do nothing else if we aren't actively listening */
! if (listenChannels == NIL && !listenWildcard)
return;
if (Trace_notify)
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 8543,8548 **** ListenStmt: LISTEN ColId
--- 8543,8554 ----
n->conditionname = $2;
$$ = (Node *)n;
}
+ | LISTEN '*'
+ {
+ ListenStmt *n = makeNode(ListenStmt);
+ n->conditionname = NULL;
+ $$ = (Node *)n;
+ }
;
UnlistenStmt:
*** a/src/backend/tcop/utility.c
--- b/src/backend/tcop/utility.c
***************
*** 609,615 **** standard_ProcessUtility(Node *parsetree,
PreventCommandDuringRecovery("LISTEN");
CheckRestrictedOperation("LISTEN");
! Async_Listen(stmt->conditionname);
}
break;
--- 609,618 ----
PreventCommandDuringRecovery("LISTEN");
CheckRestrictedOperation("LISTEN");
! if (stmt->conditionname)
! Async_Listen(stmt->conditionname);
! else
! Async_ListenAll();
}
break;
*** a/src/include/commands/async.h
--- b/src/include/commands/async.h
***************
*** 31,36 **** extern void AsyncShmemInit(void);
--- 31,37 ----
/* notify-related SQL statements */
extern void Async_Notify(const char *channel, const char *payload);
extern void Async_Listen(const char *channel);
+ extern void Async_ListenAll(void);
extern void Async_Unlisten(const char *channel);
extern void Async_UnlistenAll(void);
Hi
2015-11-19 4:40 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
Hi,
I've in the past wanted to listen on all notification channels in the
current database for debugging purposes. But recently I came across
another use case. Since having multiple postgres backends listening for
notifications is very inefficient (one Thursday I found out 30% of our CPU
time was spent spinning on s_locks around the notification code), it makes
sense to implement a notification server of sorts which then passes on
notifications from postgres to interested clients. A server like this
would be a lot more simple to implement if there was a way to announce that
the client wants to see all notifications, regardless of the name of the
channel.
Attached is a very crude proof-of-concept patch implementing this. Any
thoughts on the idea?
This looks more like a workaround. The proposed feature isn't bad, but
optimizing the current implementation would be better.
Isn't it possible to fix the internal implementation?
Pavel
.m
On 11/19/15 4:48 AM, Pavel Stehule wrote:
2015-11-19 4:40 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
I've in the past wanted to listen on all notification channels in the
current database for debugging purposes. But recently I came across
another use case. Since having multiple postgres backends listening for
notifications is very inefficient (one Thursday I found out 30% of our CPU
time was spent spinning on s_locks around the notification code), it makes
sense to implement a notification server of sorts which then passes on
notifications from postgres to interested clients. A server like this
would be a lot more simple to implement if there was a way to announce that
the client wants to see all notifications, regardless of the name of the
channel.
Attached is a very crude proof-of-concept patch implementing this. Any
thoughts on the idea?
This looks more like a workaround. The proposed feature isn't bad, but
optimizing the current implementation would be better.
Isn't it possible to fix the internal implementation?
It's probably possible to improve the internal implementation. But the
way I see it, it's always going to be less efficient than a dedicated
process outside the system (or maybe as a background worker?) handing
out notifications, so I don't see any point in spending my time on that.
.m
Marko Tiikkaja <marko@joh.to> writes:
I've in the past wanted to listen on all notification channels in the
current database for debugging purposes. But recently I came across
another use case. Since having multiple postgres backends listening for
notifications is very inefficient (one Thursday I found out 30% of our
CPU time was spent spinning on s_locks around the notification code), it
makes sense to implement a notification server of sorts which then
passes on notifications from postgres to interested clients.
... and then you gotta get the notifications to the clients, so it seems
like this just leaves the performance question hanging. Why don't we find
and fix the actual performance issue (assuming that 07e4d03fb wasn't it)?
The reason I'm not terribly enthused about this proposal is that some
implementations of LISTEN (for example, our pre-9.0 one) would be unable
to support LISTEN *. It's not too hard to imagine that at some point we
might wish to redo the existing implementation to reduce the overhead of
all listeners seeing all messages, and then having promised we could do
LISTEN * would be a serious restriction on our flexibility. So while
I'm not necessarily trying to veto the idea, I think it has significant
opportunity cost, and I'd like to see a more solid rationale than this
one before we commit to it.
In any case, it would be good to understand exactly what's the performance
issue that's biting you. Can you provide a test case that reproduces
that behavior?
regards, tom lane
On 11/19/15 4:21 PM, Tom Lane wrote:
Marko Tiikkaja <marko@joh.to> writes:
I've in the past wanted to listen on all notification channels in the
current database for debugging purposes. But recently I came across
another use case. Since having multiple postgres backends listening for
notifications is very inefficient (one Thursday I found out 30% of our
CPU time was spent spinning on s_locks around the notification code), it
makes sense to implement a notification server of sorts which then
passes on notifications from postgres to interested clients.
... and then you gotta get the notifications to the clients, so it seems
like this just leaves the performance question hanging.
I'm not sure which performance question you think is left hanging. If
every client is connected to postgres, you're waking up tens if not
hundreds of processes tens if not hundreds of times per second. Each of
them starts a transaction, checks which notifications in the queue are
visible using clog and friends, goes through the tails of every other
process to see whether it should advance the tail pointer of the queue,
commits the transaction, and goes back to sleep only to be immediately
woken up again.
If they're not connected to postgres directly, this kind of complex
processing only happens once, and then the notification server just
unconditionally serves all notifications to the clients based on a
simple map lookup. It should be trivial to see how the overhead is avoided.
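To illustrate the "simple map lookup" (a toy sketch, not our actual server;
the Notification type and the channel plumbing are invented for the example):

// Toy fan-out dispatcher: one goroutine consumes all notifications and
// forwards each one to subscribed clients via a plain map lookup.
package main

import "sync"

// Notification mirrors what arrives from postgres: channel name plus payload.
type Notification struct {
    Channel string
    Payload string
}

// Dispatcher fans incoming notifications out to per-client Go channels.
type Dispatcher struct {
    mu   sync.RWMutex
    subs map[string]map[chan<- Notification]struct{} // channel name -> clients
}

func NewDispatcher() *Dispatcher {
    return &Dispatcher{subs: make(map[string]map[chan<- Notification]struct{})}
}

// Subscribe registers a client for one notification channel.
func (d *Dispatcher) Subscribe(channel string, client chan<- Notification) {
    d.mu.Lock()
    defer d.mu.Unlock()
    if d.subs[channel] == nil {
        d.subs[channel] = make(map[chan<- Notification]struct{})
    }
    d.subs[channel][client] = struct{}{}
}

// Run reads every notification exactly once and hands it to the interested
// clients; there is no transaction, no visibility check and no queue-tail
// bookkeeping anywhere in this path.
func (d *Dispatcher) Run(in <-chan Notification) {
    for n := range in {
        d.mu.RLock()
        for client := range d.subs[n.Channel] {
            select {
            case client <- n:
            default: // drop instead of blocking on a slow client (policy choice)
            }
        }
        d.mu.RUnlock()
    }
}

func main() {
    d := NewDispatcher()
    in := make(chan Notification)
    client := make(chan Notification, 16)
    d.Subscribe("jobs", client)
    go d.Run(in)
    in <- Notification{Channel: "jobs", Payload: "42"}
    println((<-client).Payload) // prints 42
    close(in)
}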
Why don't we find
and fix the actual performance issue (assuming that 07e4d03fb wasn't it)?
07e4d03fb wasn't it, no.
The reason I'm not terribly enthused about this proposal is that some
implementations of LISTEN (for example, our pre-9.0 one) would be unable
to support LISTEN *. It's not too hard to imagine that at some point we
might wish to redo the existing implementation to reduce the overhead of
all listeners seeing all messages, and then having promised we could do
LISTEN * would be a serious restriction on our flexibility. So while
I'm not necessarily trying to veto the idea, I think it has significant
opportunity cost, and I'd like to see a more solid rationale than this
one before we commit to it.
A reasonable point.
In any case, it would be good to understand exactly what's the performance
issue that's biting you. Can you provide a test case that reproduces
that behavior?
I've attached a Go program which simulates quite accurately the
LISTEN/NOTIFY-part of our setup. What it does is:
1) Open 50 connections, and issue a LISTEN in all of them
2) Open another 50 connections, and deliver one notification every
750 milliseconds from each of them
(My apologies for the fact that it's written in Go. It's the only thing
I can produce without spending a significant amount of time working on this.)
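Roughly, the workload boils down to something like the sketch below. This is
a stripped-down illustration rather than the attached program; it assumes the
lib/pq driver, a connection string in a BENCH_DSN environment variable, and a
single shared channel named "bench", none of which are from the original setup.

// Sketch of the described workload: 50 listeners holding a LISTEN, and 50
// notifiers each delivering one NOTIFY every 750 milliseconds.
package main

import (
    "database/sql"
    "log"
    "os"
    "time"

    "github.com/lib/pq" // also registers the "postgres" database/sql driver
)

const (
    numListeners = 50
    numNotifiers = 50
    interval     = 750 * time.Millisecond
)

func main() {
    dsn := os.Getenv("BENCH_DSN")

    // 50 connections that LISTEN and simply drain their notifications.
    for i := 0; i < numListeners; i++ {
        l := pq.NewListener(dsn, 10*time.Second, time.Minute, nil)
        if err := l.Listen("bench"); err != nil {
            log.Fatal(err)
        }
        go func(l *pq.Listener) {
            for range l.Notify {
                // discard; only the server-side work matters here
            }
        }(l)
    }

    // Another 50 connections, each sending one notification every 750ms.
    for i := 0; i < numNotifiers; i++ {
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            log.Fatal(err)
        }
        db.SetMaxOpenConns(1) // one backend per notifier
        go func(db *sql.DB) {
            for range time.Tick(interval) {
                if _, err := db.Exec("SELECT pg_notify('bench', 'payload')"); err != nil {
                    log.Fatal(err)
                }
            }
        }(db)
    }

    select {} // run until interrupted
}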
On the test server I'm running on, it doesn't look quite as bad as the
profiles we had in production, but s_lock is still the worst offender in
the profiles, called from:
- 80.33% LWLockAcquire
+ 48.34% asyncQueueReadAllNotifications
+ 23.09% SIGetDataEntries
+ 16.92% SimpleLruReadPage_ReadOnly
+ 10.21% TransactionIdIsInProgress
+ 1.27% asyncQueueAdvanceTail
which roughly looks like what I recall from our actual production profiles.
.m
Marko Tiikkaja <marko@joh.to> writes:
On 11/19/15 4:21 PM, Tom Lane wrote:
... and then you gotta get the notifications to the clients, so it seems
like this just leaves the performance question hanging.
I'm not sure which performance question you think is left hanging. If
every client is connected to postgres, you're waking up tens if not
hundreds of processes tens if not hundreds of times per second. Each of
them starts a transaction, checks which notifications in the queue are
visible using clog and friends, goes through the tails of every other
process to see whether it should advance the tail pointer of the queue,
commits the transaction, and goes back to sleep only to be immediately
woken up again.
If they're not connected to postgres directly, this kind of complex
processing only happens once, and then the notification server just
unconditionally serves all notifications to the clients based on a
simple map lookup. It should be trivial to see how the overhead is avoided.
Meh. You've gotten rid of one single-point-of-contention by creating
another one. Essentially, your argument why this is an improvement is
that any random application developer using Postgres is smarter than
the Postgres development team and can therefore produce something
better-performing off the cuff. Which may indeed be true, but shall we
say it's unproven?
I don't buy the argument about removing transactional overhead, either.
The point of LISTEN/NOTIFY is to inform clients that something has
happened in the database that they ought to react to. So necessarily,
there's going to be follow-on database activity. If you're using
LISTEN/NOTIFY to wake up clients who otherwise wouldn't need a DB
connection at all, then you chose the wrong signaling mechanism.
In any case, it would be good to understand exactly what's the performance
issue that's biting you. Can you provide a test case that reproduces
that behavior?
I've attached a Go program which simulates quite accurately the
LISTEN/NOTIFY-part of our setup. What it does is:
Thanks, I'll take a look later.
regards, tom lane
Marko Tiikkaja wrote:
On the test server I'm running on, it doesn't look quite as bad as the
profiles we had in production, but s_lock is still the worst offender in the
profiles, called from:
- 80.33% LWLockAcquire
+ 48.34% asyncQueueReadAllNotifications
+ 23.09% SIGetDataEntries
+ 16.92% SimpleLruReadPage_ReadOnly
+ 10.21% TransactionIdIsInProgress
+ 1.27% asyncQueueAdvanceTail
which roughly looks like what I recall from our actual production profiles.
So the problem is in the bad scalability of LWLock, not in async.c itself?
In master, the spinlock there has been replaced with atomics; does that branch
work better?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/19/15 5:32 PM, Tom Lane wrote:
Marko Tiikkaja <marko@joh.to> writes:
On 11/19/15 4:21 PM, Tom Lane wrote:
... and then you gotta get the notifications to the clients, so it seems
like this just leaves the performance question hanging.
I'm not sure which performance question you think is left hanging. If
every client is connected to postgres, you're waking up tens if not
hundreds of processes tens if not hundreds of times per second. Each of
them starts a transaction, checks which notifications in the queue are
visible using clog and friends, goes through the tails of every other
process to see whether it should advance the tail pointer of the queue,
commits the transaction, and goes back to sleep only to be immediately
woken up again.
If they're not connected to postgres directly, this kind of complex
processing only happens once, and then the notification server just
unconditionally serves all notifications to the clients based on a
simple map lookup. It should be trivial to see how the overhead is avoided.
Meh. You've gotten rid of one single-point-of-contention by creating
another one. Essentially, your argument why this is an improvement is
that any random application developer using Postgres is smarter than
the Postgres development team and can therefore produce something
better-performing off the cuff.
It's not about who's smarter and who's not. All a notification server
has to do is wait for notifications incoming from one socket and make
sure they're delivered to the correct sockets in an epoll loop. There's
only one process being woken up, no need to synchronize with other
processes, no need to see where the pointers of other processes are, no
overhead of transaction visibility, transaction BEGIN, or transaction
COMMIT. The total amount of work done is trivially smaller.
Which may indeed be true, but shall we
say it's unproven?
Well it's not proof, but every time we moved an application from
LISTENing in postgres to the notification server our CPU usage halved.
The last time we did that (from ~100 connections to exactly 1), s_lock
went down from 30% to 0.16% in our CPU profiles, and our CPU usage is
only a fraction of what it used to be. And this is with the
notification server running on the same server with postgres, so it's
not cheating, either.
There's no way we could keep running postgres if all 400+ clients
interested in notifications had a LISTEN open in the database.
.m
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Marko Tiikkaja wrote:
On the test server I'm running on, it doesn't look quite as bad as the
profiles we had in production, but s_lock is still the worst offender in the
profiles, called from:
- 80.33% LWLockAcquire
+ 48.34% asyncQueueReadAllNotifications
+ 23.09% SIGetDataEntries
+ 16.92% SimpleLruReadPage_ReadOnly
+ 10.21% TransactionIdIsInProgress
+ 1.27% asyncQueueAdvanceTail
which roughly looks like what I recall from our actual production profiles.
So the problem is in the bad scalability of LWLock, not in async.c itself?
In master, the spinlock there has been replaced with atomics; does that branch
work better?
FWIW, I can't get results anything like this. With 50 notifiers and 50
listeners, I only see LWLockAcquire eating about 6% of the system-wide
runtime:
- 6.23% 0.64% 11850 postmaster postgres [.] LWLockAcquire
- LWLockAcquire
+ 33.16% asyncQueueReadAllNotifications
+ 20.76% SimpleLruReadPage_ReadOnly
+ 18.02% TransactionIdIsInProgress
+ 8.67% VirtualXactLockTableInsert
+ 3.98% ProcArrayEndTransaction
+ 3.03% VirtualXactLockTableCleanup
+ 1.77% PreCommit_Notify
+ 1.68% ProcessCompletedNotifies
+ 1.62% asyncQueueAdvanceTail
+ 1.28% ProcessNotifyInterrupt
+ 1.04% StartTransaction
+ 1.00% LockAcquireExtended
+ 0.96% ProcSleep
+ 0.81% LockReleaseAll
+ 0.69% TransactionIdSetPageStatus
+ 0.54% GetNewTransactionId
There isn't any really sore-thumb place that we might fix. Interestingly,
a big chunk of the runtime is getting eaten by the kernel, breaking down
more or less like this:
- 36.52% 0.05% 836 postmaster [kernel.kallsyms] [k] system_call_fastpath
- system_call_fastpath
+ 37.47% __GI___kill
+ 27.26% __poll
+ 9.55% __write_nocancel
+ 8.87% __GI___semop
+ 7.31% __read_nocancel
+ 5.14% __libc_recv
+ 3.77% __libc_send
+ 0.53% __GI___setitimer
The kill() calls are evidently from async.c's SignalBackends() while most
of the other kernel activity is connected to client/server I/O.
Since the clients aren't doing any actual database work, just notify and
receive notifies, this doesn't seem like a representative workload.
So I can't get that excited about how asyncQueueReadAllNotifications and
subsidiary TransactionIdIsInProgress tests cause a large fraction of the
LWLock acquisitions --- that's only true because nothing else very useful
is happening.
BTW, this is HEAD, but I tried 9.4 quickly and didn't really see anything
much different. Not sure why the discrepancy with Marko's results.
regards, tom lane