Listen / Notify - what to do when the queue is full
We still need to decide what to do with queue full situations in the proposed
listen/notify implementation. I have a new version of the patch to allow for a
variable payload size. However, the whole notification must fit into one page so
the payload needs to be less than 8K.
I have also added the XID, so that we can write to the queue before committing
to clog, which allows for rollback if we encounter write errors (disk full, for
example). The implications of this change in particular make the patch a lot
more complicated.
The queue is slru-based. Since slru uses int page numbers, we can use up to
2147483647 (INT_MAX) pages with some small changes in slru.c.
When do we have a full queue? Well, the idea is that notifications are written
to the queue and that they are read as soon as the notifying transaction
commits. Only if a listening backend is busy will it fail to read the
notifications, and so it won't update its read pointer for some time. With the
current space we can accommodate at least 2147483647 notifications, or more,
depending on the payload length. That gives us somewhere between 214 GB (100
bytes per notification) and 17 TB (8000 bytes per notification). So in order to have a
full queue, we need to generate that amount of notifications while one backend
is still busy and is not reading the accumulating notifications. In general
chances are not too high that anyone will ever have a full notification queue,
but we need to define the behavior anyway...
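As a sanity check, the capacity figures above can be reproduced with a few lines of arithmetic (a sketch only; `queue_bytes` is an illustrative helper, not patch code):

```c
#include <limits.h>
#include <stdint.h>

/* Illustrative arithmetic only: total queue volume if INT_MAX
 * notifications are pending, each carrying payload_bytes of data. */
static int64_t queue_bytes(int64_t payload_bytes)
{
    return (int64_t) INT_MAX * payload_bytes;
}

/* queue_bytes(100)  == 214748364700    (~214 GB)
 * queue_bytes(8000) == 17179869176000  (~17 TB)  */
```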
These are the solutions that I currently see:
1) drop new notifications if the queue is full (silently or with rollback)
2) block until readers catch up (what if the backend that tries to write the
notifications actually is the "lazy" reader that everybody is waiting for to
proceed?)
3) invent a new signal reason and send SIGUSR1 to the "lazy" readers, they
need to interrupt whatever they are doing and copy the
notifications into their
own address space (without delivering the notifications since they are in a
transaction at that moment).
For 1) there can be warnings well ahead of when the queue is actually full:
one when it is 50% full, another when it is 75% full, and so on. They could
point to the backend that is furthest behind in reading notifications...
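A minimal sketch of how such graduated warnings might look (thresholds, wording, and the function itself are invented for illustration, not taken from the patch):

```c
#include <stddef.h>

/* Hypothetical helper for option 1: return an escalating warning
 * message as the queue fills up, long before insertion would have
 * to fail, and NULL while there is still plenty of room. */
static const char *queue_fill_warning(long used_pages, long total_pages)
{
    double fill = (double) used_pages / (double) total_pages;

    if (fill >= 0.90)
        return "notification queue is 90% full; check the slowest listener";
    if (fill >= 0.75)
        return "notification queue is 75% full";
    if (fill >= 0.50)
        return "notification queue is 50% full";
    return NULL;    /* no warning needed yet */
}
```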
I think that 2) is the least practical approach. If there is a pile of at least
2,147,483,647 notifications, then a backend hasn't read the notifications
for a long long time... Chances are low that it will read them within the next
few seconds.
In a sense 2) implies 3) for the special case that the writing backend is
the very one that everybody is waiting for to proceed reading notifications;
in the end this backend would be waiting for itself.
For 3) the question is whether we can just invent a new signal reason,
PROCSIG_NOTIFYCOPY_INTERRUPT or similar, such that upon reception the backend
copies the notification data to its private address space. Would this function
be called by every backend within at most a few seconds, even if it is
processing a long-running query?
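A toy model of option 3 might look like the following. All names here are invented (including the handler name); the point is only that the signal handler does nothing but set a flag, and the actual copying happens later at a safe point, without delivering anything mid-transaction:

```c
#include <signal.h>
#include <string.h>

#define QUEUE_LEN 8

static char shared_queue[QUEUE_LEN][32];   /* toy shared queue */
static int  queue_tail = 0;                /* global write position */
static int  queue_head = 0;                /* this backend's read position */

static char private_buf[QUEUE_LEN][32];    /* held until transaction end */
static int  private_count = 0;

static volatile sig_atomic_t copy_pending = 0;

static void enqueue_notification(const char *channel)
{
    strcpy(shared_queue[queue_tail], channel);
    queue_tail = (queue_tail + 1) % QUEUE_LEN;
}

/* hypothetical handler for PROCSIG_NOTIFYCOPY_INTERRUPT:
 * do no real work in the handler, just set a flag */
static void notifycopy_interrupt_handler(int signum)
{
    (void) signum;
    copy_pending = 1;
}

/* called later from a safe point: copy entries into private
 * memory WITHOUT delivering them (we may be mid-transaction) */
static void process_copy_interrupt(void)
{
    if (!copy_pending)
        return;
    copy_pending = 0;
    while (queue_head != queue_tail)
    {
        strcpy(private_buf[private_count++], shared_queue[queue_head]);
        queue_head = (queue_head + 1) % QUEUE_LEN;
    }
}
```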
Admittedly, once 3) is in place we could also put a smaller queue into shared
memory and remove the slru part altogether, but then we need to be sure that we
can interrupt the backends at any time, since the queue size would be a lot
smaller than 200 GB...
Joachim
On Mon, Nov 16, 2009 at 9:05 AM, Greg Sabino Mullane wrote:
We still need to decide what to do with queue full situations in
the proposed listen/notify implementation. I have a new version
of the patch to allow for a variable payload size. However, the
whole notification must fit into one page so the payload needs
to be less than 8K.
That sounds fine to me, FWIW.
+1! I think this should satisfy everyone.
I have also added the XID, so that we can write to the queue before
committing to clog which allows for rollback if we encounter write
errors (disk full for example). Especially the implications of this
change make the patch a lot more complicated.
Can you elaborate on the use case for this?
Tom specifically asked for it: "The old implementation was ACID, so the new
one should be too."
so it won't update its pointer for some time. With the current space we can
accommodate at least 2147483647 notifications or more, depending on the
payload length.
That's a whole lot of notifications. I doubt any program out there is using
anywhere near that number at the moment. In my applications, having a
few hundred notifications active at one time is "a lot" in my book. :)
These are the solutions that I currently see:
1) drop new notifications if the queue is full (silently or with rollback)
I like this one best, but not with silence of course. While it's not the most
polite thing to do, this is for a super extreme edge case. I'd rather just
throw an exception if the queue is full rather than start messing with the
readers. It's a possible denial of service attack too, but so is the current
implementation in a way - at least I don't think apps would perform very
optimally with 2147483647 entries in the pg_listener table :)
If you need some real-world use cases involving payloads, let me know, I've
been waiting for this feature for some time and have it all mapped out.
me too. Joachim: when I benchmarked the original patch, I was seeing
a few log messages that suggested there might be something going on
inside. In any event, the performance was fantastic.
merlin
Greg Sabino Mullane wrote:
We still need to decide what to do with queue full situations in
the proposed listen/notify implementation. I have a new version
of the patch to allow for a variable payload size. However, the
whole notification must fit into one page so the payload needs
to be less than 8K.
That sounds fine to me, FWIW.
Agreed. Thank you for all your work.
1) drop new notifications if the queue is full (silently or with rollback)
I like this one best, but not with silence of course. While it's not the most
polite thing to do, this is for a super extreme edge case. I'd rather just
throw an exception if the queue is full rather than start messing with the
readers.
+1
--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On Sun, Nov 15, 2009 at 7:19 PM, Joachim Wieland <joe@mcknight.de> wrote:
These are the solutions that I currently see:
1) [...]
2) [...]
3) [...]
4) Allow readers to read uncommitted notifications as well. Instead of
delivering them, the backends just copy them over into their own
address space and deliver them later on...
Going with option 4) allows readers to always read all notifications
in the queue... This also allows a backend to send more notifications
than the queue can hold. So we are only limited by the backends'
memory. Every notification that is sent will eventually be delivered.
The queue can still fill up if one of the backends is busy for a long, long
time... Then the next writer just blocks and waits.
Attached patch implements this behavior as well as a variable payload
size, limited to 8000 characters. The variable payload also offers an
automatic speed control... The smaller your notifications are, the
more efficiently a page can be used and the faster you are. :-)
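To illustrate the "speed control" point: the number of notifications that fit on one 8 KB page grows quickly as payloads shrink. The per-entry header size below is a made-up placeholder, not the real on-page format:

```c
#define PAGE_SIZE 8192   /* one slru page */
#define ENTRY_HDR 32     /* hypothetical fixed overhead per entry */

/* Rough sketch: how many notifications of a given payload length
 * fit on a single queue page. */
static int entries_per_page(int payload_len)
{
    return PAGE_SIZE / (ENTRY_HDR + payload_len);
}

/* entries_per_page(8000) == 1, entries_per_page(100) == 62 */
```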
Once we are fine that this is the way to go, I'll submit a documentation patch.
Joachim
Attachments:
listennotify.2.diff (text/x-diff; charset=US-ASCII) +1083 -394
Joachim Wieland <joe@mcknight.de> writes:
4) Allow readers to read uncommitted notifications as well.
The question that strikes me here is one of timing --- apparently,
readers will now have to check the queue *without* having received
a signal? That could amount to an unpleasant amount of extra overhead
when the notify system isn't even in use. (Users who don't care about
notify will define "unpleasant amount" as "not zero".)
I haven't read the patch, so maybe you have some cute solution to that,
but if so please explain what.
regards, tom lane
On Mon, Nov 16, 2009 at 2:35 PM, Andrew Chernow <ac@esilo.com> wrote:
1) drop new notifications if the queue is full (silently or with rollback)
I like this one best, but not with silence of course. While it's not the most
polite thing to do, this is for a super extreme edge case. I'd rather just
throw an exception if the queue is full rather than start messing with the
readers.
+1
So if you guys are going to insist on turning the notification
mechanism into a queueing mechanism, I think it at least behooves you
to have it degrade gracefully into a notification mechanism and not
become entirely useless by dropping notification messages.
That is, if the queue overflows what you should do is drop the
payloads and condense all the messages for a given class into a single
notification for that class with "unknown payload". That way if a
cache which wants to invalidate specific objects gets a queue overflow
condition then at least it knows it should rescan the original data
and rebuild the cache and not just serve invalid data.
I still think you're on the wrong path entirely and will end up with a
mechanism which serves neither use case very well instead of two
separate mechanisms that are properly designed for the two use cases.
--
greg
On Thu, Nov 19, 2009 at 1:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Joachim Wieland <joe@mcknight.de> writes:
4) Allow readers to read uncommitted notifications as well.
The question that strikes me here is one of timing --- apparently,
readers will now have to check the queue *without* having received
a signal? That could amount to an unpleasant amount of extra overhead
when the notify system isn't even in use. (Users who don't care about
notify will define "unpleasant amount" as "not zero".)
The sequence in CommitTransaction() is like that:
1) add notifications to queue
2) commit to clog
3) signal backends
Only those backends are signalled that listen to at least one channel,
if the notify system isn't in use, then nobody will ever be signalled
anyway.
If a backend is reading a transaction id that has not yet committed,
it will not deliver the notification. It knows that eventually it will
receive a signal from that transaction and then it first checks its
list of uncommitted notifications it has already read and then checks
the queue for more pending notifications.
Joachim
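The delivery rule Joachim describes, deliver only entries whose transaction has committed and stop at the first uncommitted one, can be sketched roughly as follows. All names are invented for illustration; the real backend would consult clog rather than the toy "committed below this xid" horizon used here:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct Notification
{
    TransactionId xid;      /* transaction that queued the entry */
    const char   *channel;
} Notification;

/* Stand-in for the real clog lookup: treat every xid below a
 * caller-supplied horizon as committed, purely for illustration. */
static bool xid_did_commit(TransactionId xid, TransactionId committed_below)
{
    return xid < committed_below;
}

/* Walk the queue and count deliverable entries: stop at the first
 * uncommitted xid, since that transaction's commit signal will wake
 * us up later and we rescan from there. */
static int deliver_notifications(const Notification *queue, int n,
                                 TransactionId committed_below)
{
    int delivered = 0;
    for (int i = 0; i < n; i++)
    {
        if (!xid_did_commit(queue[i].xid, committed_below))
            break;   /* not yet in clog: wait for its signal */
        delivered++; /* real code would hand the entry to the client */
    }
    return delivered;
}
```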
That is, if the queue overflows what you should do is drop the
payloads and condense all the messages for a given class into a single
notification for that class with "unknown payload". That way if a
cache which wants to invalidate specific objects gets a queue overflow
condition then at least it knows it should rescan the original data
and rebuild the cache and not just serve invalid data.
That's far more complicated than throwing an error and it discards user payload
information. Let the error indicate a rescan is needed.
--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On 11/16/09 3:19 AM, Joachim Wieland wrote:
1) drop new notifications if the queue is full (silently or with rollback)
2) block until readers catch up (what if the backend that tries to write the
notifications actually is the "lazy" reader that everybody is waiting for to
proceed?)
3) invent a new signal reason and send SIGUSR1 to the "lazy" readers, they
need to interrupt whatever they are doing and copy the
notifications into their
own address space (without delivering the notifications since they are in a
transaction at that moment).
(4) drop *old* notifications if the queue is full.
Since everyone has made the point that LISTEN is not meant to be a full
queueing system, I have no problem dropping notifications LRU-style. If
we've run out of room, the oldest notifications should go first; we
probably don't care about them anyway.
We should probably also log the fact that we ran out of room, so that
the DBA knows that they have a design issue. For volume reasons, I
don't think we want to log every dropped message.
Alternately, it would be great to have a configuration option which
would allow the DBA to choose any of 3 behaviors via GUC:
drop-oldest (as above)
drop-largest (if we run out of room, drop the largest payloads first to
save space)
error (if we run out of room, error and rollback)
--Josh Berkus
We should probably also log the fact that we ran out of room, so that
the DBA knows that they have a design issue.
Can't they just bump allowed memory and avoid a redesign?
Alternately, it would be great to have a configuration option which
would allow the DBA to choose any of 3 behaviors via GUC:
drop-oldest (as above)
drop-largest (if we run out of room, drop the largest payloads first to
save space)
error (if we run out of room, error and rollback)
I mentioned this up thread. I completely agree that overflow behavior should be
tunable.
--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Joachim Wieland <joe@mcknight.de> writes:
The sequence in CommitTransaction() is like that:
1) add notifications to queue
2) commit to clog
3) signal backends
Only those backends are signalled that listen to at least one channel,
if the notify system isn't in use, then nobody will ever be signalled
anyway.
If a backend is reading a transaction id that has not yet committed,
it will not deliver the notification.
But you were saying that this patch would enable sending more data than
would fit in the queue. How will that happen if the other backends
don't look at the queue until you signal them?
regards, tom lane
Josh Berkus <josh@agliodbs.com> writes:
(4) drop *old* notifications if the queue is full.
Since everyone has made the point that LISTEN is not meant to be a full
queueing system, I have no problem dropping notifications LRU-style.
NO, NO, NO, a thousand times no!
That turns NOTIFY into an unreliable signaling system, and if I haven't
made this perfectly clear yet, any such change will be committed over my
dead body.
If we are unable to insert a new message into the queue, the correct
recourse is to fail the transaction that is trying to insert the *new*
message. Not to drop messages from already-committed transactions.
Failing the current transaction still leaves things in a consistent
state, ie, you don't get messages from aborted transactions but that's
okay because they didn't change the database state.
I think Greg has a legitimate concern about whether this redesign
reduces the usefulness of NOTIFY for existing use-cases, though.
Formerly, since pg_listener would effectively coalesce notifies
across multiple sending transactions instead of only one, it was
impossible to "overflow the queue", unless maybe you managed to
bloat pg_listener to the point of being out of disk space, and
even that was pretty hard. There will now be a nonzero chance
of transactions failing at commit because of queue full. If the
chance is large this will be an issue. (Is it sane to wait for
the queue to be drained?)
BTW, did we discuss the issue of 2PC transactions versus notify?
The current behavior of 2PC with notify is pretty cheesy and will
become more so if we make this change --- you aren't really
guaranteed that the notify will happen, even though the prepared
transaction did commit. I think it might be better to disallow
NOTIFY inside a prepared xact.
regards, tom lane
Andrew Chernow <ac@esilo.com> writes:
I mentioned this up thread. I completely agree that overflow behavior should be
tunable.
There is only one correct overflow behavior.
regards, tom lane
Tom Lane wrote:
Andrew Chernow <ac@esilo.com> writes:
I mentioned this up thread. I completely agree that overflow behavior should
be tunable.
There is only one correct overflow behavior.
I count three.
1. wait
2. error
3. skip
#1 and #2 are very similar to a file system. If FS buffers are full on write,
it makes you wait. In non-blocking mode, it throws an EAGAIN error. IMHO those
two behaviors are totally acceptable for handling notify overflow. #3 is pretty
weak but I *think* there are uses for it.
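The file-system analogy as a toy model (invented names; "blocking" is collapsed into an immediate reader catch-up so the sketch stays self-contained, where the real backend would sleep until signalled):

```c
#include <errno.h>
#include <stdbool.h>

#define QUEUE_CAP 4
static int queue_used = 0;

/* Stand-in for a slow reader finally advancing its read pointer. */
static void reader_catch_up(void)
{
    queue_used = 0;
}

/* A full queue either makes the writer wait (blocking mode) or fails
 * immediately with EAGAIN (non-blocking mode), just like write() on a
 * full pipe. Returns 0 on success. */
static int notify_enqueue(bool nonblocking)
{
    if (queue_used >= QUEUE_CAP)
    {
        if (nonblocking)
            return EAGAIN;   /* caller may retry or roll back */
        reader_catch_up();   /* blocking: wait for readers to drain */
    }
    queue_used++;
    return 0;
}
```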
--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes:
Tom Lane wrote:
There is only one correct overflow behavior.
I count three.
Waiting till you can insert is reasonable (especially if we have some
behavior that nudges other backends to empty the queue). If by "skip"
you mean losing the notify but still committing, that's incorrect.
There is no room for debate about that.
regards, tom lane
Tom Lane wrote:
Andrew Chernow <ac@esilo.com> writes:
Tom Lane wrote:
There is only one correct overflow behavior.
I count three.
Waiting till you can insert is reasonable (especially if we have some
behavior that nudges other backends to empty the queue). If by "skip"
you mean losing the notify but still committing, that's incorrect.
There is no room for debate about that.
Yeah like I said, skip felt weak.
In regards to waiting, what would happen if other backends couldn't help empty
the queue because they too are clogged? ISTM that any attempt to flush to other
non-disk queues is doomed to possible overflows as well. Then what?
Personally, I would just wait until room became available or the transaction was
canceled. We could get fancy and tack a timeout value onto the wait.
--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Andrew Chernow <ac@esilo.com> writes:
Personally, I would just wait until room became available or the transaction was
canceled.
Works for me, as long as there's a CHECK_FOR_INTERRUPTS in there to
allow a cancel to happen. The current patch seems to have a lot of
pointless logging and no CHECK_FOR_INTERRUPTS ;-)
regards, tom lane
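A toy model of the cancellable wait loop Tom is asking for, with a jmp_buf standing in for the backend's error-recovery machinery (all names are illustrative, and the real CHECK_FOR_INTERRUPTS is a macro with considerably more to it):

```c
#include <setjmp.h>
#include <stdbool.h>

static volatile bool QueryCancelPending = false;
static jmp_buf cancel_recovery;

/* Minimal stand-in for CHECK_FOR_INTERRUPTS: escape the wait
 * entirely if a cancel has arrived. */
static void check_for_interrupts(void)
{
    if (QueryCancelPending)
        longjmp(cancel_recovery, 1);   /* "ERROR: canceling statement" */
}

static volatile bool queue_has_space = false;

/* Wait until the queue has room, polling the interrupt flag so the
 * writer stays cancellable; max_polls bounds the loop for testing
 * where the real code would sleep or wait on a signal. */
static bool wait_for_queue_space(int max_polls)
{
    for (int i = 0; i < max_polls; i++)
    {
        check_for_interrupts();   /* the point being made above */
        if (queue_has_space)
            return true;
        /* real backend: sleep / wait for a wakeup signal here */
    }
    return false;
}
```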
On Wed, 18 Nov 2009 22:12:18 -0500 Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
(4) drop *old* notifications if the queue is full.
Since everyone has made the point that LISTEN is not meant to be a full
queueing system, I have no problem dropping notifications LRU-style.
NO, NO, NO, a thousand times no!
That turns NOTIFY into an unreliable signaling system, and if I haven't
made this perfectly clear yet, any such change will be committed over my
dead body.
If we are unable to insert a new message into the queue, the correct
recourse is to fail the transaction that is trying to insert the *new*
message. Not to drop messages from already-committed transactions.
Failing the current transaction still leaves things in a consistent
state, ie, you don't get messages from aborted transactions but that's
okay because they didn't change the database state.
+1
And in addition I don't like the idea of having the sender sitting
around until there's room for more messages in the queue, just because
some very old backends didn't remove the stuff from it.
So, yes, just failing the current transaction seems reasonable. We are
talking about millions of messages in the queue ...
Bye
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Thu, Nov 19, 2009 at 4:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
There will now be a nonzero chance
of transactions failing at commit because of queue full. If the
chance is large this will be an issue. (Is it sane to wait for
the queue to be drained?)
Exactly. The whole idea of putting the notification system into an slru queue
was to make this nonzero chance a very-close-to-zero nonzero chance.
Currently, with pages from 0..0xFFFF, we can have somewhere between
2,000,000 (biggest payload) and 160,000,000 (no payload) notifications
in the queue at the same time.
We are free to remove the slru limitation by making slru.c work with 8
character file names. Then you can multiply both limits by 32,000 and
then it should be very-close-to-zero, at least in my point of view...
The actual queue-full behavior is then (or maybe is already now) just
a theoretical aspect that we need to agree on to make the whole
concept sound.
The current patch would just wait until some space becomes available
in the queue and it guarantees that no notification is lost. Furthermore
it guarantees that a transaction can listen on an unlimited number of
channels and that it can send an unlimited number of notifications,
not related to the size of the queue. It can also send that unlimited
number of notifications if it is one of the listeners of those notifications.
The only real limit is now the backend's memory, but as long as nobody
proves that they need unlimited notifications with a limited amount of
memory, we'll just keep it like that.
I will add a CHECK_FOR_INTERRUPTS() and resubmit so that you
can cancel a NOTIFY while the queue is full. Also I've put in an
optimization to only signal those backends in a queue full situation
that are not yet up-to-date (which will probably turn out to be only one
backend - the slowest that is in a long running transaction - after some
time...).
BTW, did we discuss the issue of 2PC transactions versus notify?
The current behavior of 2PC with notify is pretty cheesy and will
become more so if we make this change --- you aren't really
guaranteed that the notify will happen, even though the prepared
transaction did commit. I think it might be better to disallow
NOTIFY inside a prepared xact.
Yes, I have been thinking about that also. So what should happen
when you prepare a transaction that has sent a NOTIFY before?
Joachim
On Thu, Nov 19, 2009 at 1:51 PM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de> wrote:
And in addition i don't like the idea of having the sender sitting
around until there's room for more messages in the queue, because some
very old backends didn't remove the stuff from the same.
The only valid reason why a backend has not processed the notifications
in the queue is that it has been in a transaction the whole time since
then (and it executed LISTEN some time before).
Joachim