docs: LISTEN/NOTIFY performance considerations
Greetings!
LISTEN/NOTIFY has known performance issues that aren't documented but
regularly surprise users in production; my customers and I have run into
some of them multiple times.
They have also been discussed before, e.g.:
- 2008: /messages/by-id/5215.1204048454@sss.pgh.pa.us
- 2013: /messages/by-id/3598.1363354686@sss.pgh.pa.us
Recently, Recall.ai had production outages caused by these exact issues:
https://www.recall.ai/blog/postgres-listen-notify-does-not-scale (currently
at the no. 1 position on HN:
https://news.ycombinator.com/item?id=44490510).
It's probably a good time to consider improving this area, but until that
happens, I propose documenting the risks to help users avoid incidents
(and backpatching the change to all supported versions).
The proposed patch to the LISTEN/NOTIFY docs covers:
1. The global lock taken during commit, which affects all databases in the cluster
2. O(N²) duplicate checking performance
3. How to diagnose the problem (log_lock_waits example)
4. Alternatives (logical decoding)
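
To illustrate item 2, here is a toy Python model of the quadratic cost. It is
not PostgreSQL's actual code, and the function names are made up for
illustration. It assumes every queued notification is compared against all
notifications already pending in the same transaction, so N distinct payloads
cost N(N-1)/2 comparisons:

```python
def queue_with_linear_check(payloads):
    """Toy model: queue payloads, skipping duplicates via a linear scan.

    Returns the de-duplicated pending list and the number of comparisons made.
    """
    pending = []
    comparisons = 0
    for payload in payloads:
        duplicate = False
        # Compare the new payload against everything already pending.
        for queued in pending:
            comparisons += 1
            if queued == payload:
                duplicate = True
                break
        if not duplicate:
            pending.append(payload)
    return pending, comparisons


def comparisons_for(n):
    """Comparison count for n distinct payloads (the worst case)."""
    _, count = queue_with_linear_check([f"payload-{i}" for i in range(n)])
    return count


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, comparisons_for(n))
```

In this model, doubling the number of notifications roughly quadruples the
work (499500 comparisons for 1000 payloads, 1999000 for 2000), which is the
kind of growth the docs should warn about for transactions that issue many
distinct NOTIFYs.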
Thoughts?
Nik