Restricting maximum keep segments by repslots
Hello.
Although replication slot is helpful to avoid unwanted WAL
deletion, on the other hand it can cause a disastrous situation
by keeping WAL segments without a limit. Removing the causal
repslot will save this situation but it is not doable if the
standby is active. We should do a rather complex and forcible
steps to relieve the situation especially in an automatic
manner. (As for me, specifically in an HA cluster.)
This patch adds a GUC to put a limit to the number of segments
that replication slots can keep. Hitting the limit during
checkpoint shows a warining and the segments older than the limit
are removed.
WARNING: restart LSN of replication slots is ignored by checkpoint
DETAIL: Some replication slots lose required WAL segnents to continue.
Another measure would be automatic deletion or inactivation of
the culprit slot but it seems too complex for the problem.
As we have already postponed some patches by the triage for the
last commit fest, this might should be postponed to PG11.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Add-WAL-releaf-vent-for-replication-slots.patchtext/x-patch; charset=us-asciiDownload+24-1
On Tue, Feb 28, 2017 at 12:27 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Although replication slot is helpful to avoid unwanted WAL
deletion, on the other hand it can cause a disastrous situation
by keeping WAL segments without a limit. Removing the causal
repslot will save this situation but it is not doable if the
standby is active. We should do a rather complex and forcible
steps to relieve the situation especially in an automatic
manner. (As for me, specifically in an HA cluster.)This patch adds a GUC to put a limit to the number of segments
that replication slots can keep. Hitting the limit during
checkpoint shows a warining and the segments older than the limit
are removed.WARNING: restart LSN of replication slots is ignored by checkpoint
DETAIL: Some replication slots lose required WAL segnents to continue.Another measure would be automatic deletion or inactivation of
the culprit slot but it seems too complex for the problem.As we have already postponed some patches by the triage for the
last commit fest, this might should be postponed to PG11.
Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments as it is a
wart, and this applies as well for replication slots in my opinion. If
a slot is bloating WAL and you care about your Postgres instance, I
would recommend instead that you use a background worker that does
monitoring of the situation based on max_wal_size for example, killing
the WAL sender associated to the slot if there is something connected
but it is frozen or it cannot keep up the pace of WAL generation, and
then dropping the slot. You may want to issue a checkpoint in this
case as well to ensure that segments get recycled. But anyway, if you
reach this point of WAL bloat, perhaps that's for the best as users
would know about it because backups would get in danger. For some
applications, that is acceptable, but you could always rely on
monitoring slots and kill them on sight if needed. That's as well more
flexible than having a parameter that basically is just a synonym of
max_wal_size.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you for the opinion.
At Tue, 28 Feb 2017 12:42:32 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQm0QetoShggQnn4bLFd9oXKKHG7RafBP3Krno62=ORww@mail.gmail.com>
Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments as it is a
wart, and this applies as well for replication slots in my opinion. If
a slot is bloating WAL and you care about your Postgres instance, I
would recommend instead that you use a background worker that does
monitoring of the situation based on max_wal_size for example, killing
the WAL sender associated to the slot if there is something connected
but it is frozen or it cannot keep up the pace of WAL generation, and
then dropping the slot.
It is doable without a plugin and currently we are planning to do
in the way (Maybe such plugin would be unacceptable..). Killing
walsender (which one?), removing the slot and if failed.. This is
the 'steps rather complex' and fragile.
You may want to issue a checkpoint in this
case as well to ensure that segments get recycled. But anyway, if you
reach this point of WAL bloat, perhaps that's for the best as users
would know about it because backups would get in danger.
Yes, but at the end it is better than that a server just stops
with a PANIC.
For some applications, that is acceptable, but you could always
rely on monitoring slots and kill them on sight if
needed.
Another solution would be that removing a slot kills
corresponding walsender. What do you think about this?
pg_drop_replication_slot(name, *force*)
force = true kills the walsender runs on the slot.
That's as well more flexible than having a parameter
that basically is just a synonym of max_wal_size.
I thought the same thing first, max_wal_size_hard, that limits
the wal size including extra (other than them for the two
checkpoig cycles) segments.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 28/02/17 04:27, Kyotaro HORIGUCHI wrote:
Hello.
Although replication slot is helpful to avoid unwanted WAL
deletion, on the other hand it can cause a disastrous situation
by keeping WAL segments without a limit. Removing the causal
repslot will save this situation but it is not doable if the
standby is active. We should do a rather complex and forcible
steps to relieve the situation especially in an automatic
manner. (As for me, specifically in an HA cluster.)
I agree that that it should be possible to limit how much WAL slot keeps.
This patch adds a GUC to put a limit to the number of segments
that replication slots can keep. Hitting the limit during
checkpoint shows a warining and the segments older than the limit
are removed.WARNING: restart LSN of replication slots is ignored by checkpoint
DETAIL: Some replication slots lose required WAL segnents to continue.
However this is dangerous as logical replication slot does not consider
it error when too old LSN is requested so we'd continue replication,
hiding data loss.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 28, 2017 at 1:16 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
It is doable without a plugin and currently we are planning to do
in the way (Maybe such plugin would be unacceptable..). Killing
walsender (which one?), removing the slot and if failed..
The PID and restart_lsn associated to each slot offer enough
information for monitoring.
This is the 'steps rather complex' and fragile.
The handling of slot drop is not complex. The insurance that WAL
segments get recycled on time and avoid a full bloat is though.
That's as well more flexible than having a parameter
that basically is just a synonym of max_wal_size.I thought the same thing first, max_wal_size_hard, that limits
the wal size including extra (other than them for the two
checkpoig cycles) segments.
It would make more sense to just switch max_wal_size from a soft to a
hard limit. The current behavior is not cool with activity spikes.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 28, 2017 at 10:04 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
It would make more sense to just switch max_wal_size from a soft to a
hard limit. The current behavior is not cool with activity spikes.
Having a hard limit on WAL size would be nice, but that's a different
problem from the one being discussed here. If max_wal_size becomes a
hard limit, and a standby with a replication slot dies, then the
master eventually starts refusing all writes. I guess that's better
than a PANIC, but it's not likely to make users very happy. I think
it's entirely reasonable to want a behavior where the master is
willing to retain up to X amount of extra WAL for the benefit of some
standby, but after that the health of the master takes priority.
You can't really get that behavior today. Either you can retain as
much WAL as might be necessary through archiving or a slot, or you can
retain a fixed amount of WAL whether it's actually needed or not.
There's currently no approach that retains min(wal_needed,
configured_value).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2017-02-28 12:42:32 +0900, Michael Paquier wrote:
Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments as it is a
wart, and this applies as well for replication slots in my opinion.
I think a per-slot option to limit the amount of retention would make
sense.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/27/17 23:27, Petr Jelinek wrote:
WARNING: restart LSN of replication slots is ignored by checkpoint
DETAIL: Some replication slots lose required WAL segnents to continue.However this is dangerous as logical replication slot does not consider
it error when too old LSN is requested so we'd continue replication,
hiding data loss.
In general, we would need a much more evident and strict way to discover
when this condition is hit. Like a "full" column in
pg_stat_replication_slot, and refusing connections to the slot until it
is cleared.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/27/17 22:27, Kyotaro HORIGUCHI wrote:
This patch adds a GUC to put a limit to the number of segments
that replication slots can keep.
Please measure it in size, not in number of segments.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Wed, 1 Mar 2017 08:06:10 -0800, Andres Freund <andres@anarazel.de> wrote in <20170301160610.wc7ez3vihmialntd@alap3.anarazel.de>
On 2017-02-28 12:42:32 +0900, Michael Paquier wrote:
Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments as it is a
wart, and this applies as well for replication slots in my opinion.I think a per-slot option to limit the amount of retention would make
sense.
I started from that but I found that all slots refer to the same
location as the origin of the distance, that is, the last segment
number that KeepLogSeg returns. As the result the whole logic
became as the following. This is one reason of the proposed pach.
- Take the maximum value of the maximum-retain-LSN-amount per slot.
- Apply the maximum value during the calcuation in KeepLogSeg.
- (These steps runs only when at least one slot exists)
The another reason was, as Robert retold, I thought that this is
a matter of system (or a DB cluster) wide health and works in a
bit different way from what the name "max_wal_size_hard"
suggests.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Wed, 1 Mar 2017 12:17:43 -0500, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in <dc7faead-61c4-402e-a6dc-534192833d77@2ndquadrant.com>
On 2/27/17 23:27, Petr Jelinek wrote:
WARNING: restart LSN of replication slots is ignored by checkpoint
DETAIL: Some replication slots lose required WAL segnents to continue.However this is dangerous as logical replication slot does not consider
it error when too old LSN is requested so we'd continue replication,
hiding data loss.In general, we would need a much more evident and strict way to discover
when this condition is hit. Like a "full" column in
pg_stat_replication_slot, and refusing connections to the slot until it
is cleared.
Anyway, if preserving WAL to replicate has priority to the
master's health, this doesn't nothing by leaving
'max_wal_keep_segments' to 0.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At Wed, 1 Mar 2017 12:18:07 -0500, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in <98538b00-42ae-6a6b-f852-50b3c937ade4@2ndquadrant.com>
On 2/27/17 22:27, Kyotaro HORIGUCHI wrote:
This patch adds a GUC to put a limit to the number of segments
that replication slots can keep.Please measure it in size, not in number of segments.
It was difficult to dicide which is reaaonable but I named it
after wal_keep_segments because it has the similar effect.
In bytes(or LSN)
max_wal_size
min_wal_size
wal_write_flush_after
In segments
wal_keep_segments
But surely max_slot_wal_keep_segments works to keep disk space so
bytes would be reasonable.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/1/17 19:54, Kyotaro HORIGUCHI wrote:
Please measure it in size, not in number of segments.
It was difficult to dicide which is reaaonable but I named it
after wal_keep_segments because it has the similar effect.In bytes(or LSN)
max_wal_size
min_wal_size
wal_write_flush_afterIn segments
wal_keep_segments
We have been moving away from measuring in segments. For example,
checkpoint_segments was replaced by max_wal_size.
Also, with the proposed patch that allows changing the segment size more
easily, this will become more important. (I wonder if that will require
wal_keep_segments to change somehow.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you for the comment.
At Fri, 3 Mar 2017 14:47:20 -0500, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in <ac510b45-7805-7ccc-734c-1b38a6645f3e@2ndquadrant.com>
On 3/1/17 19:54, Kyotaro HORIGUCHI wrote:
Please measure it in size, not in number of segments.
It was difficult to dicide which is reaaonable but I named it
after wal_keep_segments because it has the similar effect.In bytes(or LSN)
max_wal_size
min_wal_size
wal_write_flush_afterIn segments
wal_keep_segmentsWe have been moving away from measuring in segments. For example,
checkpoint_segments was replaced by max_wal_size.Also, with the proposed patch that allows changing the segment size more
easily, this will become more important. (I wonder if that will require
wal_keep_segments to change somehow.)
Agreed. It is 'max_slot_wal_keep_size' in the new version.
wal_keep_segments might should be removed someday.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Add-WAL-releaf-vent-for-replication-slots_20170306.patchtext/x-patch; charset=us-asciiDownload+25-1
On 28 February 2017 at 12:27, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
This patch adds a GUC to put a limit to the number of segments
that replication slots can keep. Hitting the limit during
checkpoint shows a warining and the segments older than the limit
are removed.WARNING: restart LSN of replication slots is ignored by checkpoint
DETAIL: Some replication slots lose required WAL segnents to continue.However this is dangerous as logical replication slot does not consider
it error when too old LSN is requested so we'd continue replication,
hiding data loss.
That skipping only happens if you request a startpoint older than
confirmed_flush_lsn . It doesn't apply to this situation.
The client cannot control where we start decoding, it's always
restart_lsn, and if we can't find a needed WAL segment we'll ERROR. So
this is safe, though the error will be something about being unable to
find a wal segment that users might not directly associate with having
set this option. It won't say "slot disabled because needed WAL has
been discarded due to [setting]" or anything.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
I'll add this to CF2017-09.
At Mon, 06 Mar 2017 18:20:06 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170306.182006.172683338.horiguchi.kyotaro@lab.ntt.co.jp>
Thank you for the comment.
At Fri, 3 Mar 2017 14:47:20 -0500, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in <ac510b45-7805-7ccc-734c-1b38a6645f3e@2ndquadrant.com>
On 3/1/17 19:54, Kyotaro HORIGUCHI wrote:
Please measure it in size, not in number of segments.
It was difficult to dicide which is reaaonable but I named it
after wal_keep_segments because it has the similar effect.In bytes(or LSN)
max_wal_size
min_wal_size
wal_write_flush_afterIn segments
wal_keep_segmentsWe have been moving away from measuring in segments. For example,
checkpoint_segments was replaced by max_wal_size.Also, with the proposed patch that allows changing the segment size more
easily, this will become more important. (I wonder if that will require
wal_keep_segments to change somehow.)Agreed. It is 'max_slot_wal_keep_size' in the new version.
wal_keep_segments might should be removed someday.
- Following to min/max_wal_size, the variable was renamed to
"max_slot_wal_keep_size_mb" and used as ConvertToXSegs(x)"
- Stopped warning when checkpoint doesn't flush segments required
by slots even if max_slot_wal_keep_size have worked.
- Avoided subraction that may be negative.
regards,
Attachments:
slot_releaf_vent_v3.patchtext/x-patch; charset=us-asciiDownload+37-0
I'm still concerned about how the critical situation is handled. Your
patch just prints a warning to the log and then goes on -- doing what?
The warning rolls off the log, and then you have no idea what happened,
or how to recover.
I would like a flag in pg_replication_slots, and possibly also a
numerical column that indicates how far away from the critical point
each slot is. That would be great for a monitoring system.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
At Fri, 1 Sep 2017 23:49:21 -0400, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote in <751e09c4-93e0-de57-edd2-e64c4950f5e3@2ndquadrant.com>
I'm still concerned about how the critical situation is handled. Your
patch just prints a warning to the log and then goes on -- doing what?The warning rolls off the log, and then you have no idea what happened,
or how to recover.
The victims should be complaining in their log files, but, yes, I
must admit that it's extremely resembles /dev/null. And the
catastrophe comes suddenly.
I would like a flag in pg_replication_slots, and possibly also a
numerical column that indicates how far away from the critical point
each slot is. That would be great for a monitoring system.
Great! I'll do that right now.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170907.141212.227032666.horiguchi.kyotaro@lab.ntt.co.jp>
I would like a flag in pg_replication_slots, and possibly also a
numerical column that indicates how far away from the critical point
each slot is. That would be great for a monitoring system.Great! I'll do that right now.
Done.
In the attached patch on top of the previous patch, I added two
columns in pg_replication_slots, "live" and "distance". The first
indicates the slot will "live" after the next checkpoint. The
second shows the how many bytes checkpoint lsn can advance before
the slot will "die", or how many bytes the slot have lost after
"death".
Setting wal_keep_segments = 1 and max_slot_wal_keep_size = 16MB.
=# select slot_name, restart_lsn, pg_current_wal_lsn(), live, distance from pg_replication_slots;
slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
s1 | 0/162D388 | 0/162D3C0 | t | 0/29D2CE8
This shows that checkpoint can advance 0x29d2ce8 bytes before the
slot will die even if the connection stalls.
s1 | 0/4001180 | 0/6FFF2B8 | t | 0/DB8
Just before the slot loses sync.
s1 | 0/4001180 | 0/70008A8 | f | 0/FFEE80
The checkpoint after this removes some required segments.
2017-09-07 19:04:07.677 JST [13720] WARNING: restart LSN of replication slots is ignored by checkpoint
2017-09-07 19:04:07.677 JST [13720] DETAIL: Some replication slots have lost required WAL segnents to continue by up to 1 segments.
If max_slot_wal_keep_size if not set (0), live is always true and
distance is NULL.
slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
s1 | 0/4001180 | 0/73117A8 | t |
- The name (or its content) of the new columns should be arguable.
- pg_replication_slots view takes LWLock on ControlFile and
spinlock on XLogCtl for every slot. But seems difficult to
reduce it..
- distance seems mitakenly becomes 0/0 for certain condition..
- The result seems almost right but more precise check needed.
(Anyway it cannot be perfectly exact.);
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0002-Add-monitoring-aid-for-max_replication_slots.patchtext/x-patch; charset=us-asciiDownload+100-4
At Thu, 07 Sep 2017 21:59:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170907.215956.110216588.horiguchi.kyotaro@lab.ntt.co.jp>
Hello,
At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170907.141212.227032666.horiguchi.kyotaro@lab.ntt.co.jp>
I would like a flag in pg_replication_slots, and possibly also a
numerical column that indicates how far away from the critical point
each slot is. That would be great for a monitoring system.Great! I'll do that right now.
Done.
The CF status of this patch turned into "Waiting on Author".
This is because the second patch is posted separately from the
first patch. I repost them together after rebasing to the current
master.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center