Toast issues with OldestXmin going backwards
Various comments in GetOldestXmin mention the possibility of the oldest
xmin going backward, and assert that this is actually safe. It's not.
Consider:
A table has a toastable column. A row is updated in a way that changes
the toasted value. There are now two row versions pointing to different
toast values, one live and one dead.
Now suppose the toast table - but not the main table - is vacuumed; the
dead toast entries are removed, even though they are still referenced by
the dead main-table row. Autovacuum treats the main table and toast
table separately, so this can happen.
Now suppose that OldestXmin goes backwards so that the older main table
row version is no longer dead, but merely recently-dead.
At this point, VACUUM FULL (or similar rewrites) on the table will fail
with "missing chunk number 0 for ..." toast errors, because it tries to
copy the recently-dead row, but that row's toasted values have been
vacuumed away already.
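The sequence above can be modelled as a tiny sketch. This is purely illustrative; the structs and functions are hypothetical stand-ins, not the real PostgreSQL data structures, and "xid comparison" is simplified to plain integer comparison:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature model of the failure sequence. */
typedef struct
{
    int xmax;       /* 0 = not deleted; else deleting transaction id */
    int toast_oid;  /* which toast value this row version points at */
} RowVersion;

typedef enum { ROW_LIVE, ROW_RECENTLY_DEAD, ROW_DEAD } RowState;

/* Visibility as vacuum sees it: removable only if deleted before OldestXmin. */
static RowState
classify(RowVersion v, int oldest_xmin)
{
    if (v.xmax == 0)
        return ROW_LIVE;
    return (v.xmax < oldest_xmin) ? ROW_DEAD : ROW_RECENTLY_DEAD;
}

/*
 * VACUUM FULL copies LIVE and RECENTLY_DEAD rows; copying a RECENTLY_DEAD
 * row whose toast chunks were already vacuumed away is what raises the
 * "missing chunk number 0" error.
 */
static bool
rewrite_fails(RowVersion old_row, bool toast_already_vacuumed, int oldest_xmin)
{
    return classify(old_row, oldest_xmin) == ROW_RECENTLY_DEAD
        && toast_already_vacuumed;
}
```

With OldestXmin at 200 the old row version is DEAD, so a toast-only vacuum may remove its chunks; once OldestXmin retreats to 50 the same row version is merely RECENTLY_DEAD, and a rewrite trips over the missing chunks.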
(I've been working for a while with someone on IRC to track down the
source of unexplained semi-repeatable "missing chunk number 0 ..."
errors from VACUUM FULL. I don't know at this stage whether this is the
actual problem, but it matches the symptoms.)
--
Andrew (irc:RhodiumToad)
On Thu, Apr 19, 2018 at 4:07 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
Various comments in GetOldestXmin mention the possibility of the oldest
xmin going backward, and assert that this is actually safe. It's not.

Consider:
A table has a toastable column. A row is updated in a way that changes
the toasted value. There are now two row versions pointing to different
toast values, one live and one dead.

Now suppose the toast table - but not the main table - is vacuumed; the
dead toast entries are removed, even though they are still referenced by
the dead main-table row. Autovacuum treats the main table and toast
table separately, so this can happen.

Now suppose that OldestXmin goes backwards so that the older main table
row version is no longer dead, but merely recently-dead.

At this point, VACUUM FULL (or similar rewrites) on the table will fail
with "missing chunk number 0 for ..." toast errors, because it tries to
copy the recently-dead row, but that row's toasted values have been
vacuumed away already.
I haven't tried to reproduce it, but I can see the possibility of the
problem described by you. What should we do next? I could see a few
possibilities:

(a) Vacuum for main and toast table should always use same OldestXmin,

(b) Before removing the row from toast table, we should ensure that
row in the main table is removed,

(c) Change the logic during rewrite such that it can detect this
situation and skip copying the tuple in the main table - on a quick
look, this seems tricky, because the toast table TID might have been
reused by that time, so basically I am not sure if this can be a viable
solution - or

(d) Ensure that GetOldestXmin doesn't move backwards, or write a new API
similar to it which doesn't allow OldestXmin to move backwards, and use
that for the purpose of vacuum.

Any better ideas?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> I haven't tried to reproduce it, but I can see the possibility of
Amit> the problem described by you. What should we do next? I could see
Amit> few possibilities: (a) Vacuum for main and toast table should
Amit> always use same OldestXmin,
I don't see how this could be arranged without either disabling the
ability to vacuum the toast table independently, or storing the xmin
somewhere.
Amit> (b) Before removing the row from toast table, we should ensure
Amit> that row in the main table is removed,
We can't find the main table row version(s) without scanning the whole
main table, so this amounts (again) to disabling vacuum on the toast
table only.
Amit> (c) Change the logic during rewrite such that it can detect this
Amit> situation and skip copying the tuple in the main table,
This seems more promising, but the problem is how to detect the error
safely (since currently, it's just ereport()ed from deep inside the
toast code). How about:
1) add a toast_datum_exists() call or similar that returns false if the
(first block of) the specified external toast item is missing
2) when copying a RECENTLY_DEAD tuple, check all its external column
values using this function beforehand, and if any are missing, treat the
tuple as DEAD instead.
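A rough C sketch of this two-step check. Everything here is hypothetical scaffolding - the in-memory "live OID" list stands in for a toast-table probe, and toast_datum_exists() is the proposed call, not an existing function:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of a tuple's columns: external values carry a toast OID. */
typedef struct
{
    bool is_external;
    int  toast_oid;
} AttrValue;

/* Stand-in for the proposed toast_datum_exists(): does the first chunk
 * of this toast item still exist? Modelled as a lookup in an OID list. */
static bool
toast_datum_exists(const int *live_oids, int n_live, int oid)
{
    for (int i = 0; i < n_live; i++)
        if (live_oids[i] == oid)
            return true;
    return false;
}

/* Step 2: before copying a RECENTLY_DEAD tuple, probe every external
 * column value; if any is missing, demote the tuple to DEAD and skip it. */
static bool
demote_recently_dead(const AttrValue *attrs, int natts,
                     const int *live_oids, int n_live)
{
    for (int i = 0; i < natts; i++)
        if (attrs[i].is_external &&
            !toast_datum_exists(live_oids, n_live, attrs[i].toast_oid))
            return true;        /* treat tuple as DEAD */
    return false;               /* safe to copy as RECENTLY_DEAD */
}
```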
Amit> on a quick look, this seems tricky, because the toast table TID
Amit> might have been reused by that time,
Toast pointers don't point to TIDs fortunately, they point to OIDs. The
OID could have been reused (if there's been an OID wraparound since the
toast item was created), but that should be harmless: it means that we'd
incorrectly copy the new value with the old tuple, but the old tuple is
never going to be visible to anybody ever again so nothing will see that.
Amit> or (d) Ensure that GetOldestXmin doesn't move backwards or write
Amit> a new API similar to it which doesn't allow OldestXmin to move
Amit> backwards and use that for the purpose of vacuum.
This would presumably require storing a reference OldestXmin somewhere
(in shared memory? but there'd have to be a global one and one per db,
and I don't think we have any shared memory structures at present that
are per-db). We'd even have to preserve an oldestXmin for databases with
no currently active connections, though we wouldn't have to preserve it
across restarts.
Amit> Any better ideas?
It turns out this issue has come up before (5 years ago!):
/messages/by-id/20362.1359747327@sss.pgh.pa.us
--
Andrew (irc:RhodiumToad)
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> (c) Change the logic during rewrite such that it can detect this
Amit> situation and skip copying the tuple in the main table,
So I dug into this one and it looks to me like the best approach. Here's
a draft patch against 10-stable, if nobody can think of any showstoppers
then I'll do the rest of the versions and commit them.
--
Andrew (irc:RhodiumToad)
Attachments:
toastvac.patch (text/x-patch, +213/-0)
On Sat, 21 Apr 2018 at 6:46 PM, Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:
So I dug into this one and it looks to me like the best approach. Here's
a draft patch against 10-stable, if nobody can think of any showstoppers
then I'll do the rest of the versions and commit them.
How does this guard against the case when the same OID gets inserted in the
toast table again, with matching lengths etc? Rare but seems possible, no?
I think we should look for a more complete solution, given how hard
these bugs are to detect and fix.
Thanks,
Pavan
--
Andrew (irc:RhodiumToad)

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2018-04-21 14:16:27 +0100, Andrew Gierth wrote:
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> (c) Change the logic during rewrite such that it can detect this
Amit> situation and skip copying the tuple in the main table,

So I dug into this one and it looks to me like the best approach. Here's
a draft patch against 10-stable, if nobody can think of any showstoppers
then I'll do the rest of the versions and commit them.
Please wait for a bit. This isn't a trivial change, and I would like to
wrap my head around it.
At the very least this seems like it could cause extreme slowdowns for
large tables that have the right update/delete patterns?
+    /*
+     * OldestXmin going backwards can mean we have a RECENTLY_DEAD row that
+     * contains toast pointers whose toast rows have already been vacuumed
+     * away (or in the worst but unlikely case, recycled). If so, then the
+     * row must really be dead to all snapshots that could access it, so
+     * treat it as DEAD instead.
+     */
How is it guaranteed that the recheck doesn't find a different toast
tuple at the relevant position?
Greetings,
Andres Freund
"Andres" == Andres Freund <andres@anarazel.de> writes:
Andres> Please wait for a bit. This isn't a trivial change, and I would
Andres> like to wrap my head around it.
Sure.
Andres> At the very least this seems like it could cause extreme
Andres> slowdowns for large tables that have the right update/delete
Andres> patterns?
I'm working on the basis that fetching the same set of toast rows twice
in succession will at least mean they are likely in cache for the second
time, so we shouldn't be doing any more actual I/O than before. (This is
assuming that a system with really big toast values is going to also
have plenty of RAM, which is a generally safe assumption since such
values are copied many times in memory by pg_dump.) There is still some
overhead in the check, of course; I will see about doing some
benchmarks.
+    /*
+     * OldestXmin going backwards can mean we have a RECENTLY_DEAD row that
+     * contains toast pointers whose toast rows have already been vacuumed
+     * away (or in the worst but unlikely case, recycled). If so, then the
+     * row must really be dead to all snapshots that could access it, so
+     * treat it as DEAD instead.
+     */
Andres> How is it guaranteed that the recheck doesn't find a different
Andres> toast tuple at the relevant position?
It's not, but it doesn't matter as long as the new toast rows can pass
through toast_fetch_datum without throwing an error (which is why
toast_validate_datum has to scan all the entries and check all their
lengths).
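The kind of scan toast_validate_datum would have to do can be sketched like this. The chunk size and struct are illustrative only (the real code works against toast table rows keyed by chunk_id/chunk_seq):

```c
#include <assert.h>
#include <stdbool.h>

#define TOAST_MAX_CHUNK 1996    /* illustrative chunk size, not the real constant */

/* Illustrative model of one toast chunk row. */
typedef struct
{
    int chunk_seq;
    int size;
} Chunk;

/*
 * Validate a whole toast value: the right number of chunks, in sequence,
 * each with exactly the expected length. Checking only the first chunk
 * would not catch a reused OID whose new value has a different length.
 */
static bool
validate_chunks(const Chunk *chunks, int nchunks, int total_size)
{
    int expected = (total_size + TOAST_MAX_CHUNK - 1) / TOAST_MAX_CHUNK;

    if (nchunks != expected)
        return false;
    for (int i = 0; i < nchunks; i++)
    {
        int want = (i < nchunks - 1) ? TOAST_MAX_CHUNK
                                     : total_size - i * TOAST_MAX_CHUNK;

        if (chunks[i].chunk_seq != i || chunks[i].size != want)
            return false;
    }
    return true;
}
```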
Firstly, we can assume the following statements (or at least if we
can't, we have far worse problems than this):
1. OldestXmin going backwards cannot make any tuple visible in any
(non-vacuum) snapshot that it wasn't visible in before;
2. If a tuple is non-vacuumable itself, it won't have its toast
entries vacuumed away;
3. Once a tuple is vacuumable at any single instant, it will never
become visible in any snapshot.
So the only case where a recently-dead tuple can have pointers to a
now-recycled toast entry OID is if it was at some point vacuumable, such
that a toast-table-only vacuum removed its toast entries, and a new insert
or update subsequently inserted a value of the same length that was
assigned the same OID.
Since the tuple was vacuumable at that point, it will never again become
visible in any snapshot, so the only code that should ever access its
toast values is vacuum full/cluster, which only ever copies the value
and does not attempt to decompress or otherwise interpret it. So the
only possible consequences for copying the wrong value, as long as we
don't throw an error in the process, are that (a) some space is possibly
wasted in the toast table - which actually happens already in far more
common circumstances - and (b) someone doing forensics or other
non-standard access to dead tuples might (in this vanishingly rare case)
see something unexpected. At no time would incorrect data be visible in
any query.
(regarding wasted space in the toast table: vacuum full / cluster will
insert live toast entries for copied recently-dead tuples, and if those
toast entries are not also referenced from any live tuple, they will
never be removed by subsequent normal vacuums, since nothing can ever
mark those entries dead; only a later vacuum full / cluster can remove
them.)
--
Andrew (irc:RhodiumToad)
"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> There is still some overhead in the check, of course; I will
Andrew> see about doing some benchmarks.
Preliminary result suggests that the user-CPU cost of vacuum full
increases by ~16% when the entire table is recently-dead rows (with a
toasted column of ~10k length) compared to the same table when all rows
are live.
Since I doubt the practical value of vacuum full on a table which is
100% recently-dead, and since I would expect the live:recently-dead ratio
to not normally be much worse than 1:1 (making the overhead 8%) and more
likely up around 10:1 (making the overhead 1.5%), I think this is not an
issue. (When you have a lot of recently-dead rows is exactly the _wrong_
time to be doing a vacuum full, since it wouldn't save you any space and
you'd bloat the toast table as mentioned previously.)
--
Andrew (irc:RhodiumToad)
On Sat, Apr 21, 2018 at 1:26 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> I haven't tried to reproduce it, but I can see the possibility of
Amit> the problem described by you. What should we do next? I could see
Amit> few possibilities: (a) Vacuum for main and toast table should
Amit> always use same OldestXmin,

I don't see how this could be arranged without either disabling the
ability to vacuum the toast table independently, or storing the xmin
somewhere.
I think we can use the same xmin for both the main heap and the toast
table by computing it before scanning the main heap (lazy_vacuum_rel)
and then passing it down. I have tried it and came up with the attached
patch.
Amit> (b) Before removing the row from toast table, we should ensure
Amit> that row in the main table is removed,

We can't find the main table row version(s) without scanning the whole
main table, so this amounts (again) to disabling vacuum on the toast
table only.

Amit> (c) Change the logic during rewrite such that it can detect this
Amit> situation and skip copying the tuple in the main table,

This seems more promising, but the problem is how to detect the error
safely (since currently, it's just ereport()ed from deep inside the
toast code). How about:

1) add a toast_datum_exists() call or similar that returns false if the
(first block of) the specified external toast item is missing

2) when copying a RECENTLY_DEAD tuple, check all its external column
values using this function beforehand, and if any are missing, treat the
tuple as DEAD instead.

Amit> on a quick look, this seems tricky, because the toast table TID
Amit> might have been reused by that time,

Toast pointers don't point to TIDs fortunately, they point to OIDs. The
OID could have been reused (if there's been an OID wraparound since the
toast item was created), but that should be harmless: it means that we'd
incorrectly copy the new value with the old tuple, but the old tuple is
never going to be visible to anybody ever again so nothing will see that.
Yeah, that's right, but it gives an uneasy feeling that we are
attaching the wrong toast value. If you think there is some basic
flaw in Approach-1 (as in the attached patch), or that it requires some
major surgery, then we can look further into this.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
use_same_oldestxmin_mainheap_toast_v1.patch (application/octet-stream, +29/-11)
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> I haven't tried to reproduce it, but I can see the possibility of
Amit> the problem described by you. What should we do next? I could see
Amit> few possibilities: (a) Vacuum for main and toast table should
Amit> always use same OldestXmin,
I don't see how this could be arranged without either disabling the
ability to vacuum the toast table independently, or storing the xmin
somewhere.
Amit> I think we can use the same xmin for both main heap and toast
But you're completely missing the point here, which is that the main
heap might not be vacuumed AT ALL. Autovacuum can and will call vacuum()
with the relid of a toast table - this is a deliberate decision to cope
with access patterns that would otherwise bloat the toast table.
(It's also currently legal to do VACUUM pg_toast.pg_toast_nnnn;
manually, but that's of secondary importance compared to the fact that
autovacuum does it.)
Amit> by computing it before scanning the main heap (lazy_vacuum_rel)
Amit> and then pass it down. I have tried it and came up with the
Amit> attached patch.
Your patch would actually be needed if (and only if) autovacuum was
changed back to its old behavior of never vacuuming toast tables
independently, and if manual VACUUM pg_toast.*; was disabled. But in the
presence of either of those two possibilities, it does nothing useful.
Toast pointers don't point to TIDs fortunately, they point to OIDs.
The OID could have been reused (if there's been an OID wraparound
since the toast item was created), but that should be harmless: it
means that we'd incorrectly copy the new value with the old tuple,
but the old tuple is never going to be visible to anybody ever again
so nothing will see that.
Amit> Yeah, that's right, but it gives some uneasy feeling that we are
Amit> attaching the wrong toast value. If you think there is some basic
Amit> flaw in Approach-1 (as in attached patch) or it requires some
Amit> major surgery then we can look further into this.
The basic flaw is that it doesn't even reach the problem.
--
Andrew (irc:RhodiumToad)
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> Yeah, that's right, but it gives some uneasy feeling that we are
Amit> attaching the wrong toast value.
Oh, forgot to stress this before: referencing the wrong toast value in
this case is something that can already happen, my patch does not affect
either the probability of it occurring or the consequences of it.
--
Andrew (irc:RhodiumToad)
"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> Since the tuple was vacuumable at that point, it will never
Andrew> again become visible in any snapshot, so the only code that
Andrew> should ever access its toast values is vacuum full/cluster,
and, unfortunately, CREATE INDEX (non-concurrently), including the index
build done by vacfull/cluster itself, which rather messes things up
because the toasted column might be a parameter to a functional index, or it
might itself be an indexed column (within a narrow size range for btree,
or more generally for other index types); this means the index AM or the
indexed function may try and actually detoast and inspect the value.
So right now it's possible (I think) in the worst case to crash the
server if you can set up exactly the right circumstances (re-use of a
prematurely vacuumed toast oid after wraparound, with the new value
matching the old value's raw length but not its compression state).
(My changes so far don't address this case and don't make it either more
or less likely)
--
Andrew (irc:RhodiumToad)
On Mon, Apr 23, 2018 at 1:55 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Amit> by computing it before scanning the main heap (lazy_vacuum_rel)
Amit> and then pass it down. I have tried it and came up with the
Amit> attached patch.

Your patch would actually be needed if (and only if) autovacuum was
changed back to its old behavior of never vacuuming toast tables
independently, and if manual VACUUM pg_toast.*; was disabled. But in the
presence of either of those two possibilities, it does nothing useful.
Yeah, right, I have missed the point that they can be vacuumed
separately; however, I think that decision is somewhat questionable.
Basically, it doesn't appear logical to leave a sort-of-dangling
toast pointer in the heap. I think this is somewhat akin to our current
index and heap binding, where we never let a heap itemid go away till
all index entries pointing to it are taken care of. I have gone
through the commit (3ccde312ec8ee47f5f797b070d34a675799448ae) which
appears to have decoupled this mechanism, but it doesn't provide any
reason for doing so. I have also gone through the related thread [1],
which is not much help either. I think there must be some patterns
where toast table bloat causes much more harm than heap bloat, as toast
rows are much larger in size. Are you aware of the concrete reasons
behind this change? I think it would have been better if along with
decoupling of vacuum for main heap and toast tables, we would have
come up with a way to selectively remove the corresponding rows from
the main heap, say by just vacuuming heap pages/rows which have toast
pointers, but maybe that is not viable or involves much more work
without equivalent benefit.

Also, we can think along the lines of another idea suggested by Andres
[2].
[1]: /messages/by-id/20080808165809.GB3800@alvh.no-ip.org
[2]: /messages/by-id/20130204164341.GB22226@awork2.anarazel.de
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Your patch would actually be needed if (and only if) autovacuum was
changed back to its old behavior of never vacuuming toast tables
independently, and if manual VACUUM pg_toast.*; was disabled. But in
the presence of either of those two possibilities, it does nothing
useful.
Amit> Yeah, right, I have missed the point that they can be vacuumed
Amit> separately, however, I think that decision is somewhat
Amit> questionable.
Some previous discussion links for reference, for the background to the
thread containing the patch:
/messages/by-id/87y7gpiqx3.fsf@oxford.xeocode.com
/messages/by-id/20080608230348.GD11028@alvh.no-ip.org
Amit> I think it would have been better if along with decoupling of
Amit> vacuum for main heap and toast tables, we would have come up with
Amit> a way to selectively remove the corresponding rows from the main
Amit> heap, say by just vacuuming heap pages/rows which have toast
Amit> pointers but maybe that is not viable or involves much more work
Amit> without equivalent benefit.
It should be fairly obvious why this is unworkable - most toast-using
tables will have toast pointers on every page, but without making a
whole new index of toast pointer OIDs (unacceptable overhead), it would
be impossible to find the toast pointers pointing to a specific item
without searching the whole rel (in which case we might just as well
have vacuumed it).
Amit> Also, we can think along the lines of another idea suggested by
Amit> Andres [2] on the thread mentioned by you.
That one is tricky for various reasons (locking, order of operations in
vacuum_rel, having to mess with the API of vacuum(), etc.)
--
Andrew (irc:RhodiumToad)
On Mon, Apr 23, 2018 at 1:34 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
"Amit" == Amit Kapila <amit.kapila16@gmail.com> writes:
Your patch would actually be needed if (and only if) autovacuum was
changed back to its old behavior of never vacuuming toast tables
independently, and if manual VACUUM pg_toast.*; was disabled. But in
the presence of either of those two possibilities, it does nothing
useful.

Amit> Yeah, right, I have missed the point that they can be vacuumed
Amit> separately, however, I think that decision is somewhat
Amit> questionable.Some previous discussion links for reference, for the background to the
thread containing the patch:/messages/by-id/87y7gpiqx3.fsf@oxford.xeocode.com
/messages/by-id/20080608230348.GD11028@alvh.no-ip.org
If I read correctly, it seems one of the main reasons [1] is to save
the extra pass over the heap and improve the code. Now, ideally, the
extra pass over the heap should also free up some space (occupied by
the rows that contain old toast pointers corresponding to which we are
going to remove rows from the toast table), but it is quite possible
that it is already clean due to a previous separate vacuum pass over
the heap. I think it is good to save an extra pass over the heap which
might not be as useful as we expect, but that can cost us correctness
issues in boundary cases, as in the case being discussed in this
thread.
Amit> I think it would have been better if along with decoupling of
Amit> vacuum for main heap and toast tables, we would have come up with
Amit> a way to selectively remove the corresponding rows from the main
Amit> heap, say by just vacuuming heap pages/rows which have toast
Amit> pointers but maybe that is not viable or involves much more work
Amit> without equivalent benefit.

It should be fairly obvious why this is unworkable - most toast-using
tables will have toast pointers on every page, but without making a
whole new index of toast pointer OIDs (unacceptable overhead), it would
be impossible to find the toast pointers pointing to a specific item
without searching the whole rel (in which case we might just as well
have vacuumed it).
Okay, such an optimization might not be very effective, and it would
anyway lead to an extra pass over the heap; however, it would resolve
the correctness issue. Now, I understand that it is not advisable to
go back to the previous behavior for performance reasons, but I think
it would be better if we found a bullet-proof way to fix this symptom,
rather than just fixing the issue reported in this thread.
[1]: From one of the emails: "We go certain lengths in autovacuum to
make sure tables are vacuumed when their toast table needs vacuuming
and the main table does not, which is all quite kludgy. So we already
look at their stats and make decisions about them. But what we do
after that is force a vacuum to the main table, even if that one does
not need any vacuuming, which is dumb."
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 4/23/18 00:33, Amit Kapila wrote:
Yeah, right, I have missed the point that they can be vacuumed
separately, however, I think that decision is somewhat questionable.
Manually vacuuming the TOAST table was a way to repair the recently
fixed TOAST bug, so it's kind of useful.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 23, 2018 at 11:21 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
If I read correctly, it seems one of the main reasons [1] is to save
the extra pass over the heap and improve the code. Now, ideally, the
extra pass over the heap should also free up some space (occupied by
the rows that contain old toast pointers corresponding to which we are
going to remove rows from the toast table), but it is quite possible
that it is already clean due to a previous separate vacuum pass over
the heap. I think it is good to save an extra pass over the heap which
might not be as useful as we expect, but that can cost us correctness
issues in boundary cases, as in the case being discussed in this
thread.
I don't think anybody disagrees with that, but it doesn't answer the
question of how best to fix it. Making VACUUM a lot more expensive
will not win us any awards, and an extra pass over the heap could be
very expensive. You seem to think Andrew's fix isn't really
addressing the root of the problem, but I think that just depends on
how you define the problem. If you define the problem as "the table
should never have dangling pointers to the TOAST table", then Andrew's
approach is only fixing the symptom. But if you define the problem as
"we shouldn't try to follow TOAST pointers in heap tuples that are not
visible to any current or future MVCC snapshot", then it's fixing the
root issue.
I wonder if perhaps get_actual_variable_range() has a hazard of this
type as well. If OldestXmin retreats, then the special snapshot type
which it uses could see a row whose TOAST entry has been removed
meanwhile.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I wonder if perhaps get_actual_variable_range() has a hazard of this
type as well. If OldestXmin retreats, then the special snapshot type
which it uses could see a row whose TOAST entry has been removed
meanwhile.
Hm, yeah. I wonder if we could fix that by extracting the value from
the index rather than recomputing it from the heap tuple, as we do now.
We'd still have to visit the heap tuple to check liveness, but we'd
not need to extract anything from its data part.
Have we given up on the angle of "prevent OldestXmin from retreating"?
That seems like it'd be far more bulletproof than trying to work
around places that break one-by-one.
regards, tom lane
On Thu, Apr 26, 2018 at 03:09:01PM -0400, Tom Lane wrote:
Have we given up on the angle of "prevent OldestXmin from retreating"?
That seems like it'd be far more bulletproof than trying to work
around places that break one-by-one.
Strong +1 on that.
--
Michael
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Have we given up on the angle of "prevent OldestXmin from
Tom> retreating"?
I haven't, and I'm thinking about it, but I don't have an immediate
solution.
Thinking-out-loud mode:
1) we could store minimum values for OldestXmin for each database in
some kind of shared-memory structure. Downside is that we can't
easily make this fixed-size, since we don't know how many DBs there
are - we can't size it based on connections since we need to be able
to track values for dbs with no currently active connections.
(Or do we need to track it across restarts? maybe we do, to deal with
replication slaves without slots, or changes in parameters)
2) in-place updates of a new column in pg_database? and a control file
field for the global values? not back-patchable, but would it work
going forward?
3) there was the previous suggestion to store it per-table in the main
table's reloptions and inplace-update that; this isn't impossible,
but it would be fiddly to do because toast-table-only vacuums would
need to locate the main table, and lock order issues might get
interesting. (Maybe it would be sneakier to store it in the toast
table's otherwise-unused reloptions, and have vacuum/create index on
the main table consult that? This assumes that we only care about the
issue when dealing with toast tables)
4) something else??
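Option 1 above amounts to a per-database ratchet on the computed value. A minimal sketch of that clamp, under loud assumptions: the struct and function are hypothetical, the real problem of sizing and locating the shared-memory entries is ignored, and plain integer comparison stands in for proper xid wraparound comparison (TransactionIdPrecedes):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical per-database floor entry; in reality this would live in
 * shared memory, and there is no fixed bound on the number of databases. */
typedef struct
{
    uint32_t      db_oid;
    TransactionId floor_xmin;
} DbXminFloor;

/*
 * Clamp a freshly computed OldestXmin against the stored floor:
 * never return a value older than one previously handed out, and
 * ratchet the floor forward otherwise. (Wraparound ignored here.)
 */
static TransactionId
clamp_oldest_xmin(DbXminFloor *entry, TransactionId computed)
{
    if (computed < entry->floor_xmin)
        return entry->floor_xmin;   /* refuse to retreat */
    entry->floor_xmin = computed;   /* ratchet forward */
    return computed;
}
```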
--
Andrew (irc:RhodiumToad)