Automatic free space map filling
Something came to my mind today, I'm not sure if it's feasible but I
would like to know opinions on it.
We've seen database applications that PostgreSQL simply could not manage
because one would have to vacuum continuously. Perhaps in those
situations one could arrange it that an update (or delete) of a row
registers the space in the free space map right away, on the assumption
that by the time it is up for reuse, the transaction will likely have
committed. Naturally, this would need to be secured in some way, for
example a "maybe" bit in the FSM itself or simply checking that the
supposed free space is really free before using it, perhaps combined
with a timeout ("don't consider until 5 seconds from now").
I think with applications that have a more or less constant data volume
but update that data a lot, this could assure constant disk space usage
(even if it's only a constant factor above the ideal usage) without any
vacuuming.
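To make the proposal concrete, here is a minimal simulation of an opportunistic FSM entry with a "maybe" bit and a reuse timeout. All names are hypothetical illustrations, not actual PostgreSQL code:

```python
import time

# Hypothetical sketch: an FSM entry recorded at UPDATE/DELETE time, before
# the space is certainly free. The "maybe" flag and the "not_before"
# timestamp are the two safeguards described above.
class MaybeFSMEntry:
    def __init__(self, page, free_bytes, delay=5.0):
        self.page = page
        self.free_bytes = free_bytes
        self.maybe = True                      # space not yet verified free
        self.not_before = time.time() + delay  # "don't consider until 5s from now"

    def usable(self, now, page_really_free):
        # Skip entries still inside their timeout window.
        if now < self.not_before:
            return False
        # A "maybe" entry must be re-checked against the page itself.
        if self.maybe and not page_really_free(self.page):
            return False
        return True

entry = MaybeFSMEntry(page=7, free_bytes=128, delay=5.0)
# Too early: within the timeout window, the entry is ignored.
assert not entry.usable(entry.not_before - 1, page_really_free=lambda p: True)
# After the window, the page is verified before the space is trusted.
assert entry.usable(entry.not_before + 1, page_really_free=lambda p: True)
assert not entry.usable(entry.not_before + 1, page_really_free=lambda p: False)
```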
Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
We've seen database applications that PostgreSQL simply could not manage
because one would have to vacuum continuously. Perhaps in those
situations one could arrange it that an update (or delete) of a row
registers the space in the free space map right away, on the assumption
that by the time it is up for reuse, the transaction will likely have
committed.
The free-space map is not the hard part of the problem. You still have
to VACUUM --- that is, wait until the dead tuple is not only committed
dead but is certainly dead to all onlooker transactions, and then remove
its index entries as well as the tuple itself. The first part of this
makes it impossible for a transaction to be responsible for vacuuming
its own detritus.
Naturally, this would need to be secured in some way,
The FSM is only a hint anyway --- if it points someone to a page that in
reality does not have adequate free space, nothing bad happens except
for the wasted cycles to visit the page and find that out. See the loop
in RelationGetBufferForTuple().
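The loop Tom refers to can be caricatured like this (a simulation of the retry behavior, not the real RelationGetBufferForTuple):

```python
def find_buffer_for_tuple(fsm_pages, actual_free, needed, npages):
    """Caricature of the retry loop: FSM entries are only hints, so each
    candidate page is re-checked; a stale hint costs a wasted visit,
    nothing worse."""
    for page in fsm_pages:
        if actual_free.get(page, 0) >= needed:
            return page   # hint was good
        # hint was stale: nothing bad happens, just wasted cycles
    return npages         # fall back to extending the relation

# FSM claims pages 3 and 9 have room; only 9 really does.
assert find_buffer_for_tuple([3, 9], {3: 0, 9: 200}, needed=100, npages=10) == 9
# All hints stale: extend the relation with a new page.
assert find_buffer_for_tuple([3], {3: 0}, needed=100, npages=10) == 10
```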
regards, tom lane
On a fine day, Mon, 2006-02-27 at 19:20, Peter Eisentraut wrote:
Something came to my mind today, I'm not sure if it's feasible but I
would like to know opinions on it.
We've seen database applications that PostgreSQL simply could not manage
because one would have to vacuum continuously.
What's wrong with vacuuming continuously?
I am running an application that in fact does vacuum continuously
without any ill effects. A case where things become complicated is when
you have one huge table (say 50,000,000 rows) that is updated at a
moderate rate and needs an occasional vacuum, plus a fast-update table,
which needs continuous vacuum. Due to the current implementation of vacuum,
you have to abandon continuous vacuuming during a vacuum of the big table, but
I have written and submitted to the "patches" list a patch which allows
vacuums not to block each other out. It is stalled due to Tom's
"uneasiness" about its possible hidden effects, but it should be
available from the "patches" list to anyone in distress :p
Perhaps in those
situations one could arrange it that an update (or delete) of a row
registers the space in the free space map right away, on the assumption
that by the time it is up for reuse, the transaction will likely have
committed. Naturally, this would need to be secured in some way, for
example a "maybe" bit in the FSM itself or simply checking that the
supposed free space is really free before using it, perhaps combined
with a timeout ("don't consider until 5 seconds from now").
Unfortunately transactions have no knowledge about wallclock time :(
I think with applications that have a more or less constant data volume
----------------
Hannu
Hannu Krosing wrote:
Due to the current implementation of vacuum,
you have to abandon continuous vacuuming during a vacuum of the big table, but
I have written and submitted to the "patches" list a patch which allows
vacuums not to block each other out. It is stalled due to Tom's
"uneasiness" about its possible hidden effects, but it should be
available from the "patches" list to anyone in distress :p
Do you use it in production? Have you noticed any ill effects?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Monday, 27 February 2006 19:42, Tom Lane wrote:
The free-space map is not the hard part of the problem. You still have
to VACUUM --- that is, wait until the dead tuple is not only committed
dead but is certainly dead to all onlooker transactions, and then remove
its index entries as well as the tuple itself. The first part of this
makes it impossible for a transaction to be responsible for vacuuming
its own detritus.
I'm not sure if I made myself clear. The idea is that you fill the free-space
map early with opportunistic entries in the hope that most updates and
deletes go through "soon". That is, these entries will be invalid for a
short time, but hopefully by the time another write looks at them, the entries
will have become valid. That way you don't actually have to run vacuum on
these deleted rows.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
I'm not sure if I made myself clear. The idea is that you fill the free-space
map early with opportunistic entries in the hope that most updates and
deletes go through "soon". That is, these entries will be invalid for a
short time, but hopefully by the time another write looks at them, the entries
will have become valid. That way you don't actually have to run vacuum on
these deleted rows.
How does an optimistic FSM entry avoid the need to run vacuum? All that
will happen is that some backend will visit the page and not find usable
free space.
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I'm not sure if I made myself clear. The idea is that you fill the free-space
map early with opportunistic entries in the hope that most updates and
deletes go through "soon". That is, these entries will be invalid for a
short time, but hopefully by the time another write looks at them, the entries
will have become valid. That way you don't actually have to run vacuum on
these deleted rows.
How does an optimistic FSM entry avoid the need to run vacuum? All that
will happen is that some backend will visit the page and not find usable
free space.
Because the index isn't removed, right? That index thing is what
usually kills us.
--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com
+ If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote:
How does an optimistic FSM entry avoid the need to run vacuum?
It ensures that all freed tuples are already in the FSM.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane wrote:
How does an optimistic FSM entry avoid the need to run vacuum?
It ensures that all freed tuples are already in the FSM.
That has nothing to do with it, because the space isn't actually free
for re-use until vacuum deletes the tuple.
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane wrote:
How does an optimistic FSM entry avoid the need to run vacuum?
It ensures that all freed tuples are already in the FSM.
That has nothing to do with it, because the space isn't actually free
for re-use until vacuum deletes the tuple.
I think the idea is a different "free space map" of sorts, whereby a
transaction that obsoletes a tuple puts its block number in that map. A
transaction that inserts a new tuple goes to the FSM. If nothing is
found, it then goes to the new map. A block returned from that map is
then scanned and any tuple that's no longer visible for anyone is
reused.
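A sketch of that two-map lookup, with hypothetical names (the `reclaim` step is exactly the part criticized below):

```python
def place_tuple(fsm, dead_space_map, reclaim, needed):
    """Sketch of the two-map scheme: consult the ordinary FSM first; on a
    miss, pull a block from the 'recently obsoleted' map and try to
    reclaim tuples there that are no longer visible to anyone."""
    for page, free in fsm.items():
        if free >= needed:
            return page
    while dead_space_map:
        page = dead_space_map.pop()
        # reclaim() scans the page, frees tuples dead to everyone,
        # and returns the space recovered.
        if reclaim(page) >= needed:
            return page
    return None  # caller must extend the relation

fsm = {1: 10}  # not enough room anywhere in the FSM
assert place_tuple(fsm, {5}, reclaim=lambda p: 300, needed=100) == 5
assert place_tuple(fsm, set(), reclaim=lambda p: 0, needed=100) is None
```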
The problem with this idea is scanning the block and for each tuple
determine if it's alive. Essentially, we would be folding the "find
dead tuples and compress page" logic, which is currently in vacuum, back
to insert. IMHO this is unacceptable from a performance PoV.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On a fine day, Tue, 2006-02-28 at 19:47, Alvaro Herrera wrote:
Hannu Krosing wrote:
Due to the current implementation of vacuum,
you have to abandon continuous vacuuming during a vacuum of the big table, but
I have written and submitted to the "patches" list a patch which allows
vacuums not to block each other out. It is stalled due to Tom's
"uneasiness" about its possible hidden effects, but it should be
available from the "patches" list to anyone in distress :p
Do you use it in production? Have you noticed any ill effects?
No, I don't run it in production at this time, as I solved the immediate
problem by splitting small and big tables to different databases and
having client applications rewritten accordingly.
I did run a parallel load (queries from log of real database, plus
parallel vacuums on tables) for some time and saw no ill effects there.
I will likely start using it in production on some databases during next
few months as new restructuring of databases brings back the case where
huge and tiny tables are in the same database.
--------------
Hannu
On Wed, Mar 01, 2006 at 12:41:01PM -0500, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane wrote:
How does an optimistic FSM entry avoid the need to run vacuum?
It ensures that all freed tuples are already in the FSM.
That has nothing to do with it, because the space isn't actually free
for re-use until vacuum deletes the tuple.
Hmm, but couldn't such an opportunistic approach be used for another lightweight VACUUM mode, in
such a way that VACUUM could look at a special "Hot Spot" queue, which represents potential
candidates for freeing? Let's call it a 2-phase VACUUM... this would avoid a long-running VACUUM
on big tables, e.g. when tuples get updated (or deleted) frequently. Just an idea...
Bernd
Alvaro Herrera <alvherre@commandprompt.com> writes:
Tom Lane wrote:
That has nothing to do with it, because the space isn't actually free
for re-use until vacuum deletes the tuple.
I think the idea is a different "free space map" of sorts, whereby a
transaction that obsoletes a tuple puts its block number in that map. A
transaction that inserts a new tuple goes to the FSM. If nothing is
found, it then goes to the new map. A block returned from that map is
then scanned and any tuple that's no longer visible for anyone is
reused.
I thought we had sufficiently destroyed that "reuse a tuple" meme
yesterday. You can't do that: there are too many aspects of the system
design that are predicated on the assumption that dead tuples do not
come back to life. You have to do the full vacuuming bit (index entry
removal, super-exclusive page locking, etc) before you can remove a dead
tuple.
Essentially, we would be folding the "find
dead tuples and compress page" logic, which is currently in vacuum, back
to insert. IMHO this is unacceptable from a performance PoV.
That's the other problem: it's not apparent why pushing work from vacuum
back into foreground processing is a good idea. Especially not why
retail vacuuming of individual tuples will be better than wholesale.
regards, tom lane
On Thu, Mar 02, 2006 at 01:01:21AM -0500, Tom Lane wrote:
Essentially, we would be folding the "find
dead tuples and compress page" logic, which is currently in vacuum, back
to insert. IMHO this is unacceptable from a performance PoV.
That's the other problem: it's not apparent why pushing work from vacuum
back into foreground processing is a good idea. Especially not why
retail vacuuming of individual tuples will be better than wholesale.
The problem is that even with vacuum_cost_delay, vacuum is still very
slow and problematic in situations such as large tables in a heavy
transaction environment. Anything that could help reduce the need for
'traditional' vacuuming could well be a win.
Even so, I think the most productive path to pursue at this time is a
dead-space-map/known-clean-map. Either one is almost guaranteed to
provide benefits. Once we know what good they do we can move forward
from there with further improvements.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
I thought we had sufficiently destroyed that "reuse a tuple"
meme yesterday. You can't do that: there are too many
aspects of the system design that are predicated on the
assumption that dead tuples do not come back to life. You
have to do the full vacuuming bit (index entry removal,
super-exclusive page locking, etc) before you can remove a dead tuple.
One more idea I would like to throw in.
Ok, we cannot reuse a dead tuple. Maybe we can reuse the space of a dead
tuple by reducing the tuple to its header info.
(If you still wanted to be able to locate index entries fast, you would
need to keep the indexed columns, but I think we agreed that there is no
real use.)
I think that would be achievable at reasonable cost (since you can avoid
one page IO) on the page of the currently active tuple (the first page
that is considered).
On this page:
if freespace available
--> use it
elsif freespace available after reducing all dead rows
--> use the freespace with a new slot
else ....
Of course this only works when we still have free slots,
but I think that might not really be an issue.
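The per-page decision above can be written out as a tiny sketch (illustrative only, with made-up names):

```python
def get_space_on_page(free, reclaimable_dead, needed):
    """Sketch of the per-page strategy: use existing free space; failing
    that, reduce dead rows on the same page to their headers and retry;
    otherwise give up on this page."""
    if free >= needed:
        return "use_free"
    if free + reclaimable_dead >= needed:
        return "reduce_dead_then_use"
    return "look_elsewhere"

assert get_space_on_page(200, 0, 100) == "use_free"
assert get_space_on_page(40, 100, 100) == "reduce_dead_then_use"
assert get_space_on_page(40, 20, 100) == "look_elsewhere"
```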
Andreas
On a fine day, Thu, 2006-03-02 at 09:53, Zeugswetter Andreas DCP SD wrote:
I thought we had sufficiently destroyed that "reuse a tuple"
meme yesterday. You can't do that: there are too many
aspects of the system design that are predicated on the
assumption that dead tuples do not come back to life. You
have to do the full vacuuming bit (index entry removal,
super-exclusive page locking, etc) before you can remove a dead tuple.
One more idea I would like to throw in.
Ok, we cannot reuse a dead tuple. Maybe we can reuse the space of a dead
tuple by reducing the tuple to its header info.
(If you still wanted to be able to locate index entries fast, you would
need to keep the indexed columns, but I think we agreed that there is no
real use.)
I don't even think you need the header; just truncate the slot to be
0-size (the next pointer is the same as this one, or make the pointer
point to an unaligned byte or some such) and detect this condition when
accessing tuples. This would add one compare to all accesses to the tuple,
but I suspect that mostly it is a noop performance-wise, as all the data
needed is already available in the level-1 cache.
This would decouple declaring a tuple to be dead/reuse data space and
final cleanup/free index space.
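A toy illustration of the truncated-slot idea (hypothetical layout, not PostgreSQL's actual ItemId format): a zero-length line pointer marks a slot whose data space was reclaimed early, and tuple access detects it with one extra compare.

```python
def fetch_tuple(line_pointers, page_data, slot):
    """line_pointers is a list of (offset, length) pairs. A length of 0
    marks a truncated slot: the tuple is dead and its data space has
    been reclaimed, though its index entries may still exist."""
    off, length = line_pointers[slot]
    if length == 0:   # the one extra compare on each tuple access
        return None
    return page_data[off:off + length]

page_data = b"....TUPLEDATA"
lp = [(4, 9), (4, 0)]  # slot 1 has been truncated to 0-size
assert fetch_tuple(lp, page_data, 0) == b"TUPLEDATA"
assert fetch_tuple(lp, page_data, 1) is None
```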
--------------------
Hannu
Centuries ago, Nostradamus foresaw when tgl@sss.pgh.pa.us (Tom Lane) would write:
I thought we had sufficiently destroyed that "reuse a tuple" meme
yesterday. You can't do that: there are too many aspects of the system
design that are predicated on the assumption that dead tuples do not
come back to life.
This discussion needs to come up again in October when the zombie
movies come out :-).
That's the other problem: it's not apparent why pushing work from
vacuum back into foreground processing is a good idea. Especially
not why retail vacuuming of individual tuples will be better than
wholesale.
What is unclear to me in the discussion is whether or not this is
invalidating the item on the TODO list...
-------------------
Create a bitmap of pages that need vacuuming
Instead of sequentially scanning the entire table, have the background
writer or some other process record pages that have expired rows, then
VACUUM can look at just those pages rather than the entire table. In
the event of a system crash, the bitmap would probably be
invalidated. One complexity is that index entries still have to be
vacuumed, and doing this without an index scan (by using the heap
values to find the index entry) might be slow and unreliable,
especially for user-defined index functions.
-------------------
It strikes me as a non-starter to draw vacuum work directly into the
foreground; there is a *clear* loss in that the death of the tuple
can't actually take place at that point, due to MVCC and the fact that
it is likely that other transactions will be present, keeping the
tuple from being destroyed.
But it would *seem* attractive to do what is in the TODO, above.
Alas, the user defined index functions make cleanout of indexes much
more troublesome :-(. But what's in the TODO is still "wholesale,"
albeit involving more targetted selling than the usual Kirby VACUUM
:-).
--
select 'cbbrowne' || '@' || 'gmail.com';
http://linuxdatabases.info/info/rdbms.html
Rules of the Evil Overlord #140. "I will instruct my guards when
checking a cell that appears empty to look for the chamber pot. If the
chamber pot is still there, then the prisoner has escaped and they may
enter and search for clues. If the chamber pot is not there, then
either the prisoner is perched above the lintel waiting to strike them
with it or else he decided to take it as a souvenir (in which case he
is obviously deeply disturbed and poses no threat). Either way,
there's no point in entering." <http://www.eviloverlord.com/>
On Thu, Mar 02, 2006 at 08:33:46AM -0500, Christopher Browne wrote:
What is unclear to me in the discussion is whether or not this is
invalidating the item on the TODO list...
-------------------
Create a bitmap of pages that need vacuuming
<snip>
I think this is doable, and not invalidated by anything said so far.
All this is changing is whether to scan the whole table or just the
bits changed. Unfortunately I don't think you can avoid scanning the
indexes :(.
Note, for this purpose you don't need to keep a bit per page. The
OS I/O system will load 64k+ (8+ pages) in one go so one bit per 8
pages would be sufficient.
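The space saving from coarser granularity is easy to quantify (illustrative arithmetic, assuming the default 8 kB page size):

```python
BLCKSZ = 8192  # default PostgreSQL page size

def bitmap_bytes(table_bytes, pages_per_bit):
    """Size of a dirty-page bitmap covering a table, at the given
    granularity (pages per bit), rounded up to whole bytes."""
    pages = table_bytes // BLCKSZ
    bits = (pages + pages_per_bit - 1) // pages_per_bit
    return (bits + 7) // 8

# A 1 GB table: one bit per page vs. one bit per 8 pages (a 64 kB I/O unit).
gb = 1024 ** 3
assert bitmap_bytes(gb, 1) == 16384   # 16 kB
assert bitmap_bytes(gb, 8) == 2048    # 2 kB
```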
The inverse is to keep a list of pages where we know all tuples are
visible to everyone. I'm not sure if this can be done race-condition
free. ISTM it would be possible to get the new Bitmap Index Scans to
avoid checking visibility straight away but wait until it has been
AND/OR'd with other bitmaps, and only at the end check visibility.
But maybe that already happens...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Hannu Krosing <hannu@skype.net> writes:
On a fine day, Thu, 2006-03-02 at 09:53, Zeugswetter Andreas DCP SD wrote:
Ok, we cannot reuse a dead tuple. Maybe we can reuse the space of a dead
tuple by reducing the tuple to its header info.
I don't even think you need the header, just truncate the slot to be
0-size
I think you must keep the header because the tuple might be part of an
update chain (cf vacuuming bugs we repaired just a few months ago).
t_ctid is potentially interesting data even in a certainly-dead tuple.
Andreas' idea is possibly doable but I am not sure that I see the point.
It does not reduce the need for vacuum nor the I/O load imposed by
vacuum. What it does do is bias the system in the direction of
allocating an unreasonably large number of tuple line pointers on a page
(ie, more than are useful when the page is fully packed with normal
tuples). Since we never reclaim such pointers, over time all the pages
in a table would tend to develop line-pointer-bloat. I don't know what
the net overhead would be, but it'd definitely impose some aggregate
inefficiency.
regards, tom lane