Multi-Versions and Vacuum

Started by Anthony Berglasover 23 years ago24 messagesgeneral
Jump to latest
#1Anthony Berglas
anthony.berglas@lucida.com

Thanks to all the people that responed. Summaries and notes:-

VACUUM

The consensus seems to be that old copies are not removed until a Vacuum is
performed. When a Vacuum is performed then only versions older than the
start of the oldest transaction are removed.

(Compare with Oracle -- a certain amount of space is allocated for multi
versions ("Rollback Segments"). Old versions are moved there, I think.
When the space is exhausted then then old versions are dropped
automatically. Occasionally this can cause transactions to be aborted
because the old versions they need are no longer available.)

What actually is the difference between Vaccum and Vacuum Full? I assume
that the former must release some disk space, otherwise it would be useless?

STORAGE

It appears that the versions are stored in the same blocks as the normal
data. Also, seems that the whole row is copied (correct?). This will
reduce the efficiency of the cache to some extent by having blocks
containing non-pertinent data.

LOCKING

The docs mentioned phantoms in the Serializable section, implying that they
were not trapped by Read Committed. But they are, good.

My note about Oracle is that in its Read Committed mode transactions are
safer without being Serializable. It is half way between what Postgres call
Read Committed and Serializable. And if I think that in that regard Oracle
got it right. Safer transactions without Serializable rollback problems
with no real downside. Particularly for reports.

USING THE LOGS FOR MULTI VERSIONS

If anyone know why this is a bad idea I'd be interested. Seems to be a win
win to me. No need to vacuum, no additional overhead during update (you
need the logs anyway), fast queries, less disk space. (This is not what
Oracle does.)

TIME TRAVEL

Sounds fun. But hard to get right.

EXISTING DOCS

When this thread settles I'll add a note to the interactive docs.

(Thanks Neil for taking the trouble to provide doc fragments. But if you
read them again you will note that they are not at all clear on the actual
question that I asked. Which is why I asked them.)

#2Neil Conway
neilc@samurai.com
In reply to: Anthony Berglas (#1)
Re: Multi-Versions and Vacuum

On Thu, Jul 18, 2002 at 09:23:48PM -0700, Anthony Berglas wrote:

(Compare with Oracle -- a certain amount of space is allocated for multi
versions ("Rollback Segments"). Old versions are moved there, I think.
When the space is exhausted then then old versions are dropped
automatically. Occasionally this can cause transactions to be aborted
because the old versions they need are no longer available.)

IMHO, that is much worse than the scheme we have now -- particularly
since some kind of auto-vacuuming system is on the horizon.

What actually is the difference between Vaccum and Vacuum Full? I assume
that the former must release some disk space, otherwise it would be useless?

http://developer.postgresql.org/docs/postgres/sql-vacuum.html , in
particular:

VACUUM FULL does more extensive processing, including moving of tuples
across blocks to try to compact the table to the minimum number of disk
blocks. This form is much slower and requires an exclusive lock on each
table while it is being processed.

There is more information on VACUUM FULL here:
http://developer.postgresql.org/docs/postgres/routine-vacuuming.html

EXISTING DOCS

When this thread settles I'll add a note to the interactive docs.

Might be a better idea to send a patch for the SGML to pgsql-patches. Or
if you'd rather not do that, just send any additions you'd like to me
and I'll make the necessary SGML changes and send that to -patches.

(Thanks Neil for taking the trouble to provide doc fragments. But if you
read them again you will note that they are not at all clear on the actual
question that I asked. Which is why I asked them.)

On the contrary, 2 of your 3 questions were clearly answered in the
docs, as is the VACUUM FULL question you just asked. If you think the
docs are unclear or incomplete, can you be more specific?

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

#3Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Anthony Berglas (#1)
Re: Multi-Versions and Vacuum

Anthony Berglas wrote:

Thanks to all the people that responed. Summaries and notes:-

LOCKING

The docs mentioned phantoms in the Serializable section, implying
that they were not trapped by Read Committed. But they are, good.

My note about Oracle is that in its Read Committed mode transactions
are safer without being Serializable.

What's safer ?

regards,
Hiroshi Inoue
http://w2422.nsk.ne.jp/~inoue/

#4Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#3)
Re: Multi-Versions and Vacuum

grant wrote:

If Oracle uses multiple version concurrency, then why does it appear to do
row locking (I use Oracle at work, but I do not admin it)

The only downside I have seen to MVCS is the extra disk space, but with
the performance improvements in a highly concurrent use environment, it is
well worth it.

Q: Does VACUUM FULL remove the rows that are too many for the FSM? Or,
are they just stuck? I know that regular VACUUM will not allow reuse of
overflowed rows. Also, how does one calculate the propper size for the
FSM?

VACUUM full removes all unused rows from disk _including_ ones that
aren't in the free space map.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5grant
grant@amadensor.com
In reply to: Anthony Berglas (#1)
Re: Multi-Versions and Vacuum

If Oracle uses multiple version concurrency, then why does it appear to do
row locking (I use Oracle at work, but I do not admin it)

The only downside I have seen to MVCS is the extra disk space, but with
the performance improvements in a highly concurrent use environment, it is
well worth it.

Q: Does VACUUM FULL remove the rows that are too many for the FSM? Or,
are they just stuck? I know that regular VACUUM will not allow reuse of
overflowed rows. Also, how does one calculate the propper size for the
FSM?

______________________________________________________________________________

Your mouse has moved.
You must restart Windows for your changes to take effect.

#!/usr/bin/perl
print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: Multi-Versions and Vacuum

Bruce Momjian <pgman@candle.pha.pa.us> writes:

grant wrote:

Q: Does VACUUM FULL remove the rows that are too many for the FSM? Or,
are they just stuck? I know that regular VACUUM will not allow reuse of
overflowed rows.

You seem to know things that aren't true. What is an "overflowed row"
anyway?

VACUUM full removes all unused rows from disk _including_ ones that
aren't in the free space map.

The FSM isn't interested in individual rows in the first place; it just
records the amount of free space per *page*.

After VACUUM does its thing (and yes, it removes everything it legally
can) it reports the amount of free space it found on each page to the
FSM, and the FSM remembers as much of that info as it has room for ---
preferring pages with more free space, if it has to be selective.
Subsequently, when any backend needs to add a tuple to a relation, it
first looks in the FSM to see if there is an extant page with enough
room. If there's no suitable entry in the FSM, then try the last extant
page; if no luck there, add a new page.

regards, tom lane

#7Andrew Sullivan
andrew@libertyrms.info
In reply to: Tom Lane (#6)
Re: Multi-Versions and Vacuum

On Fri, Jul 19, 2002 at 12:35:23PM -0400, Tom Lane wrote:

After VACUUM does its thing (and yes, it removes everything it legally
can) it reports the amount of free space it found on each page to the
FSM, and the FSM remembers as much of that info as it has room for ---
preferring pages with more free space, if it has to be selective.

I think this is where the confusion lies. The ever-growing tables
that people are experiencing is somehow related to the new
non-blocking vacuum and the value of the free space map settings.
Previous posts on the matter have left some (well, me, at least) with
the idea that the non-blocking vacuum doesn't get everything, just in
case the free space map values are too small.

But if I understand now what you're saying, it's this: the vacuum
_does_ get everything, but if the free space map settings are too
small, the free space map doesn't know about that free space;
therefore, the space never gets re-used for future data, because as
far as the postmaster is concerened, the space isn't available. So
the tables keep growing because they have to append their data at the
end of the file (because the free space map hasn't reported any free
space). Run vacuum a few times like this, and there is a substantial
amount of unused space in the tables, where the free space map
doesn't have an entry for it. Is that right?

If so, it should be possible to calculate what one needs for the
max_fsm_pages setting by calculating how much data is replaced or
deleted in any given period, vacuum-to-vacuum, I guess. What one
would need to do is calculate how much space a row is taking up on
disk, and then set the max_fsm_pages to comprise (almost?) enough
space to keep a map big enough to include all the rows likely to be
recovered by vacuum. Or am I completely mad? (I know, I know, "Yes"
is a possible answer there.)

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Sullivan (#7)
Free space mapping (was Re: Multi-Versions and Vacuum)

Andrew Sullivan <andrew@libertyrms.info> writes:

But if I understand now what you're saying, it's this: the vacuum
_does_ get everything, but if the free space map settings are too
small, the free space map doesn't know about that free space;
therefore, the space never gets re-used for future data, because as
far as the postmaster is concerened, the space isn't available. So
the tables keep growing because they have to append their data at the
end of the file (because the free space map hasn't reported any free
space). Run vacuum a few times like this, and there is a substantial
amount of unused space in the tables, where the free space map
doesn't have an entry for it. Is that right?

Well, the design theory was that once you free enough space on a
particular page, it *will* get into the FSM and then will be filled up.
The fact that small amounts of free space don't get re-used quickly
doesn't bother me a whole lot; the table size should still stabilize
at some proportion of free to used space that depends on your vacuum
frequency.

However, if the FSM is vastly smaller than the number of
actively-changing pages in your database, then that argument breaks down
--- in particular, if the FSM can't even keep track of all the
completely-empty pages in your database then you are going to suffer
progressive leakage.  I'm not sure where the problem gets bad enough to
worry about, and have not had time to try to do a probabilistic
analysis.

If we could determine where trouble sets in, perhaps we could detect the
situation and print warnings during VACUUM? Another possibility is to
auto-size FSM during postmaster startup, perhaps as a fraction of the
total DB size measured by "du". (But that probably doesn't work if
you've been playing symlink games.) In any case we need more knowledge
about appropriate FSM sizes than we have at the moment.

The whole FSM concept is new in 7.2 and I'm sure it still needs
refinement. Ideas welcome.

regards, tom lane

#9Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#8)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

grant wrote:

So, then, even if there is an overflow of pages with free space, stepping
up the vacuum schedule may help this eventually catch up.

Depends.

If you only UPDATE that table, there will never be less freespace again,
unless you VACUUM FULL. In average every UPDATE frees as much space as
it reuses. If your data grows, via INSERTS, then yes, it should catch up
someday if you VACUUM often enough.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#10Andrew Sullivan
andrew@libertyrms.info
In reply to: Tom Lane (#8)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

On Fri, Jul 19, 2002 at 02:11:09PM -0400, Tom Lane wrote:

However, if the FSM is vastly smaller than the number of
actively-changing pages in your database, then that argument breaks down
--- in particular, if the FSM can't even keep track of all the
completely-empty pages in your database then you are going to suffer
progressive leakage. 

That's consistent with the reports people have made, because it was
always in the case of databases with extremely high turnover that it
turned up. So at the very least, people who anticipate high turnover
on their systems should set the free space map higher (and maybe
vacuum more often -- AFAIK, though, no-one has yet documented the
performance penalty of non-blocking vacuum. I guess it should be no
worse than any other client, but I'm not sure, and haven't tested
yet).

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
#11grant
grant@amadensor.com
In reply to: Tom Lane (#8)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

So, then, even if there is an overflow of pages with free space, stepping
up the vacuum schedule may help this eventually catch up.

As for an automated way to set this up, if the FSM size cannot be adjusted
without a restart of the postmaster, then on startup, you could take a
number of pages currently with free space or a percentage of the number of
pages, which ever is greater. The percentage could have a default, and
also be configurable.

#12Bruce Momjian
bruce@momjian.us
In reply to: Andrew Sullivan (#7)
Re: Multi-Versions and Vacuum

Added to TODO:

* Allow free space map to be auto-sized or warn when it is too small

---------------------------------------------------------------------------

Andrew Sullivan wrote:

On Fri, Jul 19, 2002 at 12:35:23PM -0400, Tom Lane wrote:

After VACUUM does its thing (and yes, it removes everything it legally
can) it reports the amount of free space it found on each page to the
FSM, and the FSM remembers as much of that info as it has room for ---
preferring pages with more free space, if it has to be selective.

I think this is where the confusion lies. The ever-growing tables
that people are experiencing is somehow related to the new
non-blocking vacuum and the value of the free space map settings.
Previous posts on the matter have left some (well, me, at least) with
the idea that the non-blocking vacuum doesn't get everything, just in
case the free space map values are too small.

But if I understand now what you're saying, it's this: the vacuum
_does_ get everything, but if the free space map settings are too
small, the free space map doesn't know about that free space;
therefore, the space never gets re-used for future data, because as
far as the postmaster is concerened, the space isn't available. So
the tables keep growing because they have to append their data at the
end of the file (because the free space map hasn't reported any free
space). Run vacuum a few times like this, and there is a substantial
amount of unused space in the tables, where the free space map
doesn't have an entry for it. Is that right?

If so, it should be possible to calculate what one needs for the
max_fsm_pages setting by calculating how much data is replaced or
deleted in any given period, vacuum-to-vacuum, I guess. What one
would need to do is calculate how much space a row is taking up on
disk, and then set the max_fsm_pages to comprise (almost?) enough
space to keep a map big enough to include all the rows likely to be
recovered by vacuum. Or am I completely mad? (I know, I know, "Yes"
is a possible answer there.)

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
+1 416 646 3304 x110

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#13Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

Tom Lane wrote:

Andrew Sullivan <andrew@libertyrms.info> writes:

But if I understand now what you're saying, it's this: the vacuum
_does_ get everything, but if the free space map settings are too
small, the free space map doesn't know about that free space;
therefore, the space never gets re-used for future data, because as
far as the postmaster is concerened, the space isn't available. So
the tables keep growing because they have to append their data at the
end of the file (because the free space map hasn't reported any free
space). Run vacuum a few times like this, and there is a substantial
amount of unused space in the tables, where the free space map
doesn't have an entry for it. Is that right?

Well, the design theory was that once you free enough space on a
particular page, it *will* get into the FSM and then will be filled up.
The fact that small amounts of free space don't get re-used quickly
doesn't bother me a whole lot; the table size should still stabilize
at some proportion of free to used space that depends on your vacuum
frequency.

However, if the FSM is vastly smaller than the number of
actively-changing pages in your database, then that argument breaks down
--- in particular, if the FSM can't even keep track of all the
completely-empty pages in your database then you are going to suffer
progressive leakage.  I'm not sure where the problem gets bad enough to
worry about, and have not had time to try to do a probabilistic
analysis.

If we could determine where trouble sets in, perhaps we could detect the
situation and print warnings during VACUUM? Another possibility is to
auto-size FSM during postmaster startup, perhaps as a fraction of the
total DB size measured by "du". (But that probably doesn't work if
you've been playing symlink games.) In any case we need more knowledge
about appropriate FSM sizes than we have at the moment.

The whole FSM concept is new in 7.2 and I'm sure it still needs
refinement. Ideas welcome.

Not sure if I have said this before, but I think the auto-sizing needs
to get some feedback from when empty pages overflow the FSM _and_ when
later there is demand for free pages and none exist in the FSM.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#14Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Bruce Momjian (#13)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

(mass snippage)

The ever-growing tables
that people are experiencing is somehow related to the new
non-blocking vacuum and the value of the free space map settings...
But if I understand now what you're saying, it's this: the vacuum
_does_ get everything....

There is also a complicating effect when rows have toasted storage, which can prevent the space being freed (I think),
see hackers thread entitled something like "unbounded database growth".

I have set myself the task of examining this - but wish I had picked an easier 1st project :-(
- I am having to learn about vacuum, toasting ,freespace, and generally get to grips with the Pg code as well - just as well I have some free time at the moment...:-)

best wishes

Mark

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#14)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

Mark Kirkwood <markir@slingshot.co.nz> writes:

There is also a complicating effect when rows have toasted storage, which can prevent the space being freed (I think),
see hackers thread entitled something like "unbounded database growth".

I have set myself the task of examining this -

It doesn't need a lot of examination in my mind: the cause is surely
growth of the index on the toast table. Since that index's first column
is the OID assigned to the toast item, the range of indexed values tends
to shift over time, causing the leftmost parts of the index btree to
become sparsely populated and eventually empty. Since we don't
currently have code to collapse out empty pages in a btree, the index
grows during continued update activity, even if the total amount of data
you're storing isn't growing. This is quite unrelated to the free space
map --- indexes don't use the FSM.

REINDEXing the toast index is the only available workaround at the
moment.

Eventually we need code to collapse out free space in indexes.

regards, tom lane

#16Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#14)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

Tom Lane wrote:

It doesn't need a lot of examination in my mind: the cause is surely
growth of the index on the toast table. Since that index's first column
is the OID assigned to the toast item, the range of indexed values tends
to shift over time, causing the leftmost parts of the index btree to
become sparsely populated and eventually empty. Since we don't
currently have code to collapse out empty pages in a btree, the index
grows during continued update activity, even if the total amount of data
you're storing isn't growing. This is quite unrelated to the free space
map --- indexes don't use the FSM.

hmmm, good point - I thought the toast table was growing too, but I will
check this.

regards

Mark

#17Alvaro Herrera
alvherre@atentus.com
In reply to: Tom Lane (#15)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

En Tue, 27 Aug 2002 18:45:05 -0400
Tom Lane <tgl@sss.pgh.pa.us> escribi�:

Eventually we need code to collapse out free space in indexes.

Maybe when some lower limit is reached in two consecutive pages they
should be merged?

Maybe this can be one of my projects when I start really working on
Postgres. I have yet to pick the items that I'll do. I don't know how
difficult it is though.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
La web junta la gente porque no importa que clase de mutante sexual seas,
tienes millones de posibles parejas. Pon "buscar gente que tengan sexo con
ciervos incendi�nse", y el computador dir� "especifique el tipo de ciervo"
(Jason Alexander)

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#17)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

Alvaro Herrera <alvherre@atentus.com> writes:

Tom Lane <tgl@sss.pgh.pa.us> escribi�:

Eventually we need code to collapse out free space in indexes.

Maybe when some lower limit is reached in two consecutive pages they
should be merged?

Yup. The tricky part is to do this without causing problems for
concurrent readers of the index.

The Lehman-Yao paper that is the basis of our existing code explains how
readers can operate concurrently with splits of index pages. It doesn't
address doing page merges concurrently with reader scans. There are
newer papers in the literature that explain how to do that ... but no
one's gotten round to trying to implement it for Postgres.

regards, tom lane

#19Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#14)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

Tom Lane wrote:

It doesn't need a lot of examination in my mind: the cause is surely
growth of the index on the toast table.

You are indeed correct !

A quick check of my test case shows that the toast table growth tails
off ( provided max_fsm_pages is suitably set), and the unbounded growth
comes entirely from the toast index.

Applying REINDEX brings the space usage down to a level that compares
with a non-toasted example.

With respect to the REINDEX workaround - having to re-start your server
single process for it is a bit fatal if you are 24/7 shop - I think
altering those high hit tables to have attributes detoasted might be
better (row length permitting).

Anyway that has *toasted* the need for my currrent little investigation
(quite enjoyed my romp through the code, so I am not too worried). I
still have some free time (I am off work after knee operation) so I'm
now looking for another project....

regards

Mark

#20Martijn van Oosterhout
kleptog@svana.org
In reply to: Mark Kirkwood (#19)
Re: Free space mapping (was Re: Multi-Versions and Vacuum)

Eh? Stopping server to do a REINDEX? I use REINDEX all the time and you sure
as hell don't need to stop the server. Unless it's for a system table that
is.

(BTW, 7.0 certainly seems to have had some dodgy index code. 7.2 seems much
better. Fortunatly it only seems to hit multicolumn indexes.)

On Thu, Aug 29, 2002 at 09:26:56PM +1200, Mark Kirkwood wrote:

Tom Lane wrote:

It doesn't need a lot of examination in my mind: the cause is surely
growth of the index on the toast table.

You are indeed correct !

A quick check of my test case shows that the toast table growth tails
off ( provided max_fsm_pages is suitably set), and the unbounded growth
comes entirely from the toast index.

Applying REINDEX brings the space usage down to a level that compares
with a non-toasted example.

With respect to the REINDEX workaround - having to re-start your server
single process for it is a bit fatal if you are 24/7 shop - I think
altering those high hit tables to have attributes detoasted might be
better (row length permitting).

Anyway that has *toasted* the need for my currrent little investigation
(quite enjoyed my romp through the code, so I am not too worried). I
still have some free time (I am off work after knee operation) so I'm
now looking for another project....

regards

Mark

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

#21Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#14)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#19)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#22)
#24Barry Lind
barry@xythos.com
In reply to: Mark Kirkwood (#14)