FSM search modes

Started by Simon Riggs over 16 years ago. 35 messages. pgsql-hackers
#1 Simon Riggs
simon@2ndQuadrant.com

Just been looking again at the way FSM works. In fsm_search_avail() we
essentially have just a single way for working out how to search the
tree.

Seems like it would be good to abstract this so that we can implement a
number of FSM search strategies

* (current) randomize - page selection encourages different backends to
access different blocks, thus reducing block contention

* cluster - page selection favours the block with free space nearest
the current block; we prefer keep-in-sequence over avoid-contention

* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.

These are not all mutually exclusive, suggested combinations would be

randomize, randomize | cluster, randomize | compact

So we don't give up the load spreading behaviour, we just apply the
logic at lower levels of the tree only.
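The leaf-level selection under each proposed strategy could be sketched roughly like this (a toy Python model; the function name and strategy strings are illustrative, not actual FSM code):

```python
import random

def fsm_search(free_pages, strategy, current_block=None, rng=random):
    """Pick a target heap page from pages known to have enough free space.

    free_pages: sorted list of block numbers with sufficient free space.
    strategy:   'randomize' | 'cluster' | 'compact' (hypothetical names
                mirroring the FSMstrategy values proposed above).
    """
    if not free_pages:
        return None
    if strategy == 'compact':
        return free_pages[0]                      # lowest block number
    if strategy == 'cluster' and current_block is not None:
        # prefer keeping rows near the block we are already on
        return min(free_pages, key=lambda b: abs(b - current_block))
    return rng.choice(free_pages)                 # spread backends out
```

The combined modes (randomize | compact etc.) would apply the randomizing choice at upper tree levels and the biased choice only at the leaves.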

VACUUM could set the FSM into FSMstrategy = compact when it notices that
most of the free blocks are at the lower end of the table. Or it could be
set explicitly by a VACUUM FULL replacement utility.

FSMstrategy = cluster would be the default if clustering is enabled on a
table.

FSMstrategy can change via ALTER TABLE ... WITH (fsm_strategy = ...)

--
Simon Riggs www.2ndQuadrant.com

#2 ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Simon Riggs (#1)
Re: FSM search modes

Simon Riggs <simon@2ndQuadrant.com> wrote:

* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.

We cannot shrink the table if one tuple remains at the end of the table
and the tuple is always HOT-updated, because we put a new tuple into
the same page where the old tuple is placed if possible.

In addition to your intelligent FSM search modes,
do we need another algorithm to make the compaction to work better?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#3 Simon Riggs
simon@2ndQuadrant.com
In reply to: ITAGAKI Takahiro (#2)
Re: FSM search modes

On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:

Simon Riggs <simon@2ndQuadrant.com> wrote:

* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.

We cannot shrink the table if one tuple remains at the end of the table
and the tuple is always HOT-updated, because we put a new tuple into
the same page where the old tuple is placed if possible.

In addition to your intelligent FSM search modes,
do we need another algorithm to make the compaction to work better?

Perhaps we can have an additional piece of information about a table.
Something like target_size, so that normal updaters that attempt HOT
updates on blocks greater than target_size would know to avoid that
block and to seek a new location for the row lower in the table. Perhaps
such information could be reset and then sent via invalidation
mechanisms.

I'm thinking along the lines of a "fire alarm". An occasional mechanism
by which we can inform users of the need to evacuate certain blocks.
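A minimal sketch of how a backend might consult such a hint (the target_size attribute and the function are hypothetical, purely to illustrate the idea):

```python
def should_relocate(block, target_size):
    """Hypothetical check for the 'fire alarm' idea: a HOT update on a
    block at or beyond the advertised target_size should instead seek a
    new location for the row lower in the table, so the tail can
    eventually be truncated. target_size=None means no alarm is set."""
    return target_size is not None and block >= target_size
```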

--
Simon Riggs www.2ndQuadrant.com

#4 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Simon Riggs (#3)
Re: FSM search modes

On Sep 18, 2009, at 1:09 AM, Simon Riggs wrote:

On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:

Simon Riggs <simon@2ndQuadrant.com> wrote:

* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.

We cannot shrink the table if one tuple remains at the end of the table
and the tuple is always HOT-updated, because we put a new tuple into
the same page where the old tuple is placed if possible.

In addition to your intelligent FSM search modes,
do we need another algorithm to make the compaction to work better?

Perhaps we can have an additional piece of information about a table.
Something like target_size, so that normal updaters that attempt HOT
updates on blocks greater than target_size would know to avoid that
block and to seek a new location for the row lower in the table.
Perhaps
such information could be reset and then sent via invalidation
mechanisms.

I'm thinking along the lines of a "fire alarm". An occasional
mechanism
by which we can inform users of the need to evacuate certain blocks.

It might be better to not beat around the bush and provide a vacuum
mode that explicitly tries to free the last X percent of the table.
That's especially handy for a very large table, because you probably
don't want to be forced into scanning the whole thing in vacuum just
to free some space at the end. This mode could also be more
aggressive about trying to acquire the lock that's needed to trim the
file on disk.

That said, *any* step that improves dealing with table bloat is
extremely welcome, as right now you're basically stuck rebuilding the
table.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

#5 Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#1)
Re: FSM search modes

On Thu, 2009-09-17 at 16:26 +0100, Simon Riggs wrote:

Just been looking again at the way FSM works. In fsm_search_avail() we
essentially have just a single way for working out how to search the
tree.

Seems like it would be good to abstract this so that we can implement a
number of FSM search strategies

* (current) randomize - page selection encourages different backends to
access different blocks, thus reducing block contention

* cluster - page selection favours the block with free space nearest
the current block; we prefer keep-in-sequence over avoid-contention

Is the information about "current block" available to FSM search API, or
do we need to change the API for that ?

* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.

These are not all mutually exclusive, suggested combinations would be

randomize, randomize | cluster, randomize | compact

So we don't give up the load spreading behaviour, we just apply the
logic at lower levels of the tree only.

VACUUM could set the FSM into FSMstrategy = compact when it notices that
most of the free blocks are at the lower end of the table. Or it could be
set explicitly by a VACUUM FULL replacement utility.

FSMstrategy = cluster would be the default if clustering is enabled on a
table.

FSMstrategy can change via ALTER TABLE ... WITH (fsm_strategy = ...)

--
Simon Riggs www.2ndQuadrant.com

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

#6 Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#5)
Re: FSM search modes

On Fri, 2009-09-18 at 15:43 +0300, Hannu Krosing wrote:

* cluster - page selection favours the block with free space nearest
the current block; we prefer keep-in-sequence over avoid-contention

Is the information about "current block" available to FSM search API, or
do we need to change the API for that ?

Need to change the API, but it's a small code impact.

--
Simon Riggs www.2ndQuadrant.com

#7 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jim Nasby (#4)
Re: FSM search modes

decibel <decibel@decibel.org> wrote:

*any* step that improves dealing with table bloat is extremely
welcome, as right now you're basically stuck rebuilding the table.

+1

Although, possibly more irritating than actually rebuilding it is
evaluating borderline bloat situations to determine if they will "work
themselves out" over time or whether you need to bite the bullet to do
aggressive maintenance. Having some way for routine vacuums (or any
other routine process, currently available or that could be scheduled)
to "nibble away" at moderate bloat without significant performance or
usability impact would make life easier for at least *some* DBAs.

-Kevin

#8 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Kevin Grittner (#7)
Re: FSM search modes

On Sep 30, 2009, at 5:13 PM, Kevin Grittner wrote:

decibel <decibel@decibel.org> wrote:

*any* step that improves dealing with table bloat is extremely
welcome, as right now you're basically stuck rebuilding the table.

+1

Although, possibly more irritating than actually rebuilding it is
evaluating borderline bloat situations to determine if they will "work
themselves out" over time or whether you need to bite the bullet to do
aggressive maintenance. Having some way for routine vacuums (or any
other routine process, currently available or that could be scheduled)
to "nibble away" at moderate bloat without significant performance or
usability impact would make life easier for at least *some* DBAs.

Kevin, do you have tools that allow you to clear out the end of a
table? That part is at least mostly possible from userland (get list
of ctids from end of table, update those records to move them, rinse,
repeat) but even if you do all that there's no guarantee that a
vacuum will get the exclusive lock it needs to truncate the table.
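The userland approach can be sketched like this (Python helpers with made-up names; and, as Itagaki notes earlier in the thread, a HOT update may keep the new row version on the same page, so this is best-effort only):

```python
def tail_ctids(ctids, first_tail_block):
    """From ctid strings like '(123,4)', keep those whose block number
    is at or past first_tail_block -- the tail region we want to empty."""
    keep = []
    for c in ctids:
        block = int(c.strip('()').split(',')[0])
        if block >= first_tail_block:
            keep.append(c)
    return keep

def move_row_sql(table, pk, ctid):
    """Self-assignment UPDATE that forces a new tuple version; ideally
    the FSM places it on an earlier page (not guaranteed: HOT can keep
    it on the same page if there is room there)."""
    return f"UPDATE {table} SET {pk} = {pk} WHERE ctid = '{ctid}'"
```

One would fetch the current ctids, run the generated updates, VACUUM, and repeat until the tail pages are empty.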

So while something that makes it easier to clean out the end of a
table would be good, I think the critical need is a way to make
vacuum more aggressive about obtaining the exclusive lock.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

#9 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#8)
Re: FSM search modes

decibel wrote:

So while something that makes it easier to clean out the end of a
table would be good, I think the critical need is a way to make
vacuum more aggressive about obtaining the exclusive lock.

I wonder if we should have a different mode of operation that only
attempted the truncate (say VACUUM TRUNCATE), optionally being
non-conditional about obtaining the required lock. That said, I wonder
even more whether any such hacks are still needed after the visibility
map that changed the landscape for vacuum so dramatically.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: FSM search modes

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if we should have a different mode of operation that only
attempted the truncate (say VACUUM TRUNCATE), optionally being
non-conditional about obtaining the required lock. That said, I wonder
even more whether any such hacks are still needed after the visibility
map that changed the landscape for vacuum so dramatically.

Yeah. The one thing in this thread that seemed like a good idea to me
was to bias the FSM code a little bit towards returning space near the
start of the relation, rather than its current behavior of treating all
the free space equally. The current arrangement pretty much guarantees
that you'll never recover from a bloating episode without taking special
manual action. (This is not new to 8.4, the previous FSM code behaved
the same.)

The analogy in the back of my mind is the btree code that decides
whether to split the current page or move to the next page when it has a
full page and a new key that could go to either page. We found out that
randomizing that choice made a *huge* improvement in the average
behavior, even with the probabilities set up as 99-to-1. I'm thinking
that just a small chance of resetting the search to the start of the
relation might do the trick for FSM.

regards, tom lane

#11 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#10)
Re: FSM search modes

On Thu, 2009-10-01 at 12:05 -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if we should have a different mode of operation that only
attempted the truncate (say VACUUM TRUNCATE), optionally being
non-conditional about obtaining the required lock. That said, I wonder
even more whether any such hacks are still needed after the visibility
map that changed the landscape for vacuum so dramatically.

Yeah. The one thing in this thread that seemed like a good idea to me
was to bias the FSM code a little bit towards returning space near the
start of the relation, rather than its current behavior of treating all
the free space equally. The current arrangement pretty much guarantees
that you'll never recover from a bloating episode without taking special
manual action. (This is not new to 8.4, the previous FSM code behaved
the same.)

The analogy in the back of my mind is the btree code that decides
whether to split the current page or move to the next page when it has a
full page and a new key that could go to either page. We found out that
randomizing that choice made a *huge* improvement in the average
behavior, even with the probabilities set up as 99-to-1. I'm thinking
that just a small chance of resetting the search to the start of the
relation might do the trick for FSM.

No real need to be random is there? In the bloated space scenario,
VACUUM will be triggered but will be unable to remove the empty blocks.
So in that case VACUUM can hint the FSM to perform "start from beginning
of relation" behaviour. When a searcher does that and can't usefully
find space quickly, then we can reset the hint back to normal. So it's
an automated mode change in both directions.

I think we can use the N-to-1 idea to define what we mean by "quickly",
but that should be in proportion to exactly how many free blocks there
were in the table, not a fixed N. It would be a shame to make this work
poorly for very large tables.

It would be more useful to think of this as "look for huge chunks of
space and fill them" rather than "start at beginning", since space won't
always be right at start.

--
Simon Riggs www.2ndQuadrant.com

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#11)
Re: FSM search modes

Simon Riggs <simon@2ndQuadrant.com> writes:

No real need to be random is there? In the bloated space scenario,
VACUUM will be triggered but will be unable to remove the empty blocks.
So in that case VACUUM can hint the FSM to perform "start from beginning
of relation" behaviour.

No, that's an entirely unrelated issue. My point is that there won't
*be* any empty end blocks unless we discourage FSM from handing out
those pages as insertion targets.

regards, tom lane

#13 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#11)
Re: FSM search modes

Simon Riggs <simon@2ndQuadrant.com> wrote:

It would be more useful to think of this as "look for huge chunks of
space and fill them" rather than "start at beginning", since space
won't always be right at start.

Either I misunderstand you or I disagree. If there's a huge chunk of
space near the end, and many many smaller spaces spread throughout,
what I'd like is for rows to be placed in those small ones. This
would minimize the number of pages to read for queries, and would
present some possibility that the rows past the huge chunk might
eventually be deleted or non-HOT updated, allowing the bloat to
eventually be cleaned up with minimal pain.

-Kevin

#14 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#13)
Re: FSM search modes

On Thu, 2009-10-01 at 11:32 -0500, Kevin Grittner wrote:

Either I misunderstand you or I disagree.

That does seem to be a common stance, though I will read on. :-)

--
Simon Riggs www.2ndQuadrant.com

#15 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#13)
Re: FSM search modes

On Thu, 2009-10-01 at 11:32 -0500, Kevin Grittner wrote:

If there's a huge chunk of
space near the end, and many many smaller spaces spread throughout,
what I'd like is for rows to be placed in those small ones. This
would minimize the number of pages to read for queries, and would
present some possibility that the rows past the huge chunk might
eventually be deleted or non-HOT updated, allowing the bloat to
eventually be cleaned up with minimal pain.

Yes, as Tom points out, this must be done with bias away from the very
end of the table.

I meant that we should start from the beginning of large spaces and that
we shouldn't assume that all space worth filling is at start of
relation.

--
Simon Riggs www.2ndQuadrant.com

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#15)
Re: FSM search modes

Simon Riggs <simon@2ndQuadrant.com> writes:

Yes, as Tom points out, this must be done with bias away from the very
end of the table.

I meant that we should start from the beginning of large spaces and that
we shouldn't assume that all space worth filling is at start of
relation.

Right. One of the goals that FSM is trying to meet is ensuring that
different backends aren't trying to insert into the same page
concurrently, so as to reduce page-level contention. So for example
it'd be a bad idea to adopt the really-dumb strategy of searching from
the start of the table for each request --- it'd funnel all the backends
to the same target page. What we want is a behavior that is randomized
but tends to prefer pages towards the start rather than hitting all free
pages equally. The btree precedent makes me suspect that quite a weak
preference would be sufficient. So for example we might try resetting
the search to the start of the relation with probability 0.01.
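The weak-preference idea might look like this in outline (toy Python; the constant and function names are illustrative, not the actual fsm_search code):

```python
import random

RESET_PROBABILITY = 0.01   # a "very light thumb upon the scales"

def choose_search_start(last_position, rng=random):
    """On entry to a search, occasionally restart from block 0 so that
    low-numbered free pages get slowly refilled; otherwise continue
    from the remembered position, keeping backends spread out."""
    if rng.random() < RESET_PROBABILITY:
        return 0
    return last_position
```

Because only about 1 in 100 searches restarts from the front, backends still fan out across the table, yet space near the start is steadily preferred over time.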

regards, tom lane

#17 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#16)
Re: FSM search modes

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Yes, as Tom points out, this must be done with bias away from the
very end of the table.

I meant that we should start from the beginning of large spaces and
that we shouldn't assume that all space worth filling is at start
of relation.

OK, so I did misunderstand you; we agree after all. :-)

So for example we might try resetting the search to the start of the
relation with probability 0.01.

If I understand the heuristic you propose, and my math skills haven't
eroded too badly from lack of use, every 229 spots considered would
cause a 90% chance of reset. That means that the odds of getting past
50,000 spots (the number of pages with available free space at which I
generally start to get worried) without resetting is about 1 in 10^218
-- which is a risk I'm willing to accept. ;-)
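Under this per-advance reading of the heuristic, the arithmetic checks out:

```python
import math

p_continue = 0.99          # probability of not resetting on one advance

# chance of surviving 229 consecutive advances with no reset
p_229 = p_continue ** 229              # ~0.10, i.e. ~90% chance of
                                       # having reset by then

# chance of sweeping past 50,000 spots without a single reset
log10_tail = 50_000 * math.log10(p_continue)   # ~ -218, i.e. ~1 in 10^218
```

(As Tom clarifies in the next message, the reset he proposed happens once per search entry, not per advance, so these figures describe Kevin's reading rather than the actual proposal.)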

-Kevin

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#17)
Re: FSM search modes

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

So for example we might try resetting the search to the start of the
relation with probability 0.01.

If I understand the heuristic you propose, and my math skills haven't
eroded too badly from lack of use, every 229 spots considered would
cause a 90% chance of reset.

Sorry, I wasn't clear. What I was thinking of was that we'd consider
resetting the search position once, upon entry to fsm_search, and then
search normally thereafter. Some experimentation would be needed to
choose the right probability of course. A number like 0.01 might seem
too small to affect the behavior at all, but that's what we thought
about the btree case too. A very light thumb upon the scales may be
sufficient.

regards, tom lane

#19 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#17)
Re: FSM search modes

Kevin Grittner wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Yes, as Tom points out, this must be done with bias away from the
very end of the table.
I meant that we should start from the beginning of large spaces and
that we shouldn't assume that all space worth filling is at start
of relation.

OK, so I did misunderstand you; we agree after all. :-)

So for example we might try resetting the search to the start of the
relation with probability 0.01.

If I understand the heuristic you propose, and my math skills haven't
eroded too badly from lack of use, every 229 spots considered would
cause a 90% chance of reset. That means that the odds of getting past
50,000 spots (the number of pages with available free space at which I
generally start to get worried) without resetting is about 1 in 10^218
-- which is a risk I'm willing to accept. ;-)

The FSM currently uses a clock sweep kind of algorithm to hand out
pages, and the clock hand is reset to 0 at every vacuum. The purpose of
resetting the clock hand is precisely to bias towards lower-numbered
pages, to allow truncation later on.

That's a bit of an over-simplification, though: there's actually a
separate "clock hand" on each FSM page (next_slot pointer).

We probably could do with more bias. For example, we still prefer pages
close to the page we last inserted to, by searching for free space in
the same FSM page first, before starting the search from the top of the
tree. And of course, each backend tries first to insert to the last page
it inserted to before even consulting the FSM. A mode to disable those
behaviors might indeed be useful, along with the random reset.
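A toy model of the per-FSM-page clock hand (Python; heavily simplified from the real fsm_search_avail()/next_slot machinery, class and method names illustrative):

```python
class FSMPageModel:
    """One FSM leaf page: per-slot free-space categories plus the
    next_slot 'clock hand' that searches resume from."""
    def __init__(self, n_slots):
        self.avail = [0] * n_slots   # free-space category per heap page
        self.next_slot = 0           # this page's "clock hand"

    def search_avail(self, needed):
        """Clock sweep: scan from next_slot, wrapping around once."""
        n = len(self.avail)
        for i in range(n):
            slot = (self.next_slot + i) % n
            if self.avail[slot] >= needed:
                self.next_slot = slot   # hand out nearby pages next time
                return slot
        return None

    def vacuum_reset(self):
        """VACUUM resets the hand to 0, biasing later searches toward
        lower-numbered pages so the tail can eventually be truncated."""
        self.next_slot = 0
```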

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#19)
Re: FSM search modes

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Kevin Grittner wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

So for example we might try resetting the search to the start of the
relation with probability 0.01.

If I understand the heuristic you propose, and my math skills
haven't eroded too badly from lack of use, every 229 spots
considered would cause a 90% chance of reset. That means that the
odds of getting past 50,000 spots (the number of pages with
available free space at which I generally start to get worried)
without resetting is about 1 in 10^218 -- which is a risk I'm
willing to accept. ;-)

The FSM currently uses a clock sweep kind of algorithm to hand out
pages, and the clock hand is reset to 0 at every vacuum. The purpose
of resetting the clock hand is precisely to bias towards
lower-numbered pages, to allow truncation later on.

That's a bit of an over-simplification, though: there's actually a
separate "clock hand" on each FSM page (next_slot pointer).

We probably could do with more bias. For example, we still prefer
pages close to the page we last inserted to, by searching for free
space in the same FSM page first, before starting the search from
the top of the tree. And of course, each backend tries first to
insert to the last page it inserted to before even consulting the
FSM. A mode to disable those behaviors might indeed be useful, along
with the random reset.

That fleshes out the descriptions given by Tom, but doesn't really
make me think I misunderstood the heuristic or miscalculated the odds
of getting far from the start with a 1% probability of a reset on each
advance of the sweep. (Of course, it's possible I'm being dense and
taking statements to mean what I expect, despite evidence to the
contrary.)

The way I figure it, if there is a 0.01 chance to reset the sweep,
then there's a 0.99 probability of continuing the sweep from the last
position. 0.99^229 is about 0.1, which means there is a 10% chance
not to have reset after that many attempts to advance. Every 229
attempts after that multiplies by 0.1, too. So, after 229*6 attempts
(for example) the odds of not having reset are about one in a million.

Am I saying something stupid here? (I used to be good at these sorts
of calculations....)

-Kevin

#21 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#20)

#22 Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#19)

#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#20)

#24 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#22)

#25 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)

#26 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)

#27 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#22)

#28 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#24)

#29 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#23)

#30 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#29)

#31 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#30)

#32 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#24)

#33 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#31)

#34 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#26)

#35 Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#29)