Dynamic Partitioning using Segment Visibility Maps
Happy New Year, everybody.
This proposal follows on from previous thinking about partitioning,
where I've taken up Andrew Sullivan's suggestion to re-examine the
current partitioning concept of using tables as partitions. So I've come
up with an alternative concept to allow us to discuss the particular
merits of each. ISTM that this new proposal has considerable potential.
Recap: Very Large Table Use Case
--------------------------------
Many large tables have a regular access pattern:
- new rows inserted frequently
- some updates and deletes on the "recent" data
- older data becomes effectively read-only
- at some point data may be removed in bulk from the table
Tables vary in what "recent" means; some are insert-only, many are not.
A typical example might be a large history table where the last 6 months
data is updated, but after that the data has almost zero change.
Partitioning relies on knowledge of the physical location of data to
exclude sections of data from a scan.
Currently the partitioning knowledge is provided by user declarations in
the form of constraints on tables linked together by views or
inheritance. We also currently do this in "constraint-first" fashion,
i.e. we put the data where the constraints say we should, rather than
deriving the constraints from the data.
In-table Partitioning
---------------------
In the common use-case there is a clear affinity between certain data
columns and the physical placement of data in a table. This is true for
columns such as LOG_DATE or TRANSACTION_DATE, but is also true for
any columns where the id column is assigned by a Sequence. HOT helps
this to remain true over time.
Given the above use case, when there is an affinity between data values
and data placement *and* we know that older data tends to become read
only, we can then realise that certain physical sections of a table will
become effectively read only. (My experience is that the larger the
table, the less random the write pattern - whatever the guys that wrote
TPC-C say).
If we were able to keep track of which sections of a table are now
read-only then we would be able to record information about the data in
that section and use it to help solve queries. This is turning the
current thinking on its head: we could derive the constraints from the
data, rather than placing the data according to the constraints. That
would be much more natural: load data into the table and have the system
work out the rest.
We can imagine doing this within a single table: split a table up into
100 sections, then record the min/max values of all tuples
in those sections. When we perform a SeqScan we could skip over sections
that provably can never satisfy the query predicate. Same thing
as partitioning, but within the table.
Currently tables are already broken up into 1GB file segments, so it
seems fairly natural to pick that as our section size. That fits
reasonably well with recommendations for ideal partition size.
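To make the idea concrete, here is a toy sketch (plain Python, not
PostgreSQL code; the section size and function names are invented for
illustration) of how per-section min/max values let a scan skip sections
that can never satisfy a range predicate:

```python
SECTION_SIZE = 4  # rows per section; the real proposal uses 1GB segments


def build_sections(rows):
    """Record (start, min, max) for each fixed-size section of the table."""
    sections = []
    for i in range(0, len(rows), SECTION_SIZE):
        chunk = rows[i:i + SECTION_SIZE]
        sections.append((i, min(chunk), max(chunk)))
    return sections


def scan(rows, sections, lo, hi):
    """SeqScan that skips sections provably outside [lo, hi]."""
    out = []
    for start, smin, smax in sections:
        if smax < lo or smin > hi:
            continue  # segment exclusion: no row here can qualify
        out.extend(r for r in rows[start:start + SECTION_SIZE]
                   if lo <= r <= hi)
    return out
```

The min/max pairs play the role of the "implicit constraints" described
below: derived from the data after the fact, not declared up front.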
Segment Exclusion
-----------------
After we note that a segment is read-only we can scan the segment and
record min/max values for all columns. These are then "implicit
constraints", which can then be used for segment exclusion in a similar
manner as we do with the current constraint exclusion mechanism.
The implicit constraints can then allow SeqScans to assess segment
exclusion for each segment at execution time, i.e. a scan can skip a
whole segment if constraints don't match. If a segment does not (yet)
have implicit constraints it will always be scanned in full. This allows
partitioning, synch scans and buffer recycling to work much better
together than they currently do.
Other scan types might also use segment exclusion, though this would
only be useful for scans retrieving many rows, otherwise the overhead of
segment exclusion might not be worthwhile.
This would also allow a Merge Join to utilise exclusion for the inner
plan at execution time. Instead of scanning the whole inner table plan
from the start, we would be able to start the scan from the appropriate
segment. This would require us to pass the current value of the outer
plan down into the inner plan. The typical executor nodes on the inner
plan would be a SortNode and below that a SeqScan. In that case the
executor would need to pass the outer value from the MergeJoinNode down
thru the SortNode to the SeqScan node. The SeqScan node could then
perform partition exclusion, reducing the time for that scan and also
reducing the time for the resulting sort. This sounds like it might be
worth doing in normal cases also. It might turn out that the potentially
applicable cases are already excluded during planning, I haven't thought
about that aspect in enough detail yet.
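As a rough illustration of the Merge Join idea (assuming the affinity
property holds, i.e. segments are physically ordered by the join column;
the function name is invented), the inner scan's start point is just a
search over segment boundaries:

```python
def first_useful_segment(segments, outer_value):
    """segments: list of (min, max) per segment, in physical order.

    Given the current value from the outer plan, return the index of the
    first segment the inner SeqScan still needs to visit; segments before
    it can never produce a matching row.
    """
    for i, (_smin, smax) in enumerate(segments):
        if smax >= outer_value:
            return i
    return len(segments)  # no segment can match; scan nothing
```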
If we collect data for all columns then many of our implicit constraints
would be useless. e.g. if a column only has a few values and these are
present in all segments. Matching our predicate against all constraints
would be expensive, so we must weed out poor constraints. We would do
this by removing any constraint that overlapped more than 10% of other
segments. Various heuristics would likely need to be discovered during
development to make this work without resorting to manual commands.
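One plausible shape for that weeding heuristic, sketched in Python (the
10% figure is from the paragraph above; the pairwise-overlap test and
names are illustrative, not a settled design):

```python
def overlap(a, b):
    """True if (min, max) ranges a and b intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def keep_constraint(ranges, idx, threshold=0.10):
    """Keep the implicit constraint for segment idx only if its (min, max)
    range overlaps no more than `threshold` of the other segments.

    A column whose values appear in nearly every segment produces a range
    that overlaps almost everything, so its constraint is discarded.
    """
    others = [r for i, r in enumerate(ranges) if i != idx]
    overlapping = sum(1 for r in others if overlap(ranges[idx], r))
    return overlapping <= threshold * len(others)
```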
Note that all of this exclusion is happening in the executor, not the
planner. That allows this form of exclusion to work with stable
functions and parameters without problem.
Noting which segments are read-only
-----------------------------------
Everything so far has relied upon our ability to note which segments of
a table are read-only. We could do this in two ways
1) have the system automatically keep track of non-changing data
2) have the DBA issue a command to "mark" a segment as read-only now
Probably a combination of the two is better, so we have three states for
segments
- READ_WRITE_ALLOWED
- EFFECTIVE_READ_ONLY (set by 1 only)
- MARKED_READ_ONLY (set by 2 only)
Having said that, I want to concentrate on (1), though I'll consider (2)
as well if reviewers request it.
Visibility Map
--------------
So *how* would we mark segments as read only? It turns out that
attempting to do this is extremely similar to Heikki's Visibility Map
idea, so I'll describe it in those terms, even though there are some
differences. It also brings to mind Andrew Sullivan's read-only tuples
concept.
We would keep a dynamic visibility map at *segment* level, showing which
segments have all rows as 100% visible. No freespace map data would be
held at this level.
Almost everything about Heikki's Visibility Map idea holds, just that we
have a bit for each segment in the table, rather than each block. The
map is very small and hardly ever changes for a table, so we can cache
it easily on each backend. If the map does change we can perform a
relcache invalidation to make everybody re-read the map.
No dynamic shared memory cache is required because any concurrent
changes to the table would be ignored by a Scan anyway, so it doesn't
matter if an INSERT, UPDATE or DELETE occurs while we are scanning. Any
new scans that start will attempt to lock the table and then perform a
rel cache check before continuing. So the visibility will be set
correctly for *that* scan at least.
In most cases the visibility map can be summarised as a single boolean
to show whether any 100% visible segments exist. That makes accessing
the map very cheap in the common, unset case.
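A toy model of the per-segment map and its one-boolean summary
(illustrative Python, not the proposed C implementation; the class and
field names are invented):

```python
class SegmentVisibilityMap:
    """One bit per segment, plus a cheap summary boolean so the common
    "nothing is 100% visible" case costs a single check."""

    def __init__(self, nsegments):
        self.bits = [False] * nsegments
        self.any_visible = False  # summary over all bits

    def set_visible(self, seg):
        """Mark segment `seg` as 100% visible (e.g. after VACUUM)."""
        self.bits[seg] = True
        self.any_visible = True

    def clear(self, seg):
        """Unset on UPDATE/DELETE to a supposedly read-only segment."""
        self.bits[seg] = False
        self.any_visible = any(self.bits)
```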
Setting the Visibility Map
--------------------------
VACUUM would scan all READ_WRITE_ALLOWED segments and mark some of
them as EFFECTIVE_READ_ONLY if 100% of the remaining rows are visible to
all current backends. Marking the map will cause a rel cache
invalidation.
We would never mark the highest numbered segment
EFFECTIVE_READ_ONLY, since it is likely to expand when INSERTs occur.
This is also a useful way of excluding smaller tables from the overheads
involved for this feature.
In an insert-only table this would mean that only the last 1 or 2
segments would be read write, so VACUUM would always run in a bounded
time, no matter how large the table.
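The marking rule can be stated compactly (illustrative sketch;
`all_visible` stands in for VACUUM's actual check that 100% of remaining
rows are visible to all current backends):

```python
def segments_to_mark(all_visible, nsegments):
    """Return the segments VACUUM may set EFFECTIVE_READ_ONLY.

    The highest-numbered segment is always excluded, since it is still
    being extended by INSERTs; this also keeps small (single-segment)
    tables out of the feature's overheads entirely.
    """
    return [s for s in range(nsegments - 1) if all_visible(s)]
```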
When would we attempt to set the visibility map? If we do this
aggressively, then it will be quickly unset. If we wait too long then we
might lose some of the benefits of this approach.
First, we recognise that read only segments will change the autovacuum
calculation - we simply exclude them entirely. This will mean that
VACUUMs are run more regularly than they are now, but when they do run
they will be quicker. (Note that currently, VACUUMs on a growing table
get further and further apart as the table grows, assuming constant
write rate).
My feeling is that VACUUMs are sufficiently infrequent that we should
set the visibility map aggressively after each VACUUM. If the map is
quickly unset, then we wait some time before trying again. Unsetting
seems to be relatively low contention (see below), so that seems
acceptable.
We would then re-scan newly marked segments to derive min/max values for
implicit constraints, performed by autovacuum worker processes. The best
way seems to be that an ANALYZE command would automatically detect that
a segment has been recently set EFFECTIVE_READ_ONLY and perform an
all-rows scan rather than just a sample. (We would keep min/max values
only for datatypes with valid btree sort operators, just as ANALYZE
already does). The data for this can be stored as part of the stats
histogram for each column. pg_class.reltuples would become an array of
row counts in each segment, known accurate for read only segments.
A later VACUUM will still be required to freeze the rows, though that
can happen later when we hit the frozen limit, or earlier if we run an
explicit VACUUM FREEZE. We would need to store the oldest xid for each
segment, so we know when to kick off a FREEZE on that part of the table.
We would do this by making pg_class.relfrozenxid an array also.
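With relfrozenxid kept per segment, the trigger for a partial FREEZE
might look like this toy sketch (xids simplified to plain integers, with
no wraparound handling; names are illustrative):

```python
def segments_needing_freeze(relfrozenxid, current_xid, freeze_limit):
    """relfrozenxid[i] is the oldest unfrozen xid in segment i.

    Return the segments whose oldest xid has aged past the freeze limit,
    i.e. the only parts of the table a FREEZE pass must visit.
    """
    return [i for i, oldest in enumerate(relfrozenxid)
            if current_xid - oldest > freeze_limit]
```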
This departs completely from the idea that the visibility map and the
freespace map are related somehow. However, there is a change required
in the FSM to ensure the proposed technique is stable and useful: If we
scan a segment and it is 100% visible, if there is freespace in just one
of the blocks in the segment then we will soon find that our
visibility-bit will be set to off again very quickly.
If there is more than 5% freespace total in the segment then we will not
set EFFECTIVE_READ_ONLY, nor report blocks in a segment to the FSM. That
way, small amounts of freespace won't destroy the benefits of segment
exclusion. VACUUM currently overwrites FSM information, though if this
was not the case then we would have to actively remove it for the newly
read only segments. Perhaps that limit would be (100 - fillfactor)%,
which is normally set according to a user's expectation of the number of
updates likely on a table.
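The freespace threshold test might be sketched as follows (illustrative
only: whether the flat 5% or the (100 - fillfactor)% limit should win is
left open by the proposal, and this code simply combines the two as one
possible reading):

```python
def may_mark_read_only(free_bytes, segment_bytes, fillfactor=100):
    """True if the segment's free space is small enough that marking it
    EFFECTIVE_READ_ONLY (and withholding its blocks from the FSM) is
    unlikely to be undone quickly by new INSERTs."""
    threshold = max(100 - fillfactor, 5) / 100.0
    return free_bytes <= threshold * segment_bytes
```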
Unsetting the visibility map
----------------------------
For EFFECTIVE_READ_ONLY segments, INSERTs, UPDATEs and DELETEs are still
possible. For MARKED_READ_ONLY they would be explicitly refused.
The relcache invalidation caused by making a segment EFFECTIVE_READ_ONLY
can flush the rd_targBlock for the relation in each backend. Other
INSERTs are not possible since all current code-paths either use
extension or the FSM when the table is non-empty, and neither of those
will cause a problem. Extension only touches the last segment, which will
never be EFFECTIVE_READ_ONLY. The FSM contains no entries for an
EFFECTIVE_READ_ONLY segment.
UPDATEs or DELETEs are possible on EFFECTIVE_READ_ONLY segments, but we
don't expect it often. If this occurs, we will simply reset the
visibility map for the table. This can be handled with a
non-transactional overwrite - the bit always exists already, so the
overwritten data is always the same size. That's pessimistic, since it
resets the state even if the UPDATE/DELETE aborts, but it's better than
holding the row locked until the transaction completes, which might
prevent other things from happening. We probably also want VACUUM to
clear up any aborted transactions anyway.
The user may also wish to clear down very old data, so allowing DELETEs
can ensure the user can still remove old data from the table. By
carefully choosing the values to be deleted, a whole segment can be
deleted and then returned to the FSM.
Running a huge DELETE will likely perform a SeqScan that avoids all but
the deleting data. The following VACUUM will also ignore the other
segments, so removing older data seems fairly well optimised already, in
comparison with now. It would be possible to optimise this to perform a
segment-level truncate, but since the partitioning will already improve
the DELETE and VACUUM, I'm not proposing that yet, if ever.
SHARE locks currently write to data blocks, but since they represent
transient state we do not need to unset either the partitioning or the
visibility map.
Overlap with Visibility Map Proposal
------------------------------------
This proposal offers many of the advantages of the earlier Visibility
Map proposal, yet without major changes to heap structure. Since the
segment-level visibility map is more granular it would only be 80% as
effective as the more detailed block-level map. Having said that, the
bookkeeping overheads will also be considerably reduced, so it does seem
a good joint approach. It also allows freezing to be handled fully,
which was a problem with the original visibility map proposal. WAL
logging visibility map changes is also now much easier.
My thoughts have been targeted directly at partitioning, yet I have to
admit that this idea overlaps, and in my world view, replaces the
Visibility Map proposal. I very much like the name Visibility Map
though. I've re-read the earlier thread and agree with Jeff Davis that
we *could* have multiple kinds of visibility map, but also agree with
Heikki that it seems like too much work. It was never my intent to
address both challenges in one proposal and this is just pure
coincidence, or possibly just luck, depending upon how you like the
proposal.
If we do need to differentiate between the two proposals, we can refer
to this one as the Segment Visibility Map (SVM).
Index Scans
-----------
Index-only scans would look at the Visibility Map, just as they would
have done in the earlier proposal.
It would be costly for an index scan to repeatedly check the visibility
map, but probably worth it to do so.
Other Changes
-------------
We can handle select count(*) by scanning the non-100% visible segments
of a table, then adding the stored counts for each segment to get a
final total. Not sure if it's really worth doing, but it does sound like
an added bonus.
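A sketch of that count(*) strategy (illustrative Python; `scan_count`
stands in for actually scanning a non-100%-visible segment, and
`reltuples` is the proposed per-segment array of stored row counts):

```python
def fast_count(reltuples, visible, scan_count):
    """count(*): trust the stored count for 100%-visible segments (where
    it is known accurate), and scan only the remaining segments."""
    total = 0
    for i, stored in enumerate(reltuples):
        total += stored if visible[i] else scan_count(i)
    return total
```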
There would be additional complexity in selectivity estimation and plan
costing. The above proposal allows dynamic segment exclusion, which
cannot be assessed at planning time anyway, so suggestions welcome...
I think it would be prudent to have an override option on VACUUM to
allow us to scan a table in full, checking rather than using the
visibility map. This mode would be called VACUUM CHECK.
None of the above blocks earlier proposals for read only tables, and the
two features could work together very straightforwardly.
Comparison with other Partitioning approaches
---------------------------------------------
Once I realised this was possible in a fairly automatic way, I've tried
hard to keep away from manual overrides, commands and new DDL.
Declarative partitioning is a big overhead, though worth it for large
databases. No overhead is *much* better though.
This approach to partitioning solves the following challenges
- allows automated table extension, so works automatically with Slony
- responds dynamically to changing data
- allows stable functions, nested loop joins and parametrised queries
- allows RI via SHARE locks
- avoids the need for operator push-down through Append nodes
- allows unique indexes
- allows global indexes (because we only have one table)
- allows advanced planning/execution using read-only/visible data
- works naturally with synchronous scans and buffer recycling
All of the above are going to take considerably longer to do in any of
the other ways I've come up with so far...
And yes, this is a different conclusion to earlier discussions,
particularly those in 2005 where I was still thinking in terms of
declarative partitioning.
Conclusions
-----------
This technique would be useful for any table with historical data keyed
by date or timestamp. It would also be useful for data where a
time-of-insert component is implicit, such as many major entity tables
where the object ids are assigned by a sequence. e.g. an Orders table
with an OrderId as PK. Once all orders placed in a period have been
shipped/resolved/closed then the segments will be marked read-only.
It's not really going to change the code path much for small tables, yet
seems likely to work reasonably well for large tables of all shapes and
sizes. If a segment is being updated, we leave it alone, and maybe never
actually set the visibility map at all. So overall, this idea seems to
cover the main use case well, yet with only minimal impact on the
existing use cases we support.
As before, I will maintain this proposal on the PG developer Wiki, once
we get down to detailed design.
Like it?
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Wed, Jan 02, 2008 at 05:56:14PM +0000, Simon Riggs wrote:
Like it?
Sounds good. I've only given it a quick scan though. Would read-only
segments retain the same disk-level format as is currently? It seems
possible to remove the MVCC fields and hence get more tuples per page---
whether this would actually be a net performance gain/loss seems like
a difficult question to answer; it would definitely be a
complexity increase though.
Reading this reminds me of the design of the store for a persistent
operating system called EROS. It has a very good paper[1] describing
the design (implementation and careful benchmarking thereof) that I
think could be a useful read.
[1] http://www.eros-os.org/papers/storedesign2002.pdf
A lot of your design sounds like the EROS store, with the
"Checkpoint Area" being, in current and all previous versions of
Postgres, the only place data is stored. Data in EROS also has a "Home
Location" which is where the data ends up after a checkpoint, and sounds
somewhat like the proposed read-only.
Checkpoints here serve a similar purpose to checkpoints in PG, so the
following analogy may get a bit confusing. When you're reading the
paper I'd try and imagine the checkpoints not occurring as one event,
but spread across time as the database recognizes that data is now (or
has been marked as) read-only. The home locations would then store
only the read-only copies of the data and all the churn would (if the
recognition of read-only data works) be done in the checkpoint area.
Sam
On Thu, 2008-01-03 at 00:41 +0000, Sam Mason wrote:
On Wed, Jan 02, 2008 at 05:56:14PM +0000, Simon Riggs wrote:
Like it?
Sounds good. I've only given it a quick scan though. Would read-only
segments retain the same disk-level format as is currently?
Yes, no changes at all to the table. So your app would just work,
without any DDL changes. Existing partitioning apps would not change.
It seems
possible to remove the MVCC fields and hence get more tuples per page---
whether this would actually be a net performance gain/loss seems like
a difficult question to answer; it would definitely be a
complexity increase though.
I've been looking at general compression at table and segment level, but
that's further down the track. Removing the MVCC fields is too much work,
I think.
Reading this reminds me of the design of the store for a persistent
operating system called EROS. It has a very good paper[1] describing
the design (implementation and careful benchmarking thereof) that I
think could be a useful read.
Thanks, will do.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi Simon,
Looks like a novel idea. I just want to confirm my
understanding of the proposal.
a) This proposal would work for the kind of table organizations which are
currently partitioned and maintained based on some kind of timestamp.
Consider one of the use-cases. A large Retail firm has a lot of stores. A DBA
maintains and updates the inventories of those stores in hash-partitions
based on store-no. As the inventory gets updated, the corresponding
partition receives the update and it goes like that..
Here all the partitions are going to get constantly updated. So no
partition can possibly become a read-only partition. You have clearly called
it out in your paragraph; I am just re-confirming that. Or do you have
something for this in your solution?
In my limited experience, most partition strategies are based on some form
of time-stamp. If the proposed solution can cater to those, it has a lot of
use-cases.
Thanks,
Gokul.
On Fri, 2008-01-04 at 13:06 +0530, Gokulakannan Somasundaram wrote:
a) This proposal would work for the kind of table organizations which
are currently partitioned and maintained based on some kind of
timestamp. Consider one of the use-case. A large Retail firm has a lot
of stores. DBA maintains and updates the inventories of those stores
in hash-partitions based on store-no. As the inventory gets updated,
the corresponding partition receives the update and it goes like
that..
Here all the partitions are going to get constantly updated.
So no partition can possibly become a read-only partition. You have
clearly called it out in your paragraph; I am just re-confirming that.
Or do you have something for this in your solution?
In my limited experience, most partition strategies are based on some
form of time-stamp. If the proposed solution can cater to those, it has
a lot of use-cases.
I don't think it would apply to an Inventory table. That is a current
state table.
It is designed for any large tables that would grow naturally over time
if we left them to do so. Solutions that it would work for:
- any Fact table where measurements/observations/events are accumulated
e.g.
Web Hits (any Internet events)
Call Detail Records
Sales
Security Events
Scientific Measurements
Process Control
- any Major Entity where new entities are created from a sequence
e.g.
Orders, OrderItems
Invoices
Shipments, Returns
most SCM/DCM events
It's not aimed at any particular benchmark, just real usage scenarios.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
We would keep a dynamic visibility map at *segment* level, showing which
segments have all rows as 100% visible. No freespace map data would be
held at this level.
Small dumb-user question.
I take it you've considered some more flexible consecutive-run-of-blocks
unit of flagging rather than file-segments. That obviously complicates
the tracking but means you can cope with infrequent updates as well as
mark most of the "most recent" segment for log-style tables.
--
Richard Huxton
Archonet Ltd
On Fri, 2008-01-04 at 10:22 +0000, Richard Huxton wrote:
Simon Riggs wrote:
We would keep a dynamic visibility map at *segment* level, showing which
segments have all rows as 100% visible. No freespace map data would be
held at this level.

Small dumb-user question.
I take it you've considered some more flexible consecutive-run-of-blocks
unit of flagging rather than file-segments. That obviously complicates
the tracking but means you can cope with infrequent updates as well as
mark most of the "most recent" segment for log-style tables.
I'm writing the code to abstract that away, so yes.
Now you mention it, it does seem straightforward to have a table storage
parameter for partition size, which defaults to 1GB. The partition size
is simply a number of consecutive blocks, as you say.
The smaller the partition size, the greater the overhead of managing it.
Also I've been looking at read-only tables and compression, as you may
know. My idea was that in the future we could mark segments as either
- read-only
- compressed
- able to be shipped off to hierarchical storage
Those ideas work best if the partitioning is based around the physical
file sizes we use for segments.
Changing the partition size would simply reset the visibility map for
that table, in its easiest implementation.
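The block-to-partition mapping implied above is just integer division
(illustrative sketch; the 8KB block size and 1GB default follow standard
PostgreSQL build settings, the names are invented):

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes

# default storage parameter: 1GB worth of 8KB blocks per partition
DEFAULT_PARTITION_BLOCKS = (1 << 30) // BLOCK_SIZE  # 131072 blocks


def partition_of(block_number, partition_blocks=DEFAULT_PARTITION_BLOCKS):
    """A partition is a run of consecutive blocks, so locating the
    partition holding a given block is a single division."""
    return block_number // partition_blocks
```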
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
On Fri, 2008-01-04 at 10:22 +0000, Richard Huxton wrote:
Simon Riggs wrote:
We would keep a dynamic visibility map at *segment* level, showing which
segments have all rows as 100% visible. No freespace map data would be
held at this level.

Small dumb-user question.
I take it you've considered some more flexible consecutive-run-of-blocks
unit of flagging rather than file-segments. That obviously complicates
the tracking but means you can cope with infrequent updates as well as
mark most of the "most recent" segment for log-style tables.

I'm writing the code to abstract that away, so yes.
Now you mention it, it does seem straightforward to have a table storage
parameter for partition size, which defaults to 1GB. The partition size
is simply a number of consecutive blocks, as you say.

The smaller the partition size, the greater the overhead of managing it.
Oh, obviously, but with smaller partition sizes this also becomes useful
for low-end systems as well as high-end ones. Skipping 80% of a seq-scan
on a date-range query is a win for even small (by your standards)
tables. I shouldn't be surprised if the sensible-number-of-partitions
remained more-or-less constant as you scaled the hardware, but the
partition size grew.
Also I've been looking at read-only tables and compression, as you may
know. My idea was that in the future we could mark segments as either
- read-only
- compressed
- able to be shipped off to hierarchical storage

Those ideas work best if the partitioning is based around the physical
file sizes we use for segments.
I can see why you've chosen file segments. It certainly makes things easier.
Hmm - thinking about the date-range scenario above, it occurs to me that
for seq-scan purposes the correct partition size depends upon the
data value you are interested in. What I want to know is what blocks Jan
07 covers (or rather what blocks it doesn't) rather than knowing blocks
1-9999999 cover 2005-04-12 to 2007-10-13. Of course that means that
you'd eventually want different partition sizes tracking visibility for
different columns (e.g. id, timestamp).
I suspect the same would be true for read-only/compressed/archived
flags, but I can see how they are tightly linked to physical files
(particularly the last two).
--
Richard Huxton
Archonet Ltd
Hello Simon,
Simon Riggs wrote:
I've come
up with an alternative concept to allow us to discuss the particular
merits of each. ISTM that this new proposal has considerable potential.
Hm.. interesting idea.
If we were able to keep track of which sections of a table are now
read-only then we would be able to record information about the data in
that section and use it to help solve queries. This is turning the
current thinking on its head: we could derive the constraints from the
data, rather than placing the data according to the constraints. That
would be much more natural: load data into the table and have the system
work out the rest.
Yeah, but that's also the most limiting factor of your approach: it
covers only horizontal partitioning by time (or to be more precise, by
columns which are very likely to increase or decrease with time). All
other columns will very likely contain values from the full range of
possible values.
As you have pointed out, that might be a very frequent use case. I can't
argue about that, however, I think it's important to be well aware of
that limitation.
Other scan types might also use segment exclusion, though this would
only be useful for scans retrieving many rows, otherwise the overhead of
segment exclusion might not be worthwhile.
Uh.. the overhead of checking against min/max values doesn't seem that
big to me.
I rather think the gain for index scans would be prohibitively small,
because (given frequent enough vacuuming) an index scan shouldn't return
many pointers to tuples in segments which could be optimized away by
segment exclusion.
If we collect data for all columns then many of our implicit constraints
would be useless. e.g. if a column only has a few values and these are
present in all segments. Matching our predicate against all constraints
would be expensive, so we must weed out poor constraints. We would do
this by removing any constraint that overlapped more than 10% of other
segments. Various heuristics would likely need to be discovered during
development to make this work without resorting to manual commands.
Uh, well, that's about the limitation I've pointed out above. But is it
really worth maintaining statistics about overlapping values and
removing min/max checks for certain columns?
It would save you the min/max check per segment and scan, but cost
maintaining the statistics and checking against the statistics once per
scan. AFAICS the block with the min/max tuple per segment will often
remain cached anyway... dunno.
Noting which segments are read-only
-----------------------------------
Everything so far has relied upon our ability to note which segments of
a table are read-only. We could do this in two ways
1) have the system automatically keep track of non-changing data
2) have the DBA issue a command to "mark" a segment as read-only now
Probably a combination of the two is better, so we have three states for
segments
- READ_WRITE_ALLOWED
- EFFECTIVE_READ_ONLY (set by 1 only)
- MARKED_READ_ONLY (set by 2 only)
Having said that, I want to concentrate on (1), though I'll consider (2)
as well if reviewers request it.
Hm.. AFAICT, horizontal partitioning often serves manageability, which
is quite limited when all data is in one table (you can't move a single
segment to a different tablespace). Thus I think option 2 is pretty
constrained in usability. What would the DBA gain by setting a segment
to read-only? How does the DBA figure out which segment a tuple is
stored in (so she can decide to mark it read-only)?
The user may also wish to clear down very old data, so allowing DELETEs
can ensure the user can still remove old data from the table. By
carefully choosing the values to be deleted, a whole segment can be
deleted and then returned to the FSM.
Oh, yeah, that sounds like a good optimization. Bulk deletes, yummie!
This proposal offers many of the advantages of the earlier Visibility
Map proposal, yet without major changes to heap structure. Since the
segment-level visibility map is more granular it would only be 80% as
effective as the more detailed block-level map. Having said that, the
bookkeeping overheads will also be considerably reduced, so it does seem
a good joint approach. It also allows freezing to be handled fully,
which was a problem with the original visibility map proposal. WAL
logging visibility map changes is also now much easier.
I generally agree, although I'm somewhat dubious about the 80% factor.
My thoughts have been targeted directly at partitioning, yet I have to
admit that this idea overlaps, and in my world view, replaces the
Visibility Map proposal. I very much like the name Visibility Map
though.
I would even say, that partitioning is somewhat of a misleading term
here, because it normally allows the DBA to decide on where to split.
Given that we are operating on segments here, to which the DBA has very
limited information and access, I prefer the term "Segment Exclusion". I
think of that as an optimization of sequential scans on tables with the
above mentioned characteristics.
If we do need to differentiate between the two proposals, we can refer
to this one as the Segment Visibility Map (SVM).
I'm clearly in favor of separating between the two proposals. SVM is a
good name, IMHO.
We can handle select count(*) by scanning the non-100% visible segments
of a table, then adding the stored counts for each segment to get a
final total. Not sure if it's really worth doing, but it does sound like
an added bonus.
Yup, sounds tempting, although it's contrary to Postgres' position so
far. And one could argue that you'd then have to maintain not only
count(), but also avg(), sum(), etc. for all tuples in the 100% visible
segments as well.
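The arithmetic being discussed is simple to sketch. The structures below are invented (no such per-segment metadata exists in PostgreSQL today); `read_only` stands for the SVM marking a segment 100% visible, whose stored count could then be trusted:

```python
# Hypothetical sketch: select count(*) using per-segment stored counts.
# Only segments the SVM marks 100% visible can have their stored count
# trusted; the active tail must still be scanned the usual MVCC way.
segments = [
    {"read_only": True,  "stored_count": 1_000_000, "tuples": None},
    {"read_only": True,  "stored_count": 2_500_000, "tuples": None},
    {"read_only": False, "stored_count": None, "tuples": list(range(42))},
]

def table_count(segments, scan):
    total = 0
    for seg in segments:
        if seg["read_only"]:
            total += seg["stored_count"]   # 100% visible: no scan needed
        else:
            total += scan(seg)             # active tail: real scan
    return total

# scan() stands in for counting tuples visible to our snapshot
total = table_count(segments, scan=lambda seg: len(seg["tuples"]))
```

Only the one non-read-only segment needs scanning; the rest of the total comes straight from the stored counts.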
There would be additional complexity in selectivity estimation and plan
costing. The above proposal allows dynamic segment exclusion, which
cannot be assessed at planning time anyway, so suggestions welcome...
Hm.. that looks like a rather bad downside of an executor-only optimization.
Comparison with other Partitioning approaches
---------------------------------------------
Once I realised this was possible in a fairly automatic way, I've tried
hard to keep away from manual overrides, commands and new DDL.
Declarative partitioning is a big overhead, though worth it for large
databases. No overhead is *much* better though.
This approach to partitioning solves the following challenges:
- allows automated table extension, so works automatically with Slony
- responds dynamically to changing data
- allows stable functions, nested loop joins and parametrised queries
- allows RI via SHARE locks
- avoids the need for operator push-down through Append nodes
- allows unique indexes
- allows global indexes (because we only have one table)
- allows advanced planning/execution using read-only/visible data
- works naturally with synchronous scans and buffer recycling
All of the above are going to take considerably longer to do in any of
the other ways I've come up with so far...
I fully agree. But as I tried to point out above, the gains in
manageability from Segment Exclusion are also pretty close to zero. So
I'd argue it only fulfills part of the needs for general horizontal
partitioning.
This technique would be useful for any table with historical data keyed
by date or timestamp. It would also be useful for data where a
time-of-insert component is implicit, such as many major entity tables
where the object ids are assigned by a sequence. e.g. an Orders table
with an OrderId as PK. Once all orders placed in a period have been
shipped/resolved/closed then the segments will be marked read-only.
Agreed. Just a minor note: I find "marked read-only" too strong, as it
implies an impossibility to write. I propose speaking about mostly-read
segments, or optimized for reading or similar.
It's not really going to change the code path much for small tables, yet
seems likely to work reasonably well for large tables of all shapes and
sizes.
That sounds a bit too optimistic to me. For Segment Exclusion, it takes
only *one* tuple to enlarge the min/max range dramatically in any
direction. So it's not the overall correlation between column values and
storage location, but rather only the min/max column values which
matter. Have you ever checked how well these min/max values correlate
with the segment number?
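The objection is easy to demonstrate. A minimal sketch (the function and the values are invented for illustration): Segment Exclusion keys off per-segment min/max values only, so a single stray tuple widens the range and defeats the exclusion.

```python
# Sketch of the min/max fragility: one outlier tuple is enough to stop
# a segment from being excluded. Function and data are invented.
def can_exclude(seg_min, seg_max, lo, hi):
    """True if the segment can be skipped for the predicate lo <= col <= hi."""
    return seg_max < lo or seg_min > hi

# A well-correlated segment holding ids 1..100000 is skipped when
# scanning for ids 500000..600000:
skip_before = can_exclude(1, 100_000, 500_000, 600_000)

# After one late UPDATE places id 550000 into that segment, its max
# value jumps and the segment must be scanned again:
skip_after = can_exclude(1, 550_000, 500_000, 600_000)
```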
Pretty much the same argument applies to SVM: an update to only one
tuple in a segment is enough to remove the optimization for reading
(EFFECTIVE_READ_ONLY property) for the segment. The assumption here
(being that updates happen mostly to newer segments) is not quite the
same as above.
Maybe a combination with CLUSTERing would be worthwhile? Or even
enforced CLUSTERing for the older segments?
If a segment is being updated, we leave it alone, and maybe never
actually set the visibility map at all. So overall, this idea seems to
cover the main use case well, yet with only minimal impact on the
existing use cases we support.
Yup.
As before, I will maintain this proposal on the PG developer Wiki, once
we get down to detailed design.
Cool.
Thanks for working out yet another great proposal. I hope to have been
of help with my questions and remarks. ;-)
Regards
Markus
Hi,
Simon Riggs wrote:
- any Fact table where measurements/observations/events are accumulated
e.g.
Web Hits (any Internet events)
Call Detail Records
Sales
Security Events
Scientific Measurements
Process Control
- any Major Entity where new entities are created from a sequence
e.g.
Orders, OrderItems
Invoices
Shipments, Returns
most SCM/DCM events
...and only changed very seldom after a while, I would add. Because
changing an old tuple would invalidate the optimization for the affected
segment.
That's why this optimization can't help for inventory tables, where an
id might correlate with time and storage location, but write access
doesn't correlate with storage location (segment number) and time.
Regards
Markus
Hi,
Simon Riggs wrote:
The smaller the partition size, the greater the overhead of managing it.
Also I've been looking at read-only tables and compression, as you may
know. My idea was that in the future we could mark segments as either
- read-only
- compressed
- able to be shipped off to hierarchical storage
Those ideas work best if the partitioning is based around the physical
file sizes we use for segments.
As much as I'd like this simplification, I'm still thinking of these
segments as an implementation detail of Postgres, and not something the
user should have to deal with.
Allowing the DBA to move segments to a different table space and giving
him the possibility to check which tuples are in which segment seems
awkward from a user's perspective, IMO.
Regards
Markus
On Fri, 2008-01-04 at 13:29 +0100, Markus Schiltknecht wrote:
Given that we are operating on segments here, to which the DBA has very
limited information and access, I prefer the term "Segment Exclusion". I
think of that as an optimization of sequential scans on tables with the
above mentioned characteristics.
If we do need to differentiate between the two proposals, we can refer
to this one as the Segment Visibility Map (SVM).
I'm clearly in favor of separating between the two proposals. SVM is a
good name, IMHO.
OK, I'll refer to this proposal as SVM.
There would be additional complexity in selectivity estimation and plan
costing. The above proposal allows dynamic segment exclusion, which
cannot be assessed at planning time anyway, so suggestions welcome...
Hm.. that looks like a rather bad downside of an executor-only optimization.
I think that's generally true. We already have that problem with planned
statements and work_mem, for example, and parameterised query planning
is a difficult problem. Stable functions are already estimated at plan
time, so we hopefully should be getting that right. I don't see any show
stoppers here, just more of the usual problems of query optimization.
Comparison with other Partitioning approaches
---------------------------------------------
Once I realised this was possible in a fairly automatic way, I've tried
hard to keep away from manual overrides, commands and new DDL.
Declarative partitioning is a big overhead, though worth it for large
databases. No overhead is *much* better though.
This approach to partitioning solves the following challenges:
- allows automated table extension, so works automatically with Slony
- responds dynamically to changing data
- allows stable functions, nested loop joins and parametrised queries
- allows RI via SHARE locks
- avoids the need for operator push-down through Append nodes
- allows unique indexes
- allows global indexes (because we only have one table)
- allows advanced planning/execution using read-only/visible data
- works naturally with synchronous scans and buffer recycling
All of the above are going to take considerably longer to do in any of
the other ways I've come up with so far...
I fully agree. But as I tried to point out above, the gains in
manageability from Segment Exclusion are also pretty close to zero. So
I'd argue it only fulfills part of the needs for general horizontal
partitioning.
Agreed.
My focus for this proposal wasn't manageability, as it had been in other
recent proposals. I think there are some manageability wins to be had as
well, but we need to decide what sort of partitioning we want/need
first.
So in the case of SVM, enhanced manageability is really a phase 2 thing.
Plus, you can always combine a design with constraint and segment
exclusion.
Maybe a combination with CLUSTERing would be worthwhile? Or even
enforced CLUSTERing for the older segments?
I think there's merit in Heikki's maintain cluster order patch and that
should do an even better job of maintaining locality.
Thanks for detailed comments. I'll do my best to include all of the
viewpoints you've expressed as the design progresses.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Fri, Jan 04, 2008 at 01:29:55PM +0100, Markus Schiltknecht wrote:
Agreed. Just a minor note: I find "marked read-only" too strong, as it
implies an impossibility to write. I propose speaking about mostly-read
segments, or optimized for reading or similar.
I do want some segments to be _marked_ read-only: I want attempted writes to
them to _fail_.
A
On Fri, 2008-01-04 at 13:06 -0500, Andrew Sullivan wrote:
On Fri, Jan 04, 2008 at 01:29:55PM +0100, Markus Schiltknecht wrote:
Agreed. Just a minor note: I find "marked read-only" too strong, as it
implies an impossibility to write. I propose speaking about mostly-read
segments, or optimized for reading or similar.
I do want some segments to be _marked_ read-only: I want attempted writes to
them to _fail_.
I think Markus thought that we would mark them read only automatically,
which was not my intention. I believe it's possible to have this in a way
that will make you both happy. Some more explanation:
There would be three different states for a segment:
1. read write
2. "optimized for reading", as Markus says it
3. marked read only by explicit command
Transition 1 -> 2 is by autovacuum under the SVM proposal, transition 2
-> 3 is by user command only. So throwing an ERROR is acceptable for
segments in state 3.
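Those transitions are simple enough to write down. A toy sketch (the class and names are invented for illustration; this is not how the server would implement it):

```python
# Toy sketch of the three segment states and transitions described
# above (all names invented; not PostgreSQL source).
READ_WRITE, OPTIMIZED, READ_ONLY = 1, 2, 3

class SegmentState:
    def __init__(self):
        self.state = READ_WRITE

    def autovacuum_pass(self):
        # 1 -> 2: autovacuum finds the segment 100% visible
        if self.state == READ_WRITE:
            self.state = OPTIMIZED

    def alter_table_set_read_only(self):
        # 2 -> 3: explicit user command only
        if self.state != OPTIMIZED:
            raise ValueError("segment not yet optimized for reading")
        self.state = READ_ONLY

    def write(self):
        if self.state == READ_ONLY:
            raise PermissionError("segment is marked read only")  # ERROR
        # a write to an optimized segment silently demotes it: 2 -> 1
        self.state = READ_WRITE
```

Note the write() path: a write to a state-2 segment quietly drops it back to state 1, matching the earlier observation that a single update removes the optimized-for-reading property; only state 3 throws an ERROR.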
I came up with a complex scheme for going from 1 -> 3 previously, but I
don't think it's needed any longer (for this, at least). It's trivial to
go from 2 -> 3 using an ALTER TABLE statement, along the lines of ALTER
TABLE .... WHERE ....
Files that are completely in state 3 can then be archived by a
hierarchical storage manager without problem.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hi,
Simon Riggs wrote:
On Fri, 2008-01-04 at 13:06 -0500, Andrew Sullivan wrote:
On Fri, Jan 04, 2008 at 01:29:55PM +0100, Markus Schiltknecht wrote:
Agreed. Just a minor note: I find "marked read-only" too strong, as it
implies an impossibility to write. I propose speaking about mostly-read
segments, or optimized for reading or similar.
Hm.. yeah, after rereading, I realize that I've mixed up states no. 2
and 3 here, sorry.
I do want some segments to be _marked_ read-only: I want attempted writes to
them to _fail_.
Well, I can see use cases for marking tuples or complete relations as
read only. But segments?
I'm still puzzled about how a DBA is expected to figure out which
segments to mark. Simon, are you assuming we are going to pass on
segment numbers to the DBA one day?
If not, a more user friendly command like "MARK READ ONLY WHERE date <=
2006" would have to move tuples around between segments, so as to be
able to satisfy the split point exactly, right?
I think Markus thought that we would mark them read only automatically,
which was not my intention. I believe its possible to have this in a way
that will make you both happy. Some more explanation:
There would be three different states for a segment:
1. read write
2. "optimized for reading", as Markus says it
3. marked read only by explicit command
Thanks for clarification.
Regards
Markus
On Fri, Jan 04, 2008 at 10:26:54PM +0100, Markus Schiltknecht wrote:
I'm still puzzled about how a DBA is expected to figure out which
segments to mark.
I think that part might be hand-wavy still. But once this facility is
there, what's to prevent the current active segment (and the rest) from also
getting this mark, which would mean "the table is read only"?
A
On Fri, 2008-01-04 at 22:26 +0100, Markus Schiltknecht wrote:
I'm still puzzled about how a DBA is expected to figure out which
segments to mark. Simon, are you assuming we are going to pass on
segment numbers to the DBA one day?
No Way!
That would prevent Richard's idea of making the segment stride
configurable, apart from being a generally ugly thing.
If not, a more user friendly command like "MARK READ ONLY WHERE date <=
2006" would have to move tuples around between segments, so as to be
able to satisfy the split point exactly, right?
Yes, just a simple WHERE clause that we can translate into segments
under the covers. It would be an alter table, so we get an exclusive
lock.
ALTER TABLE foo SET READ ONLY WHERE ....
possibly with a few restrictions on the WHERE clause. Anyway this is
just futures and dreams, so far, so let's just say something like that is
possible in the future and work out more when we pass the first few
hurdles.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Friday 04 January 2008 17:01, Simon Riggs wrote:
On Fri, 2008-01-04 at 22:26 +0100, Markus Schiltknecht wrote:
I'm still puzzled about how a DBA is expected to figure out which
segments to mark. Simon, are you assuming we are going to pass on
segment numbers to the DBA one day?
No Way!
That would stop Richard's idea to make the segment stride configurable,
apart from being a generally ugly thing.
If not, a more user friendly command like "MARK READ ONLY WHERE date <=
2006" would have to move tuples around between segments, so as to be
able to satisfy the split point exactly, right?
Yes, just a simple WHERE clause that we can translate into segments
under the covers. It would be an alter table, so we get an exclusive
lock.
ALTER TABLE foo SET READ ONLY WHERE ....
possibly with a few restrictions on the WHERE clause. Anyway this is
just futures and dreams, so far, so let's just say something like that is
possible in the future and work out more when we pass the first few
hurdles.
Not to be negative, but istm how this feature would be managed is as important
as the bits under the hood. Or at least we have to believe there will be
some practical way to manage this, of which as of yet I am skeptical.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Fri, 2008-01-04 at 22:31 -0500, Robert Treat wrote:
Not to be negative, but istm how this feature would be managed is as important
as the bits under the hood.
Agreed. On this part of the thread, we've been discussing an extension
to the basic proposal, which is why I have not been concentrating there.
Core management-wise, the basic proposal showed how we would be able to
have VACUUM run much faster than before and how DELETE will also be
optimised naturally by this approach. Loading isn't any slower than it
is now; loading does need some work, but that's another story.
Or at least we have to believe there will be
some practical way to manage this, which as of yet I am skeptical.
Skepticism is OK, but I'd like to get your detailed thoughts on this.
I've been an advocate of the multi-tables approach now for many years,
so I don't expect everybody to switch their beliefs on my say-so
overnight. Let me make a few more comments in this area:
The main proposal deliberately has few, if any, knobs and dials. That's
a point of philosophy that I've had views on previously: my normal
stance is that we need some knobs to allow the database to be tuned to
individual circumstances.
In this case, partitioning is way too complex to administer effectively
and requires application changes that make it impossible to use for
packaged applications. The latest Oracle TPC-H benchmark uses 10 pages
of DDL to set it up and if I can find a way to avoid that, I'd recommend
it to all. I do still want some knobs and dials, just not 10 pages
worth, though I'd like yours and others' guidance on what those should
be. Oracle have been responding to feedback with their new interval
partitioning, but it's still a multi-table approach in essence.
My observation of partitioned databases is that they all work
beautifully at the design stage, but problems emerge over time. A
time-based range partitioned table can often have different numbers of
rows per partition, giving inconsistent response times. A
height-balanced approach where we make the partitions all the same size,
yet vary the data value boundaries will give much more consistent query
times and can be completely automated much more easily.
The SVM concept doesn't cover everything that you can do with
partitioning, but my feeling is it covers the main use cases well. If
that's not true, in broad strokes or in the detail, then we need to
uncover that. Everybody's help in doing that is appreciated, whatever
the viewpoint and whatever the outcome.
It's probably worth examining existing applications to see how well they
would migrate to the segmented tables approach. The following query will
analyse one column of a table to produce a list of boundary values,
given a segment size of 131072 blocks (1 GB).
select
  substr(ctid::text, 2, strpos(ctid::text, ',') - 2)::integer/131072 as seg,
  min(PrimaryKey), max(PrimaryKey)
from bigtable
group by seg;
We should be able to see fairly easily whether this works for existing
use cases or not.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Andrew Sullivan wrote:
On Fri, Jan 04, 2008 at 10:26:54PM +0100, Markus Schiltknecht wrote:
I'm still puzzled about how a DBA is expected to figure out which
segments to mark.
I think that part might be hand-wavy still. But once this facility is
there, what's to prevent the current active segment (and the rest) from also
getting this mark, which would mean "the table is read only"?
Well, sure, marking *all* segments read-only is pretty easy. But that
was not quite my point.
Regards
Markus