TOASTing smaller things

Started by Chris Browneabout 19 years ago21 messageshackers
Jump to latest
#1Chris Browne
cbbrowne@acm.org

In some of our applications, we have cases where it would be very nice
if we could activate TOAST at some sort of lower threshold than the
usual 2K that is true now. Let me note the current code that controls
the threshold:

/*
* These symbols control toaster activation. If a tuple is larger than
* TOAST_TUPLE_THRESHOLD, we will try to toast it down to no more than
* TOAST_TUPLE_TARGET bytes. Both numbers include all tuple header overhead
* and between-fields alignment padding, but we do *not* consider any
* end-of-tuple alignment padding; hence the values can be compared directly
* to a tuple's t_len field. We choose TOAST_TUPLE_THRESHOLD with the
* knowledge that toast-table tuples will be exactly that size, and we'd
* like to fit four of them per page with minimal space wastage.
*
* The numbers need not be the same, though they currently are.
*
* Note: sizeof(PageHeaderData) includes the first ItemId, but we have
* to allow for 3 more, if we want to fit 4 tuples on a page.
*/
#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / 4)

We have cases where we're storing XML message information which is
near the 0.5K mark, that being the case, tuples virtually never get
TOASTed.

somesystem=# select min(length(xml)), max(length(xml)), avg(length(xml)), stddev(length(xml)) from table_with_xml;
min | max | avg | stddev
-----+------+----------------------+------------------
244 | 2883 | 651.6900720788174376 | 191.602077911138
(1 row)

I can see four controls as being pretty plausible:

1. Compile time...

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

That's obviously cheapest to the DB engine.

I just made this change to a checkout of CVS HEAD, and it readily
survived a regression test.

2. GUC value for TOAST_DENOMINATOR

Do the above, but with the added detail that TOAST_DENOMINATOR refers
to a GUC value.

I think I could probably make this change; the principle remains much
the same as with #1.

3. GUC value for TOAST_TUPLE_THRESHOLD

This probably has to get modified to the nearest feasible value,
modulo alignment; it's not all that different from #1 or #2.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

At present, the 4 values are essentially advisory; columns get TOASTed
if the column permits EXTENDED storage, but that only occurs if the
size is greater than TOAST_TUPLE_THRESHOLD.

If the new value was chosen, the column would *always* get stored as
TOAST.

Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
#3 is a bit trickier, whilst #4 is probably not "8.3-fittable".

Question:

Which of these sounds preferable?
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you *not* want to go today? "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html&gt;

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Browne (#1)
Re: TOASTing smaller things

Chris Browne <cbbrowne@acm.org> writes:

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since the
original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right". In fact I think
it's not right; you have not got the roundoff condition straight.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen. No thanks.

Which of these sounds preferable?

It's a bit late in the cycle to be proposing any of these for 8.3.

regards, tom lane

#3Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#2)
Re: TOASTing smaller things

On 3/21/2007 2:05 PM, Tom Lane wrote:

Chris Browne <cbbrowne@acm.org> writes:

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since the
original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right". In fact I think
it's not right; you have not got the roundoff condition straight.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen. No thanks.

Not necessarily. A flag in Relation telling if the table has any column
marked like that could be set while constructing the relcache entry.

Which of these sounds preferable?

It's a bit late in the cycle to be proposing any of these for 8.3.

Certainly.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#4Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#3)
Re: TOASTing smaller things

Is this a TODO?

---------------------------------------------------------------------------

Jan Wieck wrote:

On 3/21/2007 2:05 PM, Tom Lane wrote:

Chris Browne <cbbrowne@acm.org> writes:

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since the
original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right". In fact I think
it's not right; you have not got the roundoff condition straight.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen. No thanks.

Not necessarily. A flag in Relation telling if the table has any column
marked like that could be set while constructing the relcache entry.

Which of these sounds preferable?

It's a bit late in the cycle to be proposing any of these for 8.3.

Certainly.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5Chris Browne
cbbrowne@acm.org
In reply to: Chris Browne (#1)
Re: TOASTing smaller things

tgl@sss.pgh.pa.us (Tom Lane) writes:

Chris Browne <cbbrowne@acm.org> writes:

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since
the original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right". In fact I think
it's not right; you have not got the roundoff condition straight.

OK, then maybe some refinement was needed. That seemed too easy.

Mind you, the problem seems to me to be that TOAST_TUPLE_THRESHOLD is
not solely a threshold to compare things to (as done in
heapam.c/toasting.c), but gets reused to calculate
TOAST_MAX_CHUNK_SIZE. If the threshold was solely used as that,
alignment wouldn't matter.

FYI, I took a sample table and loaded it into the resulting 8.3
backend based on the us of the totally naive TOAST_DENOMINATOR; there
may be something off in the sizing of the chunks, but that does not
appear to have injured fidelity of the tuples I stored.

Vacuum output:

--- Production system (no TOASTing)
INFO:  "xml_log_table": found 0 removable, 1731329 nonremovable row versions in 175870 pages
DETAIL:  0 dead row versions cannot be removed yet.
--- In the 8.3 instance that did toast things...
INFO:  "xml_log_table": found 0 removable, 1730737 nonremovable row versions in 41120 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
610 pages contain useful free space.
0 pages are entirely empty.
CPU 1.08s/0.36u sec elapsed 14.94 sec.
INFO:  vacuuming "pg_toast.pg_toast_49194"
INFO:  index "pg_toast_49194_index" now contains 2303864 row versions in 6319 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.03u sec elapsed 1.71 sec.
INFO:  "pg_toast_49194": found 0 removable, 2303864 nonremovable row versions in 98191 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
514 pages contain useful free space.
0 pages are entirely empty.

Problem with alignment of TOAST_MAX_CHUNK_SIZE or not, I seem to be
getting the right results, and this nicely partitions the table into 2
chunks, one, with the non-XML data, that occupies 41K pages, and the
TOAST section storing those less-frequently-accessed columns. (There
is a size difference; the production instance has more empty space
since it sees active inserts + deletes.)

In all ways except for "strict hygenic correctness of code," this
accomplished what I was hoping.

If someone could make a round-off-safe calculation of
TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE that exposed the
denominator so it could be safely modified, that would be one step
ahead... I generally try not to make changes to the core, so I'll try
to avoid that...

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen. No thanks.

Ah, I see. I infer from that that the code starts by checking to see
if the tuple size is > TOAST_TUPLE_THRESHOLD, and only starts
rummaging through TOAST infrastructure if the tuple is big enough.

In that case, "TOAST by default" becomes rather a nonstarter, I agree.
In the application context I'm thinking of, one table out of ~80 is a
"TOAST candidate;" making access to the other 79 slower would not be
of benefit.

(Aside: I'll link to Simon Rigg's related note, as well as to the item
on the TODO list...)
<http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php&gt;
<http://www.postgresql.org/docs/faqs.TODO.html&gt;
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you *not* want to go today? "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html&gt;

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Chris Browne (#1)
Re: TOASTing smaller things

On Wed, Mar 21, 2007 at 12:37:36PM -0400, Chris Browne wrote:

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

At present, the 4 values are essentially advisory; columns get TOASTed
if the column permits EXTENDED storage, but that only occurs if the
size is greater than TOAST_TUPLE_THRESHOLD.

If the new value was chosen, the column would *always* get stored as
TOAST.

Rather than a hard and fast limit of 0, why not allow defining a size
threshold? And while we're at it, how about a size threshold for when to
try compressing, too?

Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
#3 is a bit trickier, whilst #4 is probably not "8.3-fittable".

Question:

Which of these sounds preferable?

1 and 2 (variations on how to set the denominator) sound completely
ugly. Trying to minimize wasted space in a toast table is great for a
default, but exposing something like that to the users via any kind of
setting seems way to obtuse.

#3 (GUC for number of bytes) may not make sense for performance reasons,
as Tom mentioned. I'm hoping that it would be easy to check either
pg_class or pg_attribute to see if a table/column has non-standard
toast/compression limits.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#7Luke Lonergan
llonergan@greenplum.com
In reply to: Bruce Momjian (#4)
Re: TOASTing smaller things

I advocate the following:

- Enable specification of TOAST policy on a per column basis

As a first step, then:

- Enable vertical partitioning of tables using per-column specification of
storage policy.

- Luke

On 3/21/07 1:12 PM, "Bruce Momjian" <bruce@momjian.us> wrote:

Show quoted text

Is this a TODO?

---------------------------------------------------------------------------

Jan Wieck wrote:

On 3/21/2007 2:05 PM, Tom Lane wrote:

Chris Browne <cbbrowne@acm.org> writes:

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from
TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since the
original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right". In fact I think
it's not right; you have not got the roundoff condition straight.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen. No thanks.

Not necessarily. A flag in Relation telling if the table has any column
marked like that could be set while constructing the relcache entry.

Which of these sounds preferable?

It's a bit late in the cycle to be proposing any of these for 8.3.

Certainly.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#8Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Luke Lonergan (#7)
Re: TOASTing smaller things

Luke Lonergan wrote:

I advocate the following:

- Enable specification of TOAST policy on a per column basis

As a first step, then:

- Enable vertical partitioning of tables using per-column specification of
storage policy.

Wouldn't it be enough to enable having the toast table on a different
table space?

Regards,
Andreas

#9Luke Lonergan
llonergan@greenplum.com
In reply to: Andreas Pflug (#8)
Re: TOASTing smaller things

Andreas,

On 3/22/07 9:40 AM, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:

Wouldn't it be enough to enable having the toast table on a different
table space?

Yes, but the ultimate goal would allow the allocation of a storage mechanism
that is unique to each column. The most frequently used mechanism for our
customers would likely be to have each column associated with it's own
internal relation (like a separate TOAST table), which puts each column into
it's own dense page storage.

Beside the advantages of separating out keys columns from data columns,
compression and encryption approaches that use column packed data are much
more effective.

As Tom points out there are complications WRT update, access, etc that need
to be worked out to support this, but it's an important capability to have
IMO.

- Luke

#10Chris Browne
cbbrowne@acm.org
In reply to: Andreas Pflug (#8)
Re: TOASTing smaller things

llonergan@greenplum.com ("Luke Lonergan") writes:

Andreas,

On 3/22/07 9:40 AM, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:

Wouldn't it be enough to enable having the toast table on a different
table space?

Yes, but the ultimate goal would allow the allocation of a storage mechanism
that is unique to each column. The most frequently used mechanism for our
customers would likely be to have each column associated with it's own
internal relation (like a separate TOAST table), which puts each column into
it's own dense page storage.

Beside the advantages of separating out keys columns from data columns,
compression and encryption approaches that use column packed data are much
more effective.

As Tom points out there are complications WRT update, access, etc that need
to be worked out to support this, but it's an important capability to have
IMO.

Hmm. Are you trying to do something sort of like CStore?

http://db.csail.mit.edu/projects/cstore/

That seems to have some relevant ideas...
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you *not* want to go today? "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html&gt;

#11Luke Lonergan
llonergan@greenplum.com
In reply to: Chris Browne (#10)
Re: TOASTing smaller things

Chris,

Hmm. Are you trying to do something sort of like CStore?

http://db.csail.mit.edu/projects/cstore/

That seems to have some relevant ideas...

I think "something like" is a good way to put it. As you know Cstore was a
prototype for Vertica and these are in the same class as SybaseIQ and
SandDB.

The huge drawback of the pure column approach is update/insert while query
is difficult if not impossible. I think there are hybrid approaches that
yield most, if not all of the benefits of the column store approach without
the disadvantages.

For instance, a bitmap index with index only access in a row-store may
outperform the column store on queries. Note the "index only access" part.
The next advantage of a column store is deep compression, preserved through
the executor access path - we can do this with selective vertical
partitioning using a page-segmented WAH compression similar to what we did
with bitmap index. Lastly, vectorization of the operators in the executor
can be implemented with vertical partitioning and an access path that does
projection before feeding the columns into the executor - this can be done
in Postgres with a cache-bypass method. Some of this requires working out
answers to the visibility challenges inherent to MVCC, but it's all possible
IMO.

So - under the guise of "TOASTing smaller things", it seems relevant to
think about vertical partitioning, perhaps making use of what's already in
Postgres as baby steps toward more advanced features.

- Luke

#12Hannu Krosing
hannu@tm.ee
In reply to: Luke Lonergan (#9)
Re: TOASTing smaller things

Ühel kenal päeval, N, 2007-03-22 kell 10:19, kirjutas Luke Lonergan:

Andreas,

On 3/22/07 9:40 AM, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:

Wouldn't it be enough to enable having the toast table on a different
table space?

Yes, but the ultimate goal would allow the allocation of a storage mechanism
that is unique to each column. The most frequently used mechanism for our
customers would likely be to have each column associated with it's own
internal relation (like a separate TOAST table), which puts each column into
it's own dense page storage.

Maybe monetdb ( http://monetdb.cwi.nl/ ) can be of some inspiration ?

btw, it may be a good idea to have a verion of bizgresMPP which has
monetdb as partition db, if monetdb is as efficient as they tell it is .

Beside the advantages of separating out keys columns from data columns,
compression and encryption approaches that use column packed data are much
more effective.

As Tom points out there are complications WRT update, access, etc that need
to be worked out to support this, but it's an important capability to have
IMO.

- Luke

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#13Luke Lonergan
llonergan@greenplum.com
In reply to: Hannu Krosing (#12)
Re: TOASTing smaller things

Hi Hannu,

On 3/22/07 3:21 PM, "Hannu Krosing" <hannu@skype.net> wrote:

Maybe monetdb ( http://monetdb.cwi.nl/ ) can be of some inspiration ?

btw, it may be a good idea to have a verion of bizgresMPP which has
monetdb as partition db, if monetdb is as efficient as they tell it is .

Yep - I've talked this over with the MonetDB folks in the past. The major
benefits they observe are those same things we talk about here,
implementation of long loops for operators and de-abstraction of operations
like compare() when appropriate, say comparing two INT columns in a sort.

We can get many of those benefits without by "vectorizing" the executor of
PostgreSQL even without the full column partitioning. We're in the midst of
working some of those changes as we speak. Early indications are that we
see large performance gains from this approach. Note that the actual
instruction counts per row don't change, but the more effective use of L2 I
and D cache and superscaler instruction units on the CPU create the big
gains. The MonetDB people present some effective literature on this, but
the important gains mostly come from the "vectorization", not the operator
de-abstraction IMO, which is good news for us all.

- Luke

#14Bruce Momjian
bruce@momjian.us
In reply to: Luke Lonergan (#7)
Re: TOASTing smaller things

Luke Lonergan wrote:

I advocate the following:

- Enable specification of TOAST policy on a per column basis

As a first step, then:

- Enable vertical partitioning of tables using per-column specification of
storage policy.

How are these different from ALTER TABLE SET STORAGE? They need to be
more specific.

---------------------------------------------------------------------------

- Luke

On 3/21/07 1:12 PM, "Bruce Momjian" <bruce@momjian.us> wrote:

Is this a TODO?

---------------------------------------------------------------------------

Jan Wieck wrote:

On 3/21/2007 2:05 PM, Tom Lane wrote:

Chris Browne <cbbrowne@acm.org> writes:

#define TOAST_DENOMINATOR 17
/* Use this as the divisor; current default behaviour falls from
TOAST_DENOMINATOR = 4 */

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I / TOAST_DENOMINATOR)

Given that you are quoting code that was demonstrably broken since the
original coding of TOAST up till a month or two back, "it passes
regression" is not adequate proof of "it's right". In fact I think
it's not right; you have not got the roundoff condition straight.

4. A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

FORCE_COMPRESSION, FORCE_EXTERNAL and FORCE_EXTERNAL_UNCOMPRESSED.

Anything along this line would require invoking the toaster on every
single tuple, since we'd always have to crawl through all the columns
to see if toasting was supposed to happen. No thanks.

Not necessarily. A flag in Relation telling if the table has any column
marked like that could be set while constructing the relcache entry.

Which of these sounds preferable?

It's a bit late in the cycle to be proposing any of these for 8.3.

Certainly.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#15Luke Lonergan
llonergan@greenplum.com
In reply to: Bruce Momjian (#14)
Re: TOASTing smaller things

Hi Bruce,

How about these:

- Allow specification of TOAST size threshold in bytes on a per column basis
- Enable storage of columns in separate TOAST tables
- Enable use of multi-row compression method(s) for TOAST tables

- Luke

On 3/26/07 5:39 PM, "Bruce Momjian" <bruce@momjian.us> wrote:

Show quoted text

Luke Lonergan wrote:

I advocate the following:

- Enable specification of TOAST policy on a per column basis

As a first step, then:

- Enable vertical partitioning of tables using per-column specification of
storage policy.

How are these different from ALTER TABLE SET STORAGE? They need to be
more specific.

#16Bruce Momjian
bruce@momjian.us
In reply to: Luke Lonergan (#15)
Re: TOASTing smaller things

Luke Lonergan wrote:

Hi Bruce,

How about these:

- Allow specification of TOAST size threshold in bytes on a per column basis
- Enable storage of columns in separate TOAST tables
- Enable use of multi-row compression method(s) for TOAST tables

At this point I would be happy just to set the TOAST threshold to a
value defined as optimal, rather than as the most minimal use of TOAST
possible.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17Luke Lonergan
llonergan@greenplum.com
In reply to: Bruce Momjian (#16)
Re: TOASTing smaller things

Hi Bruce,

On 4/12/07 9:24 AM, "Bruce Momjian" <bruce@momjian.us> wrote:

Luke Lonergan wrote:

Hi Bruce,

How about these:

- Allow specification of TOAST size threshold in bytes on a per column basis
- Enable storage of columns in separate TOAST tables
- Enable use of multi-row compression method(s) for TOAST tables

At this point I would be happy just to set the TOAST threshold to a
value defined as optimal, rather than as the most minimal use of TOAST
possible.

I agree that's a good starting point, I guess I was thinking that was
already included in the work that Tom has been doing. If not, we can add a
TODO like this as a precursor to the ones above:

- Allow specification of TOAST size threshold (in bytes) on a per table
basis

- Luke

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Luke Lonergan (#17)
Re: TOASTing smaller things

"Luke Lonergan" <llonergan@greenplum.com> writes:

On 4/12/07 9:24 AM, "Bruce Momjian" <bruce@momjian.us> wrote:

At this point I would be happy just to set the TOAST threshold to a
value defined as optimal, rather than as the most minimal use of TOAST
possible.

I agree that's a good starting point, I guess I was thinking that was
already included in the work that Tom has been doing.

No. I put in the code needed to decouple toast tuple size from toasting
threshold, but I don't have the time or interest to run performance
tests to see whether there are better default values than the historical
quarter-page values. Someone should do that before 8.3 beta ...

If not, we can add a
TODO like this as a precursor to the ones above:

- Allow specification of TOAST size threshold (in bytes) on a per table
basis

I would suggest that *all* of those TODOs are premature in the absence
of experimental evidence about the effect of varying the parameters.
If we end up finding out that the existing settings are about right
anyway across a range of test cases, who needs more knobs? We've got
too many mostly-useless knobs already.

regards, tom lane

#19Luke Lonergan
llonergan@greenplum.com
In reply to: Tom Lane (#18)
Re: TOASTing smaller things

Hi Tom,

On 4/12/07 1:40 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I would suggest that *all* of those TODOs are premature in the absence
of experimental evidence about the effect of varying the parameters.
If we end up finding out that the existing settings are about right
anyway across a range of test cases, who needs more knobs? We've got
too many mostly-useless knobs already.

This set of TODOs is really about a binary change in behavior that data
warehouse users will employ to shift given columns into a separate storage
mechanism while preserving their schema. By contrast, the knob you describe
is about tuning the existing storage mechanism by offlining values that are
too large.

We can talk about these TODOs as a group using the name "vertical
partitioning" if that suits.

To demonstrate the effectiveness of vertical partitioning, we would write
queries that use the partitioned columns independently or as groups that
correlate with the storage mechanism.

The other demonstration applies to the use of compression techniques that
align with the column type(s) and operate across tuple boundaries within
pages. Examples there include the segmentation of fixed width types and
variable width types into separate page storage classes, which allows for
the application of different compression and/or representations on pages.

- Luke

#20Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#18)
Re: TOASTing smaller things

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

I would suggest that *all* of those TODOs are premature in the absence
of experimental evidence about the effect of varying the parameters.
If we end up finding out that the existing settings are about right
anyway across a range of test cases, who needs more knobs? We've got
too many mostly-useless knobs already.

Isn't it obvious that the "right" value is going to depend extraordinarily
heavily on the precise usage pattern though?

A typical table with 100-byte columns which are normally read with the rest of
the columns, sometimes in sequential scans or updates, will find TOASTing them
counter-productive as it t urns those all into additional random access i/o
and extra inserts and deletes.

Another system with a dozen 100-byte columns that are virtually never accessed
and a handful of heavily-updated integer columns will benefit heavily from
TOAST as it changes the table from an awfully performing 5-6 tuple/page table
into a tremendously performing 100+ tuple/page table.

We're not ever going to be able to predict when data is being stored what
future usage pattern to expect, at least not realistically. The best we can
hope for is to find the tipping point at which the cost if we guess wrong is
some tolerable level of pain and set that to be the default and giving the
tools to the user to obtain the benefit in the cases where he knows it'll
help.

Or perhaps TOAST is the wrong kind of vertical partitioning for this. Perhaps
we should be using TOAST to deal with the "large datum" problem and have a
different looking tool entirely for the "vertical partitioning rarely used
columns" problem.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)