TOAST on indices

Started by Nonameover 25 years ago3 messages
#1Noname
JanWieck@t-online.de

For discussion:

First what the current implementation and the yet to be done
proposals do.

All varlena data types (text, char, varchar, arrays) will
finally be toastable. Every table that uses such types
will have a secondary relation to move off attributes.

The toaster allways tries to keep a main tuple small
enough so that at minimum 4 tuples fit into a block. One
had complained about, and I explain later why I think
it's a good decision anyway.

This strategy already covers most possible index problems. If
the main tuple fits into 2K after toasting, any combination
of attributes out of it will too. The only thing not covered
are functional indices.

In real world scenarios, indices are usually set up on small
key values. These are very likely to be kept plain in the
main tuple by the toaster, becuase it looks at the biggest
values first. So an index (built out of the values in the
main tuple after toasting) will also contain the plain
values. Thus, index scans will not require toast fetches in
turn. Except the indexed attribute had at some point a huge
value.

The current TOAST implementation hooks into the heap access
methods only. Automagically covering the index issues due to
the 2K approach. Fact is, that if more toast entries can get
produced during index inserts, we need to take care for them
during vacuum (the only place where index items get removed).
Alot of work just to support huge functional indices - IMHO
not worth the efford right now. Let's better get some
experience with the entire thing before going too far.

Why is it good to keep the main tuple below 2K? First because
of the above side effects for indices. Second, because in the
most likely case of small indexed attributes, more main
tuples (that must be fetched for the visibility checks
anyway) will fit into one block. That'll cause more blocks of
the relation to fit into the given shared memory buffer cache
and avoids I/O during index scans.

My latest tests load a 1.1M tree full of .html files into a
database. The result is a 140K heap plus 300K toast
relation. Without that 2K approach, the result is a 640K heap
plus 90K toastrel only. Since all compression is done on
single entries, it scales linear, so that a 1.1G tree will
result in a 140M heap plus 300M toastrel vs. a 640M heap plus
90M toastrel. No need to bechmark it - I know which strategy
wins.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#2Philip Warner
pjw@rhyme.com.au
In reply to: Noname (#1)
Re: TOAST on indices

At 20:42 4/07/00 +0200, Jan Wieck wrote:

So an index (built out of the values in the
main tuple after toasting) will also contain the plain
values. Thus, index scans will not require toast fetches in
turn. Except the indexed attribute had at some point a huge
value.

So that, for toasted attrs, the indexes will be no use for sorting. I agree
that in the majority of cases this is not a problem, but if the entire
tuple gets toasted because it is too large it becomes a problem.

I agree that this is only a problem if indexes are used in sorting, and may
not be a problem if one builds an index on 'substr(toastable-field,1,20)',
but I think you are suggesting not supporting functional indexes,
below...but maybe I've missed the point.

The current TOAST implementation hooks into the heap access
methods only. Automagically covering the index issues due to
the 2K approach. Fact is, that if more toast entries can get
produced during index inserts, we need to take care for them
during vacuum (the only place where index items get removed).
Alot of work just to support huge functional indices - IMHO
not worth the efford right now. Let's better get some
experience with the entire thing before going too far.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#3Noname
JanWieck@t-online.de
In reply to: Philip Warner (#2)
Re: TOAST on indices

Philip Warner wrote:

At 20:42 4/07/00 +0200, Jan Wieck wrote:

So an index (built out of the values in the
main tuple after toasting) will also contain the plain
values. Thus, index scans will not require toast fetches in
turn. Except the indexed attribute had at some point a huge
value.

So that, for toasted attrs, the indexes will be no use for sorting. I agree
that in the majority of cases this is not a problem, but if the entire
tuple gets toasted because it is too large it becomes a problem.

That ain't true, entirely. First of all, only single
attributes get toasted. Never a complete tuple but maybe all
of it's attributes (if it is a table with many, many
attributes or all of them are big).

If so, well, the sort might become damned slow. Assuming all
the rows selected have the attribute to sort on toasted, each
comparision will require two index scans (plus possibly
decompression) during the sort.

But tell me, do you know of real world DB installations where
indices on fields likely to be >1K exist? What is such an
index good for? Fast reverse lookup of 65536-bit RSA keys?

The system won't complain, nor will it bail out in such a
situation. That it won't behave as good as it could is a
con. Maybe we should tell on our web pages that someone who
wants to create indices on multi-K attributes should better
look for another DB, because Postgres is slow in that case?

I agree that this is only a problem if indexes are used in sorting, and may
not be a problem if one builds an index on 'substr(toastable-field,1,20)',
but I think you are suggesting not supporting functional indexes,
below...but maybe I've missed the point.

The current TOAST implementation hooks into the heap access
methods only. Automagically covering the index issues due to
the 2K approach. Fact is, that if more toast entries can get
produced during index inserts, we need to take care for them
during vacuum (the only place where index items get removed).
Alot of work just to support huge functional indices - IMHO
not worth the efford right now. Let's better get some
experience with the entire thing before going too far.

Yeah - you missed me here.

In the case of a functional index, the function would seldom
return one of the original tuples attribute values. Usually
those functions manipulate one or more attributes to compute
a completely new value (like your substr() example above).
In the TOAST world, any such function returns a plain, fully
expanded, in memory value.

So even if the toaster had worked on the main tuple and
compressed/moved off some attributes, the value that is
computed during index tuple creation is of full size. Having
a char(20000) attribute, the toaster will shrink it down so
the main tuple will fit. But a functional index like
"substr(att, 1, 10000)" must fail, because during index tuple
creation the funtion is evalueated and creates a 10000 byte
value.

In the current implementation, non-functional indices on huge
fields should be supported (there still are bugs because it
doesn't work right now). For functional ones, the old
restriction of "index-tuple must be smaller than supported
tuple size of index method" applies.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #