TOAST vs arrays

Started by Tom Lane, over 25 years ago, 3 messages

tgl@sss.pgh.pa.us

over 25 years ago

If I understand the fundamental design of TOAST correctly, it's not
allowed to have multiple heap tuples containing pointers to the same
moved-off TOAST item. For example, if one tuple has such a pointer,
and we copy it with INSERT ... SELECT, then the new tuple has to be
constructed with its own copy of the moved-off item. Without this
you'd need reference counts and so forth for moved-off values.
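
As an illustration of the single-owner rule described above, here is a minimal toy model in Python. It is not PostgreSQL code: `ToastStore`, `ToastPointer`, and `copy_tuple` are hypothetical names invented for this sketch, and the store is just a dictionary standing in for a TOAST table.

```python
# Toy model only: ToastStore, ToastPointer and copy_tuple are hypothetical
# names for illustration, not PostgreSQL APIs.
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class ToastPointer:
    """Stand-in for a pointer to a moved-off value."""
    value_id: int


class ToastStore:
    """Stand-in for a TOAST table: value_id -> bytes, one owner per entry."""

    def __init__(self):
        self._values = {}
        self._ids = itertools.count(1)

    def save(self, data: bytes) -> ToastPointer:
        value_id = next(self._ids)
        self._values[value_id] = data
        return ToastPointer(value_id)

    def fetch(self, ptr: ToastPointer) -> bytes:
        return self._values[ptr.value_id]

    def delete(self, ptr: ToastPointer) -> None:
        del self._values[ptr.value_id]


def copy_tuple(store, tup):
    """Copy a tuple, giving the copy its own moved-off items."""
    return tuple(
        store.save(store.fetch(attr)) if isinstance(attr, ToastPointer) else attr
        for attr in tup
    )


store = ToastStore()
original = (1, store.save(b"x" * 10_000))
copy = copy_tuple(store, original)

# Deleting the original's moved-off value leaves the copy readable,
# because the two tuples never shared a pointer.
store.delete(original[1])
copied_value = store.fetch(copy[1])
```

Because every tuple owns its moved-off items outright, deleting a tuple can simply delete its items, and no reference counting is needed.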

It looks like you have logic for all this in tuptoaster.c, but
I see a flaw: the code doesn't look inside array fields to see if
any of the array elements are pre-toasted values. There could be
a moved-off-item pointer inside an array, copied from some other
place.

Note the fact that arrays aren't yet considered toastable is
no defense. An array of a toastable data type is sufficient
to create the risk.
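
The gap can be sketched in the same toy style. The assumption here is that arrays are modeled as Python lists and that a check which only inspects each attribute itself behaves like `top_level_pointers` below; both function names and `ToastPointer` are hypothetical.

```python
# Toy model only: ToastPointer and both helper functions are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class ToastPointer:
    """Stand-in for a pointer to a moved-off value."""
    value_id: int


def top_level_pointers(tup):
    """Check each attribute itself, without looking inside arrays."""
    return [attr for attr in tup if isinstance(attr, ToastPointer)]


def all_pointers(tup):
    """Also walk array attributes (modeled as lists), at a cost per element."""
    found = []
    for attr in tup:
        if isinstance(attr, ToastPointer):
            found.append(attr)
        elif isinstance(attr, list):
            found.extend(e for e in attr if isinstance(e, ToastPointer))
    return found


# An array of a toastable type whose second element was copied, still
# toasted, from some other row.
row = (42, ["short text", ToastPointer(7)])

missed = top_level_pointers(row)  # the nested pointer is not seen
seen = all_pointers(row)          # found, but only by scanning every element
```

The array attribute itself is not a pointer, so the top-level check reports nothing even though the row carries a reference to someone else's moved-off item.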

What do you want to do about this? We could have heap_tuple_toast_attrs
scan through all the elements of arrays of toastable types, but that
strikes me as slow. I'm thinking the best approach is for the array
construction routines to refuse to insert toasted values into array
objects in the first place --- instead, expand them before insertion.
Then the whole array could be treated as a toastable object, but there
are no references inside the array to worry about.
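
A minimal sketch of that proposal, again as a toy model rather than the real array code: `construct_array`, `fetch_if_moved_off`, and the `TOAST_TABLE` dictionary are hypothetical, and only moved-off values are handled here (the compressed-in-place case is the open question that follows).

```python
# Toy model only: every name below is hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class ToastPointer:
    """Stand-in for a pointer to a moved-off value."""
    value_id: int


# Stand-in for a TOAST table holding one moved-off value.
TOAST_TABLE = {7: b"moved-off value"}


def fetch_if_moved_off(value):
    """Return the full value, fetching it if only a pointer was passed in."""
    if isinstance(value, ToastPointer):
        return TOAST_TABLE[value.value_id]
    return value


def construct_array(elements):
    """Build an array (modeled as a list) that holds no TOAST pointers."""
    return [fetch_if_moved_off(e) for e in elements]


array = construct_array([b"inline", ToastPointer(7)])
```

With expansion done once at construction time, nothing downstream has to scan array elements, and the finished array can be toasted as a single value.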

If we do that, should compressed-in-place array items be expanded back
to full size before insertion in the array? If we don't, we'd likely
end up trying to compress already-compressed data, which is a waste of
effort ... but OTOH it seems a shame to force the data back to full
size unnecessarily. Either way would work, I'm just not sure which
is likely to be more efficient.

regards, tom lane

Jan Wieck

JanWieck@t-online.de

over 25 years ago

In reply to: Tom Lane (#1)

Re: TOAST vs arrays

Tom Lane wrote:

> If I understand the fundamental design of TOAST correctly, it's not
> allowed to have multiple heap tuples containing pointers to the same
> moved-off TOAST item. For example, if one tuple has such a pointer,
> and we copy it with INSERT ... SELECT, then the new tuple has to be
> constructed with its own copy of the moved-off item. Without this
> you'd need reference counts and so forth for moved-off values.
>
> It looks like you have logic for all this in tuptoaster.c, but
> I see a flaw: the code doesn't look inside array fields to see if
> any of the array elements are pre-toasted values. There could be
> a moved-off-item pointer inside an array, copied from some other
> place.
>
> Note the fact that arrays aren't yet considered toastable is
> no defense. An array of a toastable data type is sufficient
> to create the risk.

Yepp

> What do you want to do about this? We could have heap_tuple_toast_attrs
> scan through all the elements of arrays of toastable types, but that
> strikes me as slow. I'm thinking the best approach is for the array
> construction routines to refuse to insert toasted values into array
> objects in the first place --- instead, expand them before insertion.
> Then the whole array could be treated as a toastable object, but there
> are no references inside the array to worry about.

I think the array construction routines are the right place to
expand them.

> If we do that, should compressed-in-place array items be expanded back
> to full size before insertion in the array? If we don't, we'd likely
> end up trying to compress already-compressed data, which is a waste of
> effort ... but OTOH it seems a shame to force the data back to full
> size unnecessarily. Either way would work, I'm just not sure which
> is likely to be more efficient.

I think it's not too bad to expand them and then let the
toaster (eventually) compress the entire array again. Larger
data usually yields better compression results. Given the
actual speed of our compression code, I don't expect a
performance penalty from it.
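
The size argument can be checked with a rough experiment. In this sketch zlib is only a stand-in for the toaster's compression code, and the sample data is invented, so the numbers say nothing about actual TOAST behavior; it only illustrates why compressing one large value tends to beat compressing many small ones.

```python
# Rough experiment: zlib is a stand-in compressor and the data is made up.
import zlib

# Hypothetical array elements: many short, similar text values.
items = [f"row {i}: the quick brown fox jumps over the lazy dog".encode()
         for i in range(200)]

# Option A: each element compressed on its own before going into the array.
per_item_size = sum(len(zlib.compress(item)) for item in items)

# Option B: elements stored expanded, then the whole array compressed once.
whole = zlib.compress(b"".join(items))
whole_size = len(whole)

# Compressing data that is already compressed, for comparison.
recompressed_size = len(zlib.compress(whole))

print(f"per-item total: {per_item_size} bytes")
print(f"whole array:    {whole_size} bytes")
print(f"recompressed:   {recompressed_size} bytes")
```

Short values carry fixed per-stream overhead and give the compressor little repetition to exploit, while the concatenated array exposes the redundancy between elements. The recompressed figure shows what a second pass over already-compressed data achieves; exact sizes depend on the data and the zlib version.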

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Jan Wieck (#2)

Re: TOAST vs arrays

JanWieck@t-online.de (Jan Wieck) writes:

>> What do you want to do about this? We could have heap_tuple_toast_attrs
>> scan through all the elements of arrays of toastable types, but that
>> strikes me as slow. I'm thinking the best approach is for the array
>> construction routines to refuse to insert toasted values into array
>> objects in the first place --- instead, expand them before insertion.
>> Then the whole array could be treated as a toastable object, but there
>> are no references inside the array to worry about.
>
> I think the array construction routines are the right place to
> expand them.

Sounds like a plan.

Just in case anyone wants to object: I'm planning to rip out all of
the "large object array" and "chunked array" support that's in there
now. AFAICS it does nothing that won't be done as well or better by
toasted arrays, and it probably doesn't work anyway (seeing that much
of it has been ifdef'd out for a long time).

regards, tom lane