My "TOAST slicing" patch -explanation
After reading the summary of Peter's comments on the patches in the
list, I thought I would provide a little more explanation, as he
suggests.
At present, functions that take text arguments retrieve them via the
PG_GETARG macros. This reconstructs the whole value into memory
(potentially time-consuming for a large value). This may be unnecessary
for some applications, where only a specified part of the value is
needed e.g. the header data from an image.
Because a TOASTed value is stored in equal-size chunks, it is
straightforward (for an uncompressed value) to determine which chunk(s)
contain a given substring. My patch adds accessor methods similar to the
existing macros which fetch the minimal number of chunks to satisfy a
particular substring request. (In the case of a compressed value,
"minimal number" == "all of them"). In other words, instead of writing
PG_GETARG_TEXT_P(0) and receiving back a large allocated value, you can
say PG_GETARG_TEXT_P_SLICE(0,0,500) to receive only the first 500 bytes.
This has a minimal performance impact -an indexscan is already used to
retrieve TOAST chunks -the extra accessor routines just specify the
second key as well as the first.
I have rewritten text_substr and bytea_substr to use these methods.
I have also added an ALTER TABLE x ALTER COLUMN y SET STORAGE command to
provide a simple way to change attstorage for a column. (The macros
above will be most efficient with uncompressed values.)
Finally, I am aware of the following items which are not covered by the
patch:
1) Efficient updating of parts of a value. This is not trivial[1]And I don't know how it might be used. Maybe if anyone has any ideas about substring assignment (someone did point me in the direction of an OVERLAY() function) or its applications I can think about that..
2) Should the large object interface be handled via TOAST?[2]AIUI, the underlying mechanism is similar, but implemented separately.
As for the applications, well, it foes allow you to stream MP3s out of
the database reasonably efficiently :-)
Any questions, please ask...
John
[1]: And I don't know how it might be used. Maybe if anyone has any ideas about substring assignment (someone did point me in the direction of an OVERLAY() function) or its applications I can think about that.
about substring assignment (someone did point me in the direction of an
OVERLAY() function) or its applications I can think about that.
[2]: AIUI, the underlying mechanism is similar, but implemented separately.
separately.
John Gray <jgray@azuli.co.uk> writes:
Finally, I am aware of the following items which are not covered by the
patch:
1) Efficient updating of parts of a value. This is not trivial[1].
Actually, based on subsequent discussion I now understand that efficient
updating of parts of a TOASTed value is impossible, if by that you mean
rewriting only the modified part. This is so because TOAST does not
use MVCC, really: it relies on MVCC for the owning tuple to determine
visibility of a tuple value. The only safe way to update a TOAST item
is to rewrite the whole thing with a new TOAST id number and then
update the owning tuple to reference that new id.
Despite this, it'd be a really good idea to offer functions that allow
applications to write part of a large TOASTed value. Even if it can't
be as efficient as we'd like, we could still eliminate pushing the rest
of the value back and forth to the client.
2) Should the large object interface be handled via TOAST?[2]
Probably not, given the above facts. We do have MVCC behavior for
partial updates of large objects, and we shouldn't lose it.
Having said all that, I think John's patch for substring extraction is
fine in concept. I haven't looked at it in detail, but I think we
should review it and expect to apply it (possibly with some cleanups).
regards, tom lane
On Sun, 2002-02-24 at 23:52, Tom Lane wrote:
John Gray <jgray@azuli.co.uk> writes:
Finally, I am aware of the following items which are not covered by the
patch:1) Efficient updating of parts of a value. This is not trivial[1].
Actually, based on subsequent discussion I now understand that efficient
updating of parts of a TOASTed value is impossible, if by that you mean
rewriting only the modified part. This is so because TOAST does not
use MVCC, really: it relies on MVCC for the owning tuple to determine
visibility of a tuple value.
Do TOAST tables participate in WAL ?
The only safe way to update a TOAST item
is to rewrite the whole thing with a new TOAST id number and then
update the owning tuple to reference that new id.
Can't we still devise some way to reuse the chunks that did not change ?
With some kind of double indirection and associated bookkeeping perhaps?
Despite this, it'd be a really good idea to offer functions that allow
applications to write part of a large TOASTed value. Even if it can't
be as efficient as we'd like, we could still eliminate pushing the rest
of the value back and forth to the client.
I guess this can be already done with creative use of substring() and ||
2) Should the large object interface be handled via TOAST?[2]
Probably not, given the above facts. We do have MVCC behavior for
partial updates of large objects, and we shouldn't lose it.
It would feel "cleaner" to have one representation for LOs - can't TOAST
just be made to participate in MVCC?
We could restict WAL of LOs to UPDATES only (and force fsync on TOAST
FILE after INSERT) just to conserve log space.
----------------
Hannu