Re: bytea

From: Bruce Momjian <bruce@momjian.us> (hackers list, 1 message)

This brings up some good issues for the 7.2 release. Will large objects
become just an API on top of toast, or should they remain as a separate
physical storage format?

At 08:30 PM 3/15/00 -0500, Bruce Momjian wrote:

Yes, we should keep it. I see now it is for purely binary data, while
text is for null-terminated strings.

donb=# create table foo (b bytea);
CREATE
donb=# insert into foo values('ab\0cd');
INSERT 107497 1
donb=# select * from foo;
b
----
ab
(1 row)

donb=#

Thus my comment "maybe they should be made to work" :)

I don't know what's actually inside attr b, but at the very least the
"cd" is dropped on output.
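Any routine that treats the value as a NUL-terminated C string produces exactly this truncation, because it stops at the first zero byte. The sketch below simulates such a routine. `c_string_view` is a hypothetical helper, not PostgreSQL's bytea code, and it does not settle whether 7.0 lost the bytes on input or only on output.

```python
def c_string_view(data: bytes) -> bytes:
    """Return what a strlen()-style C routine would see: bytes up to the first NUL."""
    nul = data.find(b"\0")
    return data if nul == -1 else data[:nul]

stored = b"ab\0cd"                  # the five bytes the INSERT was meant to store
print(c_string_view(stored))        # b'ab': "cd" is lost, as in the psql session
print(len(stored), len(c_string_view(stored)))  # 5 2
```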

For the BLOB hack in our toolkit I did the equivalent of uuencoding
the input, which costs a predictable 4/3 expansion of the binary
data. (It is a segmented type, done entirely outside PG via SQL,
triggers, and AOLserver driver magic, but it lets us stuff binary
data such as photos into the db and pg_dump/restore them.)
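The "predictable 4/3 expansion" comes from packing every 3 input bytes into 4 printable characters. uuencode also adds a length character and a newline to each 45-byte line, so its real overhead is slightly higher. This sketch uses Python's standard `base64` and `binascii` modules as stand-ins for whatever encoder the toolkit actually used.

```python
import base64
import binascii

def base64_len(n: int) -> int:
    """Length of standard base64 output for n input bytes: 4 chars per 3-byte group."""
    return 4 * ((n + 2) // 3)

def uuencode_len(data: bytes) -> int:
    """Total length of binascii.b2a_uu output, 45 input bytes per line as uuencode does."""
    return sum(len(binascii.b2a_uu(data[i:i + 45])) for i in range(0, len(data), 45))

payload = bytes(range(256)) * 4     # 1024 bytes covering every byte value
print(len(payload), base64_len(len(payload)), len(base64.b64encode(payload)))  # 1024 1368 1368
print(uuencode_len(payload))        # 1414: 4/3 plus per-line length char and newline
```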

If TOAST weren't on the way, I'd sit down and do a proper BLOB.
As I explained to the folks on our web toolkit team, lo is
tantalizingly close to being useful for folks like us without
actually being useful.

BLOBs should sit atop TOAST, though, and perhaps specialized I/O
routines could be written for a BLOB type. The bytea routines could
be changed too, at the risk of breaking existing code? But since
bytea really acts like text, perhaps there is no existing code that
couldn't just operate on text instead, which would leave us free to
change it?
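Specialized I/O routines could avoid the NUL problem by never putting raw bytes into the text form at all. Here is a minimal sketch that assumes a base64 text representation. `blob_out` and `blob_in` are hypothetical names, not functions from any PostgreSQL release.

```python
import base64

def blob_out(value: bytes) -> str:
    """Hypothetical output routine: render arbitrary bytes as printable ASCII."""
    return base64.b64encode(value).decode("ascii")

def blob_in(text: str) -> bytes:
    """Hypothetical input routine: parse the printable form back into bytes."""
    return base64.b64decode(text, validate=True)

original = b"ab\0cd"
printable = blob_out(original)
print(printable)                        # YWIAY2Q=
assert blob_in(printable) == original   # the NUL and the "cd" survive the round trip
```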

For real binary data, uuencoded strings are a better choice for
a printable output form than the text+\nnn form (since a high
proportion of bytes will be emitted in the lengthy \nnn form).
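To put numbers on this: when every non-printable byte costs four characters, uniformly distributed binary data expands by roughly 2.9x, compared with 4/3 for a uuencode-style encoding. The escaping rule in this sketch is an assumption modeled on bytea's escape output: printable ASCII is kept, a backslash is doubled, and everything else is written as \nnn. base64 stands in for the 4/3 side.

```python
import base64

def octal_escape(data: bytes) -> str:
    """Printable ASCII kept as-is, backslash doubled, other bytes as backslash + 3 octal digits."""
    out = []
    for b in data:
        if b == 0x5C:                   # backslash
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:         # printable ASCII
            out.append(chr(b))
        else:                           # non-printable byte: \nnn
            out.append("\\%03o" % b)
    return "".join(out)

every_byte = bytes(range(256))          # one of each value, like uniformly random data
escaped = octal_escape(every_byte)
encoded = base64.b64encode(every_byte)
print(len(every_byte), len(escaped), len(encoded))  # 256 740 344
```

The 740 comes from 94 printable bytes at 1 character each, 1 backslash at 2, and 161 non-printable bytes at 4.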

But normally with a BLOB one would like a way to just stuff a file
or the data in a buffer into it, much like the current lo. The printable
dump of data is mostly useful for pg_dump, IMO - a binary backup would
remove the need for such a hack, too.

Standard BLOBs provide a way to stuff segments into the db...

BLOBs, as done by TOAST or by the segmented-table hack currently used
in our toolkit, only require a single table (or a single table per
underlying user table in the case of TOAST), so they don't create
clutter the way lo does.
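A segmented type of this kind splits each value into fixed-size chunks. The chunks are stored as rows keyed by (object id, sequence number) in one shared table and are reassembled in order on read. The in-memory sketch below models only that idea. `CHUNK_SIZE`, `SegmentedStore`, and its methods are hypothetical and do not reflect TOAST's or the toolkit's real layout.

```python
CHUNK_SIZE = 2000  # hypothetical segment size in bytes

class SegmentedStore:
    """Models a single shared table of (object_id, seq, chunk) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[int, int, bytes]] = []
        self.next_id = 1

    def put(self, data: bytes) -> int:
        """Split data into CHUNK_SIZE pieces and append one row per piece."""
        obj_id = self.next_id
        self.next_id += 1
        for seq, start in enumerate(range(0, len(data), CHUNK_SIZE)):
            self.rows.append((obj_id, seq, data[start:start + CHUNK_SIZE]))
        return obj_id

    def get(self, obj_id: int) -> bytes:
        """Collect this object's rows, order them by seq, and concatenate."""
        mine = sorted((seq, chunk) for oid, seq, chunk in self.rows if oid == obj_id)
        return b"".join(chunk for _, chunk in mine)

store = SegmentedStore()
photo = bytes(range(256)) * 20          # 5120 bytes of stand-in binary data
photo_id = store.put(photo)
short_id = store.put(b"ab\0cd")
print(len(store.rows))                  # 4: three chunks for photo, one for the short value
```

Every object shares the same table, which is the contrast with lo's file-plus-index per object.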

But lo allows each binary object to be 2GB in length.

So they kind of fit different needs. lo seems fine for those who
need really huge objects, though probably not a bazillion of them
(since each generates a file + index). My hack, or TOAST, which will
be similar in table usage (both are segmented types stored in common
tables), is good for moderately large binary data, up to 2GB
in aggregate.

Of course, with 64-bit systems on the horizon, the 2GB aggregate
limit will slowly begin to disappear, too. 'Til then, providing
a "real BLOB" while retaining lo for those who need single REALLY
huge data objects would seem best.
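The 2GB figures match the largest value a signed 32-bit byte offset can address, 2^31 - 1. That is why 64-bit offsets remove the ceiling. The check below only does arithmetic on offset widths. It assumes the limits come from 32-bit signed offsets and does not model lo or TOAST internals.

```python
import struct

INT32_MAX = 2**31 - 1   # largest signed 32-bit value
INT64_MAX = 2**63 - 1   # largest signed 64-bit value

def fits(offset: int, fmt: str) -> bool:
    """True if offset packs into the struct format ('<i' = int32, '<q' = int64)."""
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        return False

two_gb = 2 * 1024**3                    # 2147483648 bytes
print(INT32_MAX, two_gb)                # 2147483647 2147483648
print(fits(two_gb - 1, "<i"), fits(two_gb, "<i"), fits(two_gb, "<q"))  # True False True
```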

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026