Extending varlena
Folks,
As the things stored in databases grow, we're going to start needing
to think about database objects that 4 bytes of size can't describe.
People are already storing video in lo and bytea fields. To date, the
sizes of media files have never trended downward.
What would need to happen for the next jump up from where varlena is
now, to 8 bytes? Would we want to use the bit-stuffing model that the
current varvarlena uses?
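For context, the bit-stuffing idea can be sketched like this. This is an illustrative toy, not PostgreSQL's actual header layout: the low bits of the first header byte say how wide the length field is, so short values pay only one byte of overhead, and stealing flag bits is also why the 4-byte form tops out below 4GB. An 8-byte form would presumably claim another flag pattern.

```c
#include <stdint.h>

/* Toy sketch of the bit-stuffing idea (NOT PostgreSQL's exact layout).
 * A length that fits in 6 bits gets a 1-byte header with the low bit set;
 * anything else goes in a 4-byte little-endian word whose low flag bits
 * stay clear. */

static int encode_len(uint8_t *buf, uint32_t len) {
    if (len < 0x40) {                          /* fits in 6 bits: short form */
        buf[0] = (uint8_t)((len << 2) | 0x1);  /* low bit set = 1-byte header */
        return 1;
    }
    uint32_t word = len << 2;                  /* low two bits clear = long form */
    buf[0] = (uint8_t)(word & 0xFF);           /* explicit byte order, so the */
    buf[1] = (uint8_t)((word >> 8) & 0xFF);    /* flag check works regardless */
    buf[2] = (uint8_t)((word >> 16) & 0xFF);   /* of host endianness */
    buf[3] = (uint8_t)((word >> 24) & 0xFF);
    return 4;
}

static uint32_t decode_len(const uint8_t *buf) {
    if (buf[0] & 0x1)                          /* short form */
        return (uint32_t)(buf[0] >> 2);
    uint32_t word = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                    ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    return word >> 2;
}
```

Since two bits of the 4-byte word are spent on flags, this scheme caps lengths at 2^30 - 1, which is where the familiar 1GB-per-value ceiling comes from.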
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes:
What would need to happen for the next jump up from where varlena is
now, to 8 bytes?
Dealing with upwards-of-4GB blobs as single Datums isn't remotely sane,
and won't become so in the near (or even medium) future. So I don't
see the point of doing all the work that would be involved in making
this go.
What would make more sense is to redesign the large-object stuff to be
somewhat modern and featureful, and provide stream-access APIs (think
lo_read, lo_seek, etc) that allow offsets wider than 32 bits. The main
things I think we'd need to consider besides just the access API are
- permissions features (more than "none" anyway)
- better management of orphaned objects (obsoleting vacuumlo)
- support > 16TB of large objects (maybe partition pg_largeobject?)
- dump and restore probably need improvement to be practical for such
large data volumes
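The widened stream-access API might look something like the sketch below. The lo64_* names are invented for illustration, and the handle is mocked with an in-memory buffer rather than a server connection; the point is only that the offsets and return values become 64-bit while the call shape stays lo_read/lo_seek-like.

```c
#include <stdio.h>   /* for SEEK_SET / SEEK_CUR / SEEK_END */
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit-offset large-object handle (illustrative only). */
typedef struct {
    const uint8_t *data;   /* blob contents */
    int64_t size;          /* total size; may exceed 4GB */
    int64_t pos;           /* current offset, also 64-bit */
} LoHandle;

/* Like lo_lseek, but the offset and result are 64-bit. */
static int64_t lo64_seek(LoHandle *h, int64_t offset, int whence) {
    int64_t base = (whence == SEEK_SET) ? 0
                 : (whence == SEEK_CUR) ? h->pos
                 : h->size;                      /* SEEK_END */
    int64_t next = base + offset;
    if (next < 0)
        return -1;                               /* seeking before start fails */
    h->pos = next;
    return next;
}

/* Like lo_read: copies up to len bytes from the current offset. */
static int64_t lo64_read(LoHandle *h, void *buf, int64_t len) {
    int64_t avail = h->size - h->pos;
    if (avail <= 0)
        return 0;                                /* at or past end of blob */
    if (len > avail)
        len = avail;
    memcpy(buf, h->data + h->pos, (size_t)len);
    h->pos += len;
    return len;
}
```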
regards, tom lane
On Mon, Aug 18, 2008 at 04:22:56PM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
What would need to happen for the next jump up from where varlena
is now, to 8 bytes?
Dealing with upwards-of-4GB blobs as single Datums isn't remotely
sane, and won't become so in the near (or even medium) future. So I
don't see the point of doing all the work that would be involved in
making this go.
OK
What would make more sense is to redesign the large-object stuff to
be somewhat modern and featureful, and provide stream-access APIs
(think lo_read, lo_seek, etc) that allow offsets wider than 32 bits.
Great!
The main things I think we'd need to consider besides just the
access API are
- permissions features (more than "none" anyway)
Would ROLEs work, or are you thinking of the per-row and per-column
access controls people sometimes want?
- better management of orphaned objects (obsoleting vacuumlo)
- support > 16TB of large objects (maybe partition pg_largeobject?)
- dump and restore probably need improvement to be practical for such
large data volumes
That, and the usual upgrade-in-place :)
Cheers,
David.
David Fetter wrote:
Folks,
As the things stored in databases grow, we're going to start needing
to think about database objects that 4 bytes of size can't describe.
People are already storing video in lo and bytea fields. To date, the
sizes of media files have never trended downward.
I always find these requests puzzling. Is it really useful to store the
data for a jpeg, video file or a 10GB tar ball in a database column?
Does anyone actually search for byte sequences within those data streams
(maybe if it were text)? I would think that the metadata is what gets
searched: title, track, name, file times, size, etc... Database storage
is normally pricey, stocked with 15K drives, so wasting that expensive
storage with non-searchable binary blobs doesn't make much sense. Why
not offload the data to a file system with 7200 RPM SATA drives and
store a reference to it in the db? Keep the db more compact and simpler
to manage.
Andrew Chernow
eSilo, LLC
On Mon, Aug 18, 2008 at 06:09:13PM -0400, Andrew Chernow wrote:
David Fetter wrote:
Folks,
As the things stored in databases grow, we're going to start
needing to think about database objects that 4 bytes of size can't
describe. People are already storing video in lo and bytea fields.
To date, the sizes of media files have never trended downward.
I always find these requests puzzling. Is it really useful to store
the data for a jpeg, video file or a 10GB tar ball in a database
column?
It is if you need transaction semantics. Think medical records, etc.
Cheers,
David.
David Fetter wrote:
On Mon, Aug 18, 2008 at 06:09:13PM -0400, Andrew Chernow wrote:
David Fetter wrote:
Folks,
As the things stored in databases grow, we're going to start
needing to think about database objects that 4 bytes of size can't
describe. People are already storing video in lo and bytea fields.
To date, the sizes of media files have never trended downward.
I always find these requests puzzling. Is it really useful to store
the data for a jpeg, video file or a 10GB tar ball in a database
column?
It is if you need transaction semantics. Think medical records, etc.
Cheers,
David.
I see that, although developing the middleware between db and fs is
rather trivial. I think that is the puzzling part. It just feels awkward
to "me" to just stuff it in the db. You can do more by distributing.
Anyways (back on topic), I am in favor of removing limits from any
section of the database ... not just your suggestion. The end-user
application should impose limits.
Andrew
Andrew,
I always find these requests puzzling. Is it really useful to store the
data for a jpeg, video file or a 10GB tar ball in a database column?
Some people find it useful. Because LOs are actually easier to manage in
PG than in most other DBMSes, right now that's a significant source of
PostgreSQL adoption. I'd like to encourage those users by giving them
more useful LO features.
--
--Josh
Josh Berkus
PostgreSQL
San Francisco
I always find these requests puzzling. Is it really useful to store the
data for a jpeg, video file or a 10GB tar ball in a database column?
One use case is that it can use the existing postgresql protocol, and
does not require extra filesystem mounts, extra error handling, and
other complexity.
Regards,
Jeff Davis
Andrew Chernow <ac@esilo.com> writes:
Anyways (back on topic), I am in favor of removing limits from any
section of the database ... not just your suggestion. The end-user
application should impose limits.
That's nice as an abstract principle, but there are only so many hours
in the day, so we need to prioritize which limits we're going to get rid
of. The 4-byte limit on individual Datum sizes does not strike me as a
limit that's going to be significant for practical use any time soon.
(I grant David's premise that people will soon want to work with objects
that are larger than that --- but not that they'll want to push them
around as indivisible, store-and-fetch-as-a-unit field values.)
regards, tom lane
Tom Lane wrote:
Andrew Chernow <ac@esilo.com> writes:
Anyways (back on topic), I am in favor of removing limits from any
section of the database ... not just your suggestion. The end-user
application should impose limits.
That's nice as an abstract principle, but there are only so many hours
in the day, so we need to prioritize which limits we're going to get rid
of. The 4-byte limit on individual Datum sizes does not strike me as a
limit that's going to be significant for practical use any time soon.
(I grant David's premise that people will soon want to work with objects
that are larger than that --- but not that they'll want to push them
around as indivisible, store-and-fetch-as-a-unit field values.)
regards, tom lane
Yeah, my comments were overly general. I wasn't suggesting attention be
put on one limit over another. I was only saying that the act of
removing a limit (of which many are arbitrary) is most often a good one.
andrew
Jeff Davis wrote:
I always find these requests puzzling. Is it really useful to store the
data for a jpeg, video file or a 10GB tar ball in a database column?
One use case is that it can use the existing postgresql protocol,
So can what I am suggesting. How about a user-defined C function in the
backend that talks to the fs and uses SPI to sync info with a record?
Now the operation is behind a transaction. Yet, one must handle fs
orphans from evil crash cases.
Just one solution, but other more creative cats may have better ideas.
The point is, it can be done without too much effort. A little TLC :)
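The middleware pattern above can be sketched in miniature like this. There is no real SPI here; the db_insert_ok flag stands in for whether the SPI INSERT succeeded. Write the file first, then record it, and unlink on failure so the two stores stay in sync. A crash between the two steps can still orphan the file, which is why a vacuumlo-style sweeper remains necessary as a backstop.

```c
#include <stdio.h>
#include <string.h>

/* Write blob to the filesystem, then "record" it in the database.
 * If the database step fails, remove the file so no orphan is left
 * behind (crashes between the steps still need a sweeper). */
static int store_blob(const char *path, const void *data, size_t len,
                      int db_insert_ok) {
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len) {
        remove(path);               /* partial write: don't leave debris */
        return -1;
    }
    if (!db_insert_ok) {            /* the SPI INSERT would happen here */
        remove(path);               /* compensate: no metadata row, no file */
        return -1;
    }
    return 0;
}
```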
Andrew
David Fetter <david@fetter.org> writes:
On Mon, Aug 18, 2008 at 04:22:56PM -0400, Tom Lane wrote:
The main things I think we'd need to consider besides just the
access API are
- permissions features (more than "none" anyway)
Would ROLEs work, or are you thinking of the per-row and per-column
access controls people sometimes want?
Well, obviously roles are the entities that receive permissions, but
on what do we base granting permissions to LOs?
With the current model that a LO is an independent entity that is merely
referenced (or not) by OIDs in the database, it seems like we'd have to
grant/revoke permissions to individual LOs, identified by OID; which
sure seems messy to me. People don't really want to name their LOs
by OID anyway --- it's just a convention that's forced on them by the
current implementation.
I was kinda wondering about something closer to the TOAST model, where
a blob is only referenceable from a value that's in a table field;
and that value encapsulates the "name" of the blob in some way that
needn't even be user-visible. This'd greatly simplify the
cleanup-dead-objects problem, and we could avoid addressing the
permissions problem at all, since regular SQL permissions on the table
would serve fine. But it's not clear what regular SQL fetch and update
behaviors should be like for such a thing. (Fetching or storing the
whole blob value is right out, IMHO.) ISTR hearing of concepts roughly
like this in other DBs --- does it ring a bell for anyone?
regards, tom lane
David Fetter <david@fetter.org> writes:
On Mon, Aug 18, 2008 at 06:09:13PM -0400, Andrew Chernow wrote:
I always find these requests puzzling. Is it really useful to store
the data for a jpeg, video file or a 10GB tar ball in a database
column?
It is if you need transaction semantics. Think medical records, etc.
The basic problem with outside-the-DB data storage is keeping it in sync
with your inside-the-DB metadata. In a slowly changing dataset you can
probably get away with external storage, but if there's lots of updates
then allowing the DB to manage the storage definitely makes life easier.
This is not to say that you want SQL-style operations on the blobs;
in fact I think you probably don't, which is why I was pointing to
a LO-style API.
regards, tom lane
On Mon, Aug 18, 2008 at 07:31:04PM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
On Mon, Aug 18, 2008 at 04:22:56PM -0400, Tom Lane wrote:
The main things I think we'd need to consider besides just the
access API are
- permissions features (more than "none" anyway)
Would ROLEs work, or are you thinking of the per-row and per-column
access controls people sometimes want?
Well, obviously roles are the entities that receive permissions, but
on what do we base granting permissions to LOs?
With the current model that a LO is an independent entity that is merely
referenced (or not) by OIDs in the database, it seems like we'd have to
grant/revoke permissions to individual LOs, identified by OID; which
sure seems messy to me. People don't really want to name their LOs
by OID anyway --- it's just a convention that's forced on them by the
current implementation.
I was kinda wondering about something closer to the TOAST model, where
a blob is only referenceable from a value that's in a table field;
and that value encapsulates the "name" of the blob in some way that
needn't even be user-visible.
This vaguely reminds me of Sybase's hidden primary keys.
This'd greatly simplify the
cleanup-dead-objects problem, and we could avoid addressing the
permissions problem at all, since regular SQL permissions on the table
would serve fine. But it's not clear what regular SQL fetch and update
behaviors should be like for such a thing. (Fetching or storing the
whole blob value is right out, IMHO.) ISTR hearing of concepts roughly
like this in other DBs --- does it ring a bell for anyone?
Informix has some pretty good blob-handling:
Cheers,
David.
David Fetter wrote:
<...>
This'd greatly simplify the
cleanup-dead-objects problem, and we could avoid addressing the
permissions problem at all, since regular SQL permissions on the table
would serve fine. But it's not clear what regular SQL fetch and update
behaviors should be like for such a thing. (Fetching or storing the
whole blob value is right out, IMHO.) ISTR hearing of concepts roughly
like this in other DBs --- does it ring a bell for anyone?
Informix has some pretty good blob-handling:
Agreed. I used Informix a few years back in a system that scanned both
sides of multi-page financial documents; we stored the images in
Informix blobs, which IIRC could be tuned to a given number of bytes.
We found that 90% of our images fit in a given size, and since Informix
used raw disk access and could move the whole blob in a single pass, it
was quite fast and gave us all the warmth and fuzziness of ACID
functionality. But we didn't fetch parts of the blob -- the metadata
lived in its own table. There is/was an Illustra/Informix blade which in
theory let you do some processing of images (indexing), but that seems
like a very specialized case.
Greg Williamson
Senior DBA
DigitalGlobe
On Mon, 2008-08-18 at 16:22 -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
What would need to happen for the next jump up from where varlena is
now, to 8 bytes?
Dealing with upwards-of-4GB blobs as single Datums isn't remotely sane,
and won't become so in the near (or even medium) future. So I don't
see the point of doing all the work that would be involved in making
this go.
What would make more sense is to redesign the large-object stuff to be
somewhat modern and featureful, and provide stream-access APIs (think
lo_read, lo_seek, etc) that allow offsets wider than 32 bits. The main
things I think we'd need to consider besides just the access API are
- permissions features (more than "none" anyway)
- better management of orphaned objects (obsoleting vacuumlo)
- support > 16TB of large objects (maybe partition pg_largeobject?)
- dump and restore probably need improvement to be practical for such
large data volumes
Sounds like a good list.
Probably also using a separate Sequence to allocate numbers rather than
using up all the Oids on LOs would be a good plan.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
Probably also using a separate Sequence to allocate numbers rather than
using up all the Oids on LOs would be a good plan.
Well, assuming that your Large Objects are actually Large, you aren't
going to need as many OIDs as all that ;-)
However: I was chewing on this point a few days ago, and it seemed to me
that essentially duplicating the functionality of the OID generator
wasn't likely to be a win. What seems more practical is to extend the
internal next-OID counter to 64 bits, and allow callers to get either
the full 64 bits or just the lowest 32 bits depending on what they need.
This change would actually be entirely transparent to 32-bit callers,
and the extra cycles to manage a 64-bit counter would surely be lost in
the noise compared to acquiring/releasing OidGenLock.
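In outline, the idea is a single 64-bit counter with two accessors. This is only a sketch of the concept, not the actual backend code, which would also take OidGenLock, handle WAL logging, and skip reserved OID ranges on wraparound:

```c
#include <stdint.h>

/* One shared counter serves both widths.  Legacy callers see only the
 * low 32 bits -- including the same wraparound behavior they get today,
 * so the change is invisible to them. */
static uint64_t next_oid = 0;

static uint64_t GetNewOid64(void) {
    return next_oid++;              /* full-width value for new callers */
}

static uint32_t GetNewOid32(void) {
    return (uint32_t)(next_oid++);  /* low 32 bits for existing callers */
}
```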
regards, tom lane
On Mon, 2008-08-18 at 23:43 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Probably also using a separate Sequence to allocate numbers rather than
using up all the Oids on LOs would be a good plan.
Well, assuming that your Large Objects are actually Large, you aren't
going to need as many OIDs as all that ;-)
However: I was chewing on this point a few days ago, and it seemed to me
that essentially duplicating the functionality of the OID generator
wasn't likely to be a win. What seems more practical is to extend the
internal next-OID counter to 64 bits, and allow callers to get either
the full 64 bits or just the lowest 32 bits depending on what they need.
This change would actually be entirely transparent to 32-bit callers,
and the extra cycles to manage a 64-bit counter would surely be lost in
the noise compared to acquiring/releasing OidGenLock.
Sounds very cool.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 18 Aug 2008, Tom Lane wrote:
What would make more sense is to redesign the large-object stuff to be
somewhat modern and featureful, and provide stream-access APIs (think
lo_read, lo_seek, etc) that allow offsets wider than 32 bits.
A few years ago, I was working on such a project for a company I used to
work for. The company changed directions shortly thereafter, and the
project was dropped, but perhaps the patch might still be useful as a
starting point for someone else.
The original patch is
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php, and the
advice I was working on implementing was in
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01063.php
I am attaching the latest version of the patch I found around. As it was
almost 3 years ago, I am a little fuzzy on where I left off, but I do
remember that I was trying to work through the suggestions Tom Lane gave
in that second linked email. I would recommend discarding the libpq
changes, since that seemed to not pass muster.
Note that this patch was against 8.0.3. There only seem to be a few
issues applying it to the current head, but I haven't really dug into them
to see how difficult it would be to update. Luckily, the large object
code is fairly slow-moving, so there aren't too many conflicts. One thing
I did notice is that it looks like someone extracted one of the functions
I wrote in this patch and applied it as a 32-bit version. Good for them.
I'm glad someone got some use out of this project, and perhaps more use
will come of it.
--
At the source of every error which is blamed on the computer you will
find at least two human errors, including the error of blaming it on
the computer.
Attachments:
postgres-8.0.3-64bit-lo-r2.patch (text/plain)
Josh Berkus wrote:
Andrew,
I always find these requests puzzling. Is it really useful to store the
data for a jpeg, video file or a 10GB tar ball in a database column?
Some people find it useful. Because LOs are actually easier to manage in
PG than in most other DBMSes, right now that's a significant source of
PostgreSQL adoption. I'd like to encourage those users by giving them
more useful LO features.
Given that they don't replicate with most (any?) of the currently
available replication solutions, the fact that they are easy to use
becomes irrelevant fairly quickly to larger installations in my experience.
But the interface *is* nice, so if we could fix that, it would be very good.
//Magnus