BLOB support

Started by Radosław Smoguraalmost 15 years ago31 messageshackers

rsmogura@softperience.eu

almost 15 years ago

Hello,

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

I have implemented:
- 0.
- 1. and 2. without altering relations and types.
- 3. Inserts only, Updates need polishing. No recursion (no support for BLOBs
in composites or arrays).

Current patch is here (many changes to pg_types.h - new column haslobs added),
it's in _early_ stage for those who wants to look inside it.

Any ideas or suggestions?

Regards,
Radek

P. S.
I'm during removal, and I have limited access to Internet.

Peter Eisentraut

peter_e@gmx.net

almost 15 years ago

In reply to: Radosław Smogura (#1)

Re: BLOB support

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

Pavel Stehule

pavel.stehule@gmail.com

almost 15 years ago

In reply to: Peter Eisentraut (#2)

Re: BLOB support

2011/6/2 Peter Eisentraut <peter_e@gmx.net>:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

a streaming for bytea could be nice. A very large bytea are limited by
query size - processing long query needs too RAM,

Pavel

Show quoted text

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kenneth Marshall

ktm@rice.edu

almost 15 years ago

In reply to: Pavel Stehule (#3)

Re: BLOB support

On Thu, Jun 02, 2011 at 02:58:52PM +0200, Pavel Stehule wrote:

2011/6/2 Peter Eisentraut <peter_e@gmx.net>:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

a streaming for bytea could be nice. A very large bytea are limited by
query size - processing long query needs too RAM,

Pavel

+1 for a streaming interface to bytea/text. I do agree that there is no need
to reinvent the TOAST architecture with another name, just improve the existing
implementation.

Regards,
Ken

Pavlo Golub

pavlo.golub@cybertec.at

almost 15 years ago

In reply to: Pavel Stehule (#3)

Re: BLOB support

Hello, Pavel.

You wrote:

PS> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

PS> a streaming for bytea could be nice. A very large bytea are limited by
PS> query size - processing long query needs too RAM,

LO (oid) solves this, doesn't it?

PS> Pavel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
With best wishes,
Pavel mailto:pavel@gf.microolap.com

Radosław Smogura

rsmogura@softperience.eu

almost 15 years ago

In reply to: Peter Eisentraut (#2)

Re: BLOB support

On Thu, 02 Jun 2011 15:39:39 +0300, Peter Eisentraut wrote:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for
this was
to give ability to create (B/C)LOB columns and add casting
functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea,
serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set
recursivly.
2. Relation has same bool haslobs (used to speed up tables without
LOBs)
3. When data are inserted/updated then "special" function is called
and tuple
is modified in this way all LOBs are serialized to (old) LOB table
and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST.

May look similar, but Datums doesn't support more then 32bit length and
size of any field is limited to 1GB, am I right? Serializations is only
for casting simple values < 1GB, and simple operations, to do not
overhead creation of hundreds LOBs.

What functionality exactly do you envision that the BLOB and CLOB
types would
need to have that would warrant treating them different from, say,
bytea
and text?

Actually I thought about less sophisticated support of LOBs, supporting
casting and copying data, as well known form other databases idea that
LOBs are not "downloaded" during normal query execution (just ids are
taken). Currently, e.g. LOBs are not connected with tables, so deleting
rows doesn't delete LOB, table actually holds Oid of large objects, no
support for casting to/from LOB, no support for CLOBS. Some drivers try
to emulate BLOBs/CLOBs, but it is not perfect, mainly from above
reasons.

Regards,
Radek

Pavel Stehule

pavel.stehule@gmail.com

almost 15 years ago

In reply to: Pavlo Golub (#5)

Re: BLOB support

2011/6/2 Pavel Golub <pavel@microolap.com>:

Hello, Pavel.

You wrote:

PS> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

PS> a streaming for bytea could be nice. A very large bytea are limited by
PS> query size - processing long query needs too RAM,

LO (oid) solves this, doesn't it?

partially

There is a few disadvantages LO against bytea, so there are requests
for "smarter" API for bytea.

Significant problem is different implementation of LO for people who
have to port application to PostgreSQL from Oracle, DB2. There are
some JDBC issues too.

For me - main disadvantage of LO in one space for all. Bytea removes
this disadvantage, but it is slower for lengths > 20 MB. It could be
really very practical have a possibility insert some large fields in
second NON SQL stream. Same situation is when large bytea is read.

Pavel

Show quoted text

PS> Pavel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
With best wishes,
Pavel mailto:pavel@gf.microolap.com

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Radosław Smogura (#6)

Re: BLOB support

On Thu, Jun 2, 2011 at 9:29 AM, Radosław Smogura
<rsmogura@softperience.eu> wrote:

What functionality exactly do you envision that the BLOB and CLOB types
would
need to have that would warrant treating them different from, say, bytea
and text?

Actually I thought about less sophisticated support of LOBs, supporting
casting and copying data, as well known form other databases idea that LOBs
are not "downloaded" during normal query execution (just ids are taken).
Currently, e.g. LOBs are not connected with tables, so deleting rows doesn't
delete LOB, table actually holds Oid of large objects, no support for
casting to/from LOB, no support for CLOBS. Some drivers try to emulate
BLOBs/CLOBs, but it is not perfect, mainly from above reasons.

But these problems can be fixed without inventing a completely new
system, I think. Or at least we should try. I can see the point of a
data type that is really a pointer to a LOB, and the LOB gets deleted
when the pointer is removed, but I don't think that should require
far-reaching changes all over the system (like relhaslobs) to make it
work efficiently. I think you need to start with a problem statement,
get agreement that it is a problem and on what the solution should be,
and then go write the code to implement that solution. This is a
classic example of writing the code first and then working backwards
toward the problem you're trying to solve, and that rarely works out
well for the reasons that you're now finding out: people may not agree
with your proposed solution, they may want things done differently,
and now you're stuck reworking code that you've already written. It's
much easier to change a design document than it is to rewrite code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Robert Haas (#8)

Re: BLOB support

Robert Haas <robertmhaas@gmail.com> writes:

But these problems can be fixed without inventing a completely new
system, I think. Or at least we should try. I can see the point of a
data type that is really a pointer to a LOB, and the LOB gets deleted
when the pointer is removed, but I don't think that should require
far-reaching changes all over the system (like relhaslobs) to make it
work efficiently. I think you need to start with a problem statement,
get agreement that it is a problem and on what the solution should be,
and then go write the code to implement that solution.

Yes. I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once". I see no good reason to import the entire
messy notion of LOBS/CLOBS. (The fact that other databases have done it
is not a good reason.)

For primitive types like text or bytea it seems pretty obvious what
"streaming access" should entail, but it might be interesting to
consider what it should mean for structured types. For instance, if I
have an array field with umpteen zillion elements, it might be nice to
fetch them one at a time using the streaming access mechanism. I don't
say that that has to be in the first version, but it'd be a good idea to
keep that in the back of your head so you don't design a dead-end
solution that can't be extended in that direction.

regards, tom lane

#10

Pavel Stehule

pavel.stehule@gmail.com

almost 15 years ago

In reply to: Tom Lane (#9)

Re: BLOB support

2011/6/2 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

But these problems can be fixed without inventing a completely new
system, I think. Or at least we should try. I can see the point of a
data type that is really a pointer to a LOB, and the LOB gets deleted
when the pointer is removed, but I don't think that should require
far-reaching changes all over the system (like relhaslobs) to make it
work efficiently. I think you need to start with a problem statement,
get agreement that it is a problem and on what the solution should be,
and then go write the code to implement that solution.

Yes. I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once". I see no good reason to import the entire
messy notion of LOBS/CLOBS. (The fact that other databases have done it
is not a good reason.)

For primitive types like text or bytea it seems pretty obvious what
"streaming access" should entail, but it might be interesting to
consider what it should mean for structured types. For instance, if I
have an array field with umpteen zillion elements, it might be nice to
fetch them one at a time using the streaming access mechanism. I don't
say that that has to be in the first version, but it'd be a good idea to
keep that in the back of your head so you don't design a dead-end
solution that can't be extended in that direction.

Pavel

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Radosław Smogura

rsmogura@softperience.eu

almost 15 years ago

In reply to: Tom Lane (#9)

Re: BLOB support

Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42

Robert Haas <robertmhaas@gmail.com> writes:

But these problems can be fixed without inventing a completely new
system, I think. Or at least we should try. I can see the point of a
data type that is really a pointer to a LOB, and the LOB gets deleted
when the pointer is removed, but I don't think that should require
far-reaching changes all over the system (like relhaslobs) to make it
work efficiently. I think you need to start with a problem statement,
get agreement that it is a problem and on what the solution should be,
and then go write the code to implement that solution.

Yes. I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once". I see no good reason to import the entire
messy notion of LOBS/CLOBS. (The fact that other databases have done it
is not a good reason.)

For primitive types like text or bytea it seems pretty obvious what
"streaming access" should entail, but it might be interesting to
consider what it should mean for structured types. For instance, if I
have an array field with umpteen zillion elements, it might be nice to
fetch them one at a time using the streaming access mechanism. I don't
say that that has to be in the first version, but it'd be a good idea to
keep that in the back of your head so you don't design a dead-end
solution that can't be extended in that direction.

regards, tom lane

In context of LOBs streaming is resolved... I use current LO functionallity
(so driver may be able to read LOBs as psql \lo_export does it or using COPY
subprotocol) and client should get just LO's id. BLOBs in this implementation,
like Robert wanted are just wrapper for core LO, with some extensions for
special situations.... Adding of relhaslob in this impl is quite importnat to
do not examine tupledesc for each table operation, but this value may be
deduced during relation open (with performance penatly). I saw simillar is
made few lines above when triggers are fired, and few lines below when indices
are updated.

Currently BLOBs may be emulated using core LO (JDBC driver does it), but among
everything else, other problems are, if you look from point of view of
application developing:

1. No tracking of unused LO (you store just id of such object). You may leak
LO after row remove/update. User may write triggers for this, but it is not
argument - BLOB type is popular, and it's simplicity of use is quite
important. When I create app this is worst thing.

2. No support for casting in UPDATE/INSERT. So there is no way to simple
migrate data (e.g. from too long varchars). Or to copy BLOBs.

3. Limitation of field size to 1GB.

Other solution, I was think about, is to introduce system triggers (such
triggers can't be disabled or removed). So there will be new flag in triggers
table.

Now I think, we should try to mix both aproches, as system triggers may give
interesting API for other developers.

Other databases (may) store LOBs, Arrays, and Composites in external tables,
so user get's just id of such object.

I think about two weaks about streaming, I have some concepts about this, but
from point of view of memory consumption and performance. I will send concept
later, I want to think a little bit about it once more, and search what can be
actually done.

Regards,
Radek

#12

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Radosław Smogura (#11)

Re: BLOB support

=?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura@softperience.eu> writes:

Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42

Yes. I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once". I see no good reason to import the entire
messy notion of LOBS/CLOBS. (The fact that other databases have done it
is not a good reason.)

In context of LOBs streaming is resolved... I use current LO functionallity
(so driver may be able to read LOBs as psql \lo_export does it or using COPY
subprotocol) and client should get just LO's id.

Just to be clear: I do not want to expose a concept of object IDs for
field values in the first place. All of the problems you enumerate stem
from the idea that LOBs ought to be a distinct kind of field, and I
don't buy that.

regards, tom lane

#13

Radosław Smogura

rsmogura@softperience.eu

almost 15 years ago

In reply to: Tom Lane (#12)

Re: BLOB support

Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 19:43:16

=?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura@softperience.eu> writes:

Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42

Yes. I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once". I see no good reason to import the entire
messy notion of LOBS/CLOBS. (The fact that other databases have done it
is not a good reason.)

In context of LOBs streaming is resolved... I use current LO
functionallity (so driver may be able to read LOBs as psql \lo_export
does it or using COPY subprotocol) and client should get just LO's id.

Just to be clear: I do not want to expose a concept of object IDs for
field values in the first place. All of the problems you enumerate stem
from the idea that LOBs ought to be a distinct kind of field, and I
don't buy that.

regards, tom lane

So do I understand good should We think about create bettered TOAST to support
larger values then 30-bit length? I like this much more, but without Objects
ID quering relation with lobs will require to lock relation for some time, as
client will need to reference LOB in some way, I think using TID or some
derivative of TID, am I right?

Regards,
Radek

#14

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 15 years ago

In reply to: Pavel Stehule (#7)

Re: BLOB support

Dne 2.6.2011 15:49, Pavel Stehule napsal(a):

2011/6/2 Pavel Golub <pavel@microolap.com>:

Hello, Pavel.

You wrote:

PS> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:

I partialy implemented following missing LOBs types. Requirement for this was
to give ability to create (B/C)LOB columns and add casting functionality e.g.
SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, serialized
contains Oid of large object.
1. Each type has additional boolean haslobs, which is set recursivly.
2. Relation has same bool haslobs (used to speed up tables without LOBs)
3. When data are inserted/updated then "special" function is called and tuple
is modified in this way all LOBs are serialized to (old) LOB table and just
Oid is stored.
4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

PS> a streaming for bytea could be nice. A very large bytea are limited by
PS> query size - processing long query needs too RAM,

LO (oid) solves this, doesn't it?

partially

There is a few disadvantages LO against bytea, so there are requests
for "smarter" API for bytea.

Significant problem is different implementation of LO for people who
have to port application to PostgreSQL from Oracle, DB2. There are
some JDBC issues too.

For me - main disadvantage of LO in one space for all. Bytea removes
this disadvantage, but it is slower for lengths > 20 MB. It could be
really very practical have a possibility insert some large fields in
second NON SQL stream. Same situation is when large bytea is read.

Yes, being able to do this (without the need to use LOs as they have
their own set of problems - no FKs, etc.) would help a lot of people who
want/need to keep memory usage low.

What I'd like to see is the ability to stream BYTEA columns in both
directions - let's not reinvent the API, other databases already support
this. E.g. with Oracle you can do this using PDO (PHP):

<?php

$db = new PDO('oci:', 'scott', 'tiger');
$stmt = $db->prepare("insert into images (imagedata)".
"VALUES (EMPTY_BLOB())");

$fp = fopen('./myfile.data', 'rb');

$stmt->bindParam(1, $fp, PDO::PARAM_LOB);
$stmt->execute();

If we could make it work in a similar way, that would be great. There
are some more details at http://cz2.php.net/manual/en/pdo.lobs.php.

Tomas

#15

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 15 years ago

In reply to: Kenneth Marshall (#4)

Re: BLOB support

Dne 2.6.2011 15:18, ktm@rice.edu napsal(a):

On Thu, Jun 02, 2011 at 02:58:52PM +0200, Pavel Stehule wrote:

2011/6/2 Peter Eisentraut <peter_e@gmx.net>:

Superficially, this looks like a reimplementation of TOAST. What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?

a streaming for bytea could be nice. A very large bytea are limited by
query size - processing long query needs too RAM,

Pavel

+1 for a streaming interface to bytea/text. I do agree that there is no need
to reinvent the TOAST architecture with another name, just improve the existing
implementation.

Building a "parallel" architecture that mimics TOAST is obviously a bad
idea.

But I do have a curious question - the current LO approach is based on
splitting the data into small chunks (2kB) and storing those chunks in a
bytea column of the pg_largeobject table.

How much overhead does all this mean? What if there is a special kind of
blocks for binary data, that limits the amount of chunks and TOAST?
Actually this probably would not need a special type of block, but when
writing a block there would be a single row with as much data as
possible (and some metadata). I.e. there would be almost 8kB of
compressed data.

This would probably bring some restrictions (e.g. inability to update
the data, but I don't think that's possible with the current LO anyway.
Has anyone thought about this?

regards
Tomas

#16

Ross J. Reedstrom

reedstrm@rice.edu

almost 15 years ago

In reply to: Tom Lane (#12)

Re: BLOB support

On Thu, Jun 02, 2011 at 01:43:16PM -0400, Tom Lane wrote:

=?utf-8?q?Rados=C5=82aw_Smogura?= <rsmogura@softperience.eu> writes:

Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42

Yes. I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once". I see no good reason to import the entire
messy notion of LOBS/CLOBS. (The fact that other databases have done it
is not a good reason.)

In context of LOBs streaming is resolved... I use current LO functionallity
(so driver may be able to read LOBs as psql \lo_export does it or using COPY
subprotocol) and client should get just LO's id.

Just to be clear: I do not want to expose a concept of object IDs for
field values in the first place. All of the problems you enumerate stem
from the idea that LOBs ought to be a distinct kind of field, and I
don't buy that.

I think you're saying no OIDs exposed to the SQL i.e. actually stored in
a field and returned by a SELECT? (Which seems to be the proposal).

As I mentioned recently on another list, I've wrapped a block-oriented
"streaming" interface over bytea in python for a web app, specifically
to deal with the latency and memory footprint issues of storing
'largish' files directly in the db. I find that with a 64K blocksize,
latency is 'good enough' and substr() seems to be constant time for a
given size, no matter what part of the bytea value I'm fetching: toast
does a fine job of random access.

I was musing about providing a way to use the existing client lo
streaming interface (rather than the backend bits) for this type of
access. The thing called an OID in the client interface is really just a
nonce to tell the backend what data to send. With a single generator
function:

SELECT CASE WHEN is_lo THEN my_loid ELSE make_lo_oid(my_bytea) END
FROM my_file_table WHERE id = 34534;

Then plugging that back into the lo interface from the client side,
would let me use bytea as I currently do for files under 1GB, lo for
larger, and gain client side streaming that is transparent to the
storage of that particular value. Admittedly, application software would
still need to know how to _store_ different values, and manage large
objects, with all the pain that entails. But there's some gain in
unifying the reading part.

Hard to not call it an oid, since that's what the client libraries
already document it as (at least, python does)

Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

#17

Alvaro Herrera

alvherre@2ndquadrant.com

almost 15 years ago

In reply to: Radosław Smogura (#13)

Re: BLOB support

Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400 2011:

So do I understand good should We think about create bettered TOAST to support
larger values then 30-bit length? I like this much more,

Good :-)

(BTW while it'd be good to have longer-than-30 bit length words for
varlena, I'm not sure we have room for that.)

but without Objects ID quering relation with lobs will require to lock
relation for some time,

Why? The tuples are not going away due to MVCC anyway.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#18

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Alvaro Herrera (#17)

Re: BLOB support

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400 2011:

So do I understand good should We think about create bettered TOAST to support
larger values then 30-bit length? I like this much more,

Good :-)

(BTW while it'd be good to have longer-than-30 bit length words for
varlena, I'm not sure we have room for that.)

You wouldn't want to push such values around as whole values anyway.
Possibly what would work here is a variant form of TOAST pointer for
which we'd simply throw error if you tried to fetch the entire value
at once.

regards, tom lane

#19

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Radosław Smogura (#11)

Re: BLOB support

On Thu, Jun 2, 2011 at 12:53 PM, Radosław Smogura
<rsmogura@softperience.eu> wrote:

1. No tracking of unused LO (you store just id of such object). You may leak
LO after row remove/update. User may write triggers for this, but it is not
argument - BLOB type is popular, and it's simplicity of use is quite
important. When I create app this is worst thing.

2. No support for casting in UPDATE/INSERT. So there is no way to simple
migrate data (e.g. from too long varchars). Or to copy BLOBs.

3. Limitation of field size to 1GB.

As a general point, it would probably be a good idea to address each
of these issues separately, and to have a separate discussion about
each one.

As to #1 specifically, if you use a text or bytea field rather than a
large object per se, then this issue goes away. But then you lose the
streaming functionality. So at least some people here are saying that
we should try to fix that by adding the streaming functionality to
text/bytea rather than by doing anything to the large object facility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Robert Haas (#19)

Re: BLOB support

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 2, 2011 at 12:53 PM, Radosław Smogura
<rsmogura@softperience.eu> wrote:

1. No tracking of unused LO (you store just id of such object). You may leak
LO after row remove/update. User may write triggers for this, but it is not
argument - BLOB type is popular, and it's simplicity of use is quite
important. When I create app this is worst thing.

2. No support for casting in UPDATE/INSERT. So there is no way to simple
migrate data (e.g. from too long varchars). Or to copy BLOBs.

3. Limitation of field size to 1GB.

As a general point, it would probably be a good idea to address each
of these issues separately, and to have a separate discussion about
each one.

As to #1 specifically, if you use a text or bytea field rather than a
large object per se, then this issue goes away. But then you lose the
streaming functionality. So at least some people here are saying that
we should try to fix that by adding the streaming functionality to
text/bytea rather than by doing anything to the large object facility.

#2 is also a problem that only becomes a problem if you insist that LOBs
have to be a distinct kind of value.

regards, tom lane

#21

Radosław Smogura

rsmogura@softperience.eu

almost 15 years ago

In reply to: Tom Lane (#18)

#22

Radosław Smogura

rsmogura@softperience.eu

almost 15 years ago

In reply to: Tom Lane (#20)