text should be a blob field

Started by Zeugswetter Andreas SARZalmost 28 years ago19 messages
#1Zeugswetter Andreas SARZ
Andreas.Zeugswetter@telecom.at

Of course a first mail after a release must have the earned praise:
Very nicely done, I like it all ;-)

Reading the TODO, I see 'Allow text, char(), and varchar() overhead to be
only 2 bytes, not 4 bytes'
While this is very good for char and varchar, text is not a candidate
since it is usually a blob datatype, without a length restriction (or a 2Gig
limit).

I think it should alternately read:
Allow varchar() overhead to be only 2 bytes
remove char() 4 byte overhead, use atttypmod instead
make text a blob datatype (maybe storing <= 8k row with tuple, >=8k in blob
tablespace)

What do you think ?
Andreas

#2Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Zeugswetter Andreas SARZ (#1)
Re: [HACKERS] text should be a blob field

Of course a first mail after a release must have the earned praise:
Very nicely done, I like it all ;-)

Reading the TODO, I see 'Allow text, char(), and varchar() overhead to be
only 2 bytes, not 4 bytes'
While this is very good for char and varchar, text is not a candidate
since it is usually a blob datatype, without a length restriction (or a 2Gig
limit).

It was an idea. I think I will remove it from the TODO list. I had
considered it so I could save the defined length(atttypmod now) in
there, but now that we have atttypmod, we don't need it. It will stay
at 4 bytes.

I think it should alternately read:
Allow varchar() overhead to be only 2 bytes
remove char() 4 byte overhead, use atttypmod instead

Ooh, this is interesting. Yea, I guess we really don't need that for
char() anymore. The only problem is that we would have to do some fancy
stuff to track char() separately in the backend, and I am sure atttypmod
is not available in all the places we need it. Don't think it is worth
it.

make text a blob datatype (maybe storing <= 8k row with tuple, >=8k in blob
tablespace)

Hmmm.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#3Goran Thyni
goran@bildbasen.se
In reply to: Bruce Momjian (#2)
Re: [HACKERS] text should be a blob field

make text a blob datatype (maybe storing <= 8k row with tuple, >=8k in blob
tablespace)

Hmmm.

I know Informix has 2 BLOB-types "text" and "binary"
but I do not think we should change the meaning of keyword "text" too
much.

An idea to think about:
if text fits within a tuple make it varchar-alias (as it works now)
if text is larger make it a blob.

Or simply call text-BLOBs "textblob" of something like that.
What does SQL-92 say about BLOBs anyway?

regards,
--
---------------------------------------------
G�ran Thyni, sysadm, JMS Bildbasen, Kiruna

#4Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#2)
Re: [HACKERS] text should be a blob field

Or simply call text-BLOBs "textblob" of something like that.
What does SQL-92 say about BLOBs anyway?

Nothing afaik. That is why you get different meanings and usages between database
products. I'd like to keep "text" as a useful string type. Conventionally, generic
blobs are just binary objects with not much backend support (e.g. no useful
operators other than perhaps "=").

Imo generic blobs make more sense in a system without the capability to add types;
perhaps a solution for Postgres would look a little different. At the moment, the
frontend/backend protocol is different for large objects and everything else, so
it would be difficult to transparently introduce blobs which behave identically to
types which fit within a normal tuple.

- Tom

#5Zeugswetter Andreas SARZ
Andreas.Zeugswetter@telecom.at
In reply to: Thomas G. Lockhart (#4)
AW: [HACKERS] text should be a blob field

Or simply call text-BLOBs "textblob" of something like that.
What does SQL-92 say about BLOBs anyway?

Nothing afaik. That is why you get different meanings and usages between
database
products. I'd like to keep "text" as a useful string type. Conventionally,
generic
blobs are just binary objects with not much backend support (e.g. no
useful
operators other than perhaps "=").

Imo generic blobs make more sense in a system without the capability to
add types;
perhaps a solution for Postgres would look a little different. At the
moment, the
frontend/backend protocol is different for large objects and everything
else, so
it would be difficult to transparently introduce blobs which behave
identically to
types which fit within a normal tuple.

- Tom

Yup, that all sounds very plausible. But, since the meaning diverges between
DB Systems
I would suggest to maybe not enforce text for now (at least not in system
tables).
It has almost the same behavior as varchar (does it ?), and since varchar is
very good now :-)
I would enforce the use of varchar where it fits (like passwd in pg_shadow,
but not prosrc in pg_proc where text is appropriate).
Maybe just to keep the doors open for a larger text datatype in the future.

Andreas

#6Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Bruce Momjian (#2)
Re: [HACKERS] text should be a blob field

On Tue, 3 Mar 1998, Bruce Momjian wrote:

make text a blob datatype (maybe storing <= 8k row with tuple, >=8k in blob
tablespace)

Hmmm.

There was some talk about this about a month ago.

Although we now have blob support in the JDBC driver, there is one
outstanding issue with them, that I was waiting for 6.3 to be released
before starting on it (and almost certainly starting a discussion here
about it).

Allowing text to use blobs for values larger than the current block size
would hit the same problem.

Ok, here's what the problem is at the moment:

The JDBC example ImageViewer uses a table to store the name of an image,
and the OID of the associated blob.

# create table images (imgname name,imgoid oid);

Ok, we now create an entry in the table for an image with:

# insert into images values ('test.gif',lo_import('/home/pmount/test.gif'));

This is fine so far. Now say we delete that row with:

# delete from images where name = 'test.gif';

Fine again, except that the blob is still in the database. To get round
this, you would have to add extra statements to handle this, and for JDBC,
there is no standard way to do this.

What I was thinking of, was to create a new type 'blob' which would delete
the associated large object when the row is deleted. However, here's the
problems against this:

1. Is there a call made by the backend to each datatype when a row is
deleted? I can't see one.

2. When we update a row, we don't want the overhead of copying a very
large blob when a row is first copied, then the original deleted, etc.

Anyhow, I'm thinking of various ways around this - just don't hold your
breath ;-)

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#7Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Peter T Mount (#6)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

outstanding issue with them, that I was waiting for 6.3 to be released
before starting on it (and almost certainly starting a discussion here
about it).

Allowing text to use blobs for values larger than the current block size
would hit the same problem.

Ok, here's what the problem is at the moment:

The JDBC example ImageViewer uses a table to store the name of an image,
and the OID of the associated blob.

# create table images (imgname name,imgoid oid);

Ok, we now create an entry in the table for an image with:

# insert into images values ('test.gif',lo_import('/home/pmount/test.gif'));

This is fine so far. Now say we delete that row with:

# delete from images where name = 'test.gif';

Fine again, except that the blob is still in the database. To get round
this, you would have to add extra statements to handle this, and for JDBC,
there is no standard way to do this.

What I was thinking of, was to create a new type 'blob' which would delete
the associated large object when the row is deleted. However, here's the
problems against this:

1. Is there a call made by the backend to each datatype when a row is
deleted? I can't see one.

Well, you could have a RULE that deletes the large object at row
deletion time. However, if two rows point to the same large object, the
first one deleting it would delete the large object for the other. The
only solution to this is to have a separate large object table, and use
reference counts so only the last user of the object deletes it.

2. When we update a row, we don't want the overhead of copying a very
large blob when a row is first copied, then the original deleted, etc.

Again, a deletion-only rule, but if the update the row and change the
large object, you would have to delete the old stuff.

Seems very messy to me. Perhaps put all the large objects in a table,
and have a process clean up all the unreferenced large objects.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#8Noname
jwieck@debis.com
In reply to: Bruce Momjian (#7)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

Bruce wrote:

1. Is there a call made by the backend to each datatype when a row is
deleted? I can't see one.

Well, you could have a RULE that deletes the large object at row
deletion time. However, if two rows point to the same large object, the
first one deleting it would delete the large object for the other. The
only solution to this is to have a separate large object table, and use
reference counts so only the last user of the object deletes it.

I think triggers are more appropriate.

On INSERT check that the large object referenced exists.

On UPDATE if large object reference changes, check that new
large object exists and check if old large object isn't
referenced any more in which case drop the old large object.

On DELETE check if large object isn't referenced any more ...

Yes - I like triggers :-)

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#9Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Bruce Momjian (#7)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

On Wed, 4 Mar 1998, Bruce Momjian wrote:

1. Is there a call made by the backend to each datatype when a row is
deleted? I can't see one.

Well, you could have a RULE that deletes the large object at row
deletion time.

As I haven't yet played with Rules & Triggers, and now we have 6.3 out of
the way, I'm going to start.

However, if two rows point to the same large object, the first one
deleting it would delete the large object for the other. The only
solution to this is to have a separate large object table, and use
reference counts so only the last user of the object deletes it.

Ah, in this case, there would be a single large object per column/row. If
the row is deleted, then so will the blob.

2. When we update a row, we don't want the overhead of copying a very
large blob when a row is first copied, then the original deleted, etc.

Again, a deletion-only rule, but if the update the row and change the
large object, you would have to delete the old stuff.

That's true.

Seems very messy to me. Perhaps put all the large objects in a table,
and have a process clean up all the unreferenced large objects.

I think that would be a last resort thing to use.

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#10Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Peter T Mount (#6)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

Peter T Mount wrote:

On Tue, 3 Mar 1998, Bruce Momjian wrote:

make text a blob datatype (maybe storing <= 8k row with tuple, >=8k in blob
tablespace)

There was some talk about this about a month ago.

Although we now have blob support in the JDBC driver, there is one
outstanding issue with them, that I was waiting for 6.3 to be released
before starting on it (and almost certainly starting a discussion here
about it).

Allowing text to use blobs for values larger than the current block size
would hit the same problem.

When I told about multi-representation feature I ment that applications
will not be affected by how text field is stored - in tuple or somewhere
else. Is this Ok for you ?

Vadim

#11Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Vadim B. Mikheev (#10)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

On Thu, 5 Mar 1998, Vadim B. Mikheev wrote:

Peter T Mount wrote:

On Tue, 3 Mar 1998, Bruce Momjian wrote:

make text a blob datatype (maybe storing <= 8k row with tuple, >=8k in blob
tablespace)

There was some talk about this about a month ago.

Although we now have blob support in the JDBC driver, there is one
outstanding issue with them, that I was waiting for 6.3 to be released
before starting on it (and almost certainly starting a discussion here
about it).

Allowing text to use blobs for values larger than the current block size
would hit the same problem.

When I told about multi-representation feature I ment that applications
will not be affected by how text field is stored - in tuple or somewhere
else. Is this Ok for you ?

Yes. What I was meaning was if the "somewhere else" is in a blob, then we
would have to keep track of it if the tuple is updated or deleted.

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#12Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Noname (#8)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

On Wed, 4 Mar 1998, Jan Wieck wrote:

Bruce wrote:

1. Is there a call made by the backend to each datatype when a row is
deleted? I can't see one.

Well, you could have a RULE that deletes the large object at row
deletion time. However, if two rows point to the same large object, the
first one deleting it would delete the large object for the other. The
only solution to this is to have a separate large object table, and use
reference counts so only the last user of the object deletes it.

I think triggers are more appropriate.

On INSERT check that the large object referenced exists.

On UPDATE if large object reference changes, check that new
large object exists and check if old large object isn't
referenced any more in which case drop the old large object.

On DELETE check if large object isn't referenced any more ...

Yes - I like triggers :-)

I'm begining to agree with you here.

So far, I've got the trigger to work, so if a row of a table is deleted,
or an oid referencing a BLOB is updated, then the old BLOB is deleted.
This removes the orphaned BLOB problem.

The only problem I have now, is:

How to get a trigger to be automatically created on a table when the
table is created. This would be required, so the end user doesn't have
to do this (normally from within an application).

This would be required, esp. for expanding the text type (or memo, or
whatever).

Any Ideas?

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#13Maurice Gittens
mgittens@gits.nl
In reply to: Peter T Mount (#12)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

-----Original Message-----
From: Peter T Mount <psqlhack@maidast.demon.co.uk>
To: Jan Wieck <jwieck@debis.com>
Cc: Bruce Momjian <maillist@candle.pha.pa.us>;
Andreas.Zeugswetter@telecom.at <Andreas.Zeugswetter@telecom.at>;
pgsql-hackers@hub.org <pgsql-hackers@hub.org>
Date: zondag 15 maart 1998 21:10
Subject: Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

On Wed, 4 Mar 1998, Jan Wieck wrote:

Bruce wrote:

1. Is there a call made by the backend to each datatype when a row is
deleted? I can't see one.

Well, you could have a RULE that deletes the large object at row
deletion time. However, if two rows point to the same large object,

the

first one deleting it would delete the large object for the other. The
only solution to this is to have a separate large object table, and use
reference counts so only the last user of the object deletes it.

I think triggers are more appropriate.

On INSERT check that the large object referenced exists.

On UPDATE if large object reference changes, check that new
large object exists and check if old large object isn't
referenced any more in which case drop the old large object.

On DELETE check if large object isn't referenced any more ...

Yes - I like triggers :-)

I'm begining to agree with you here.

So far, I've got the trigger to work, so if a row of a table is deleted,
or an oid referencing a BLOB is updated, then the old BLOB is deleted.
This removes the orphaned BLOB problem.

The only problem I have now, is:

How to get a trigger to be automatically created on a table when the
table is created. This would be required, so the end user doesn't have
to do this (normally from within an application).

This would be required, esp. for expanding the text type (or memo, or
whatever).

Since triggers are not inherited this doesn't seem appropiate too me -:(.
Won't user have to do some magic on inherited tables?

I think many things will be fixed when triggers, indices etc. are inherited
properly by derived classes.

With regards from,
Maurice.

#14Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Maurice Gittens (#13)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

Since triggers are not inherited this doesn't seem appropiate to me
-:(. Won't user have to do some magic on inherited tables?

I think many things will be fixed when triggers, indices etc. are
inherited properly by derived classes.

This is an interesting point. We have perhaps been neglecting to discuss
how inheritance should behave with new features. Certainly at least some
things (like default values and constraints) do inherit as you would
expect. It's not clear to me that inheriting indices is appropriate,
since indices are basically just an artifact for performance
improvement; the results of a query do not change with the presence of
an index. Also, if a table is inherited, then the access pattern of the
resulting table might be different, requiring a different index anyway.

I haven't worked with triggers; what is their current behavior with
inheritance? Maurice implies that they are not inherited at all. If not,
should they be?

- Tom

#15Maurice Gittens
mgittens@gits.nl
In reply to: Thomas G. Lockhart (#14)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

-----Original Message-----
From: Thomas G. Lockhart <lockhart@alumni.caltech.edu>
To: PostgreSQL-development <hackers@postgresql.org>
Cc: Maurice Gittens <mgittens@gits.nl>
Date: zondag 15 maart 1998 22:24
Subject: Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

Since triggers are not inherited this doesn't seem appropiate to me
-:(. Won't user have to do some magic on inherited tables?

I think many things will be fixed when triggers, indices etc. are
inherited properly by derived classes.

This is an interesting point. We have perhaps been neglecting to discuss
how inheritance should behave with new features. Certainly at least some
things (like default values and constraints) do inherit as you would
expect. It's not clear to me that inheriting indices is appropriate,
since indices are basically just an artifact for performance
improvement;

Yes I agree.

the results of a query do not change with the presence of
an index. Also, if a table is inherited, then the access pattern of the
resulting table might be different, requiring a different index anyway.

In general OO terms code reuse (in our case trigger reuse? -:) ) is achieved
by inheriting as much as sanely possible.
Since (from an OO perspective) every instance of a derived class is also an
instance of it's base class(es), inheriting indices seems like a natural
thing
to me.

However it may be possible that an OO database makes different choices
than an OO programming language.

I haven't worked with triggers; what is their current behavior with
inheritance? Maurice implies that they are not inherited at all. If not,
should they be?

Well if we want to support polymorphism the answer to this question is a
definite yes.

In general any operation defined on an instance of a base class should be
defined on an instance of a derived class. This is the main reason
we inherit in the first place. The user has the option of redefining
(specializing)
an operation for instances of derived classes if she/he so pleases.

I think proper support for this would be really cool.

With regards from Maurice.

#16Noname
jwieck@debis.com
In reply to: Peter T Mount (#12)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

Peter Mount wrote:

On Wed, 4 Mar 1998, Jan Wieck wrote:

I think triggers are more appropriate.

I'm begining to agree with you here.

So far, I've got the trigger to work, so if a row of a table is deleted,
or an oid referencing a BLOB is updated, then the old BLOB is deleted.
This removes the orphaned BLOB problem.

The only problem I have now, is:

How to get a trigger to be automatically created on a table when the
table is created. This would be required, so the end user doesn't have
to do this (normally from within an application).

This would be required, esp. for expanding the text type (or memo, or
whatever).

So you think of a new type that automatically causes trigger
definition if used in CREATE/ALTER TABLE.

Agree - would be a nice feature.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#17Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Noname (#16)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

On Mon, 16 Mar 1998, Jan Wieck wrote:

Peter Mount wrote:

On Wed, 4 Mar 1998, Jan Wieck wrote:

I think triggers are more appropriate.

I'm begining to agree with you here.

So far, I've got the trigger to work, so if a row of a table is deleted,
or an oid referencing a BLOB is updated, then the old BLOB is deleted.
This removes the orphaned BLOB problem.

The only problem I have now, is:

How to get a trigger to be automatically created on a table when the
table is created. This would be required, so the end user doesn't have
to do this (normally from within an application).

This would be required, esp. for expanding the text type (or memo, or
whatever).

So you think of a new type that automatically causes trigger
definition if used in CREATE/ALTER TABLE.

Agree - would be a nice feature.

Exactly, it would be a nice feature.

I'm about to look at rules to see if that's a way to do it, but seeing it
took me about 30 mins to do this with Triggers (and thats when I've never
used them before), then it would be nice to use these.

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#18Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Peter T Mount (#17)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

Would it be possible to have a slightly different interface in the
frontend library which hides the fact that large objects are transfered
8kb at a time from the backend? Then the handling of text and large
objects/blobs starts to look more alike...

- Tom

#19Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Thomas G. Lockhart (#18)
Re: [QUESTIONS] Re: [HACKERS] text should be a blob field

On Tue, 17 Mar 1998, Thomas G. Lockhart wrote:

Would it be possible to have a slightly different interface in the
frontend library which hides the fact that large objects are transfered
8kb at a time from the backend? Then the handling of text and large
objects/blobs starts to look more alike...

The front end doesn't show the 8k limit... the storage manager handles
splitting up the large object into 8k chunks - it may be that the examples
show this because we know about it ourselves ;-)

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk