RE: [HACKERS] pg_dump/restore to convert BLOBs to LZTEXT (optional!)

Started by Peter Mount over 25 years ago, 3 messages
#1 Peter Mount
petermount@it.maidstone.gov.uk

See below...

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council

-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Friday, August 04, 2000 2:29 AM
To: Tom Lane
Cc: pgsql-hackers@postgreSQL.org; pgsql-general@postgreSQL.org
Subject: Re: [HACKERS] pg_dump/restore to convert BLOBs to LZTEXT
(optional!)

At 21:10 3/08/00 -0400, Tom Lane wrote:

As well as break the semantics: if you have a multiply-referenced BLOB
then you can update it through any reference and the changes are visible
through all the references. Not so after you convert the data into
non-BLOB values.

That's what I meant. People *shouldn't* expect BLOB fields to be updated in
more than one table, but the implementation currently allows it (since BLOBs
are not implemented as fields).

Peter: I disagree. There are dozens of instances where you would use a
single BLOB but refer to it in more than one table. If you have a 1Mb blob
referred to in 3 different tables, you don't want to store 3 instances of it.
Say you were implementing some form of DIP system (Document Image
Processing), then you only want one copy of the document stored, so that if
that document changes, then every instance is changed.

I don't see that pg_dump can help meaningfully,
and I'd just as soon resist feature bloat in pg_dump.

Fine. Thinking about it, even *if* it was implemented as a utility, I
suspect (for the reasons you outlined), conversion would be a multi-step
process. And a more useful utility would be one that converted an existing
database, rather than trying to do everything in the 'restore'...

Peter: It might be useful to have the utility and put it under contrib. It
would then save people from reinventing the wheel.

Forget I even mentioned it.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd.
(A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request,
and from pgp5.ai.mit.edu:11371

#2 Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Peter Mount (#1)
Re: [HACKERS] pg_dump/restore to convert BLOBs to LZTEXT (optional!)

On Fri, Aug 04, 2000 at 07:55:52AM +0100, Peter Mount wrote:

See below...

Peter: I disagree. There are dozens of instances where you would use a
single BLOB but refer to it in more than one table. If you have a 1Mb blob
referred to in 3 different tables, you don't want to store 3 instances of it.
Say you were implementing some form of DIP system (Document Image
Processing), then you only want one copy of the document stored, so that if
that document changes, then every instance is changed.

But Peter, the relational way to avoid redundant storage should apply. For
every other type, one does this by storing the data in one place, with
a unique ID, and using the ID to refer to the data item, and joining when
you need the item itself.

So, once large data items are promoted to first class types, they should
act just like every other first class type. Otherwise, we violate the
principle of least surprise. Having software that tries to second guess
the developer is always frustrating.
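
The pattern described above (one stored copy, referenced by ID, joined on demand) can be sketched roughly as follows. This is only an illustration, and it uses Python's sqlite3 module rather than PostgreSQL; the table and column names (documents, invoices, letters, doc_id) and the helper functions are hypothetical and do not come from this thread.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one document shared by two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- The large value is stored exactly once, keyed by a unique ID.
        CREATE TABLE documents (
            id   INTEGER PRIMARY KEY,
            body BLOB NOT NULL
        );
        -- Other tables hold only the ID, never a copy of the data.
        CREATE TABLE invoices (
            id     INTEGER PRIMARY KEY,
            doc_id INTEGER NOT NULL REFERENCES documents(id)
        );
        CREATE TABLE letters (
            id     INTEGER PRIMARY KEY,
            doc_id INTEGER NOT NULL REFERENCES documents(id)
        );
        """
    )
    conn.execute("INSERT INTO documents (id, body) VALUES (1, ?)", (b"scan v1",))
    conn.execute("INSERT INTO invoices (id, doc_id) VALUES (10, 1)")
    conn.execute("INSERT INTO letters (id, doc_id) VALUES (20, 1)")
    return conn


def body_via(conn, table, row_id):
    """Fetch the document body by joining from a referencing table."""
    # Table names cannot be bound as parameters, so restrict them to a
    # fixed set before interpolating into the query text.
    if table not in ("invoices", "letters"):
        raise ValueError("unknown table: " + table)
    row = conn.execute(
        f"SELECT d.body FROM {table} AS t "
        "JOIN documents AS d ON d.id = t.doc_id WHERE t.id = ?",
        (row_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo()
    # A single update to the stored copy is seen through every reference.
    conn.execute("UPDATE documents SET body = ? WHERE id = 1", (b"scan v2",))
    print(body_via(conn, "invoices", 10))
    print(body_via(conn, "letters", 20))
```

Under these assumptions, the "change it once, see it everywhere" behaviour comes from the join, not from any special sharing semantics in the large-object type itself.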

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005

#3 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Ross J. Reedstrom (#2)
Re: [HACKERS] pg_dump/restore to convert BLOBs to LZTEXT (optional!)

On Fri, Aug 04, 2000 at 07:55:52AM +0100, Peter Mount wrote:

See below...

Peter: I disagree. There are dozens of instances where you would use a
single BLOB but refer to it in more than one table. If you have a 1Mb blob
referred to in 3 different tables, you don't want to store 3 instances of it.
Say you were implementing some form of DIP system (Document Image
Processing), then you only want one copy of the document stored, so that if
that document changes, then every instance is changed.

But Peter, the relational way to avoid redundant storage should apply. For
every other type, one does this by storing the data in one place, with
a unique ID, and using the ID to refer to the data item, and joining when
you need the item itself.

So, once large data items are promoted to first class types, they should
act just like every other first class type. Otherwise, we violate the
principle of least surprise. Having software that tries to second guess
the developer is always frustrating.

I totally agree. Because large objects exist as separate files, this
was required, but after TOAST, the proper relational way should be used.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026