Storing images as BYTEA or large objects
Hi,
I would like to store binary data in a PostgreSQL database. The size of
the data is about 2 to 20 MB and is always stored or retrieved as a
block (i.e., I do not need to get only part of the data). As I
understand it, I have two options for storing this data: as BYTEA or as
large objects. The disadvantage of the first method is
that I need to write a custom routine to escape some binary values - or
is there some code available to do this? The disadvantage of large
objects is their existence outside of the main database, which may be
problematic when backing up a database. In addition, I need special
routines to store/retrieve the data.
My two questions are: Is this summary correct? And: Which method should
I choose?
Best,
Koen
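For what it's worth, the "escaping" in question boils down to PostgreSQL's classic bytea escape input format: double any backslash and octal-escape non-printable bytes. A rough sketch (Python, purely illustrative; it assumes the value is sent as a bound parameter, so no extra quote-doubling for an SQL literal is applied):

```python
def bytea_escape(data: bytes) -> str:
    """Escape raw bytes into PostgreSQL's classic bytea 'escape' format."""
    out = []
    for b in data:
        if b == 0x5C:            # backslash -> doubled
            out.append('\\\\')
        elif 0x20 <= b <= 0x7E:  # printable ASCII passes through
            out.append(chr(b))
        else:                    # everything else -> \ooo (3-digit octal)
            out.append('\\%03o' % b)
    return ''.join(out)

def bytea_unescape(text: str) -> bytes:
    """Inverse of bytea_escape: rebuild the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == '\\':
            if text[i + 1] == '\\':
                out.append(0x5C)     # doubled backslash
                i += 2
            else:
                out.append(int(text[i + 1:i + 4], 8))  # \ooo octal escape
                i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

As the thread goes on to discuss, client libraries can often avoid this step entirely.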
With Java, Python, or Perl, you've got functions that escape the data for
you.
Leonel
Koen Vermeer wrote:
Hi,
I would like to store binary data in a PostgreSQL database. The size of
the data is about 2 to 20 MB and is always stored or retrieved as a
block (i.e., I do not need to get only part of the data). As I
understand, I have two options for storing this data: As BYTEA or as
large objects. As I understand, the disadvantage of the first method is
that I need to write a custom routine to escape some binary values - or
is there some code available to do this? The disadvantage of large
objects is their existence outside of the main database, which may be
problematic when backing up a database. In addition, I need special
routines to store/retrieve the data.
My two questions are: Is this summary correct? And: Which method should
I choose?
Best,
Koen
Having used large objects, I can tell you they do back up (pg_dump,
etc.) and they are not hard to use. There is even a contrib module that
helps you hook them up to a table so they get deleted at the appropriate
times (I have not used it, though, so I'm not 100% sure); they were easy
enough to use even without the contrib.
The only reason I used lo was to avoid the escaping. I'm not sure how
slow escaping a 20 MB file would be, but it just sounded slow. And you
have to do it twice: once to send it, and once when you get it back.
I'd love to hear from others about that... is it something I should even
worry about? We store lots of photos in the db; they usually run a meg
or less, but we store lots of 'em.
-Andy
I have two options for storing this data: As BYTEA or as large objects.
Is it true that if you update a row containing a large BYTEA value, (even if
you're not updating the BYTEA field itself, just another field), it requires
the entire BYTEA value to be copied to a new row (because of MVCC) ? Or is
this not true because of TOAST?
If true, would this have an impact on the buffer cache and/or checkpoints ?
(You could always separate the BYTEA values out into their own table
to avoid this drawback.)
"Adam Rich" <adam.r@sbcglobal.net> writes:
I have two options for storing this data: As BYTEA or as large objects.
Is it true that if you update a row containing a large BYTEA value, (even if
you're not updating the BYTEA field itself, just another field), it requires
the entire BYTEA value to be copied to a new row (because of MVCC) ? Or is
this not true because of TOAST?
It is not true. As long as you don't change the toasted value, it
contributes nothing much to the cost of updating the row. All that
has to be copied is a 20-byte pointer structure.
However, if you *do* change the toasted value, it is always updated as a
unit. So if you need to write into portions of a large value,
large objects are what you want.
regards, tom lane
Hi,
Which programming language are you using?
Regards,
Gevik Babakhani
------------------------------------------------
PostgreSQL NL http://www.postgresql.nl
TrueSoftware BV http://www.truesoftware.nl
------------------------------------------------
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Koen Vermeer
Sent: Tuesday, February 12, 2008 11:41 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Storing images as BYTEA or large objects
On Wed, 2008-02-13 at 07:37 +0100, Gevik Babakhani wrote:
Which programming language are you using?
That would be C++ for storing and both C++ and PHP for retrieving the
data. Maybe also PL/SQL for retrieval (in addition to or instead of
PHP).
Koen
On Tue, 2008-02-12 at 17:16 -0700, Leonel Nunez wrote:
My two questions are: Is this summary correct? And: Which method should
I choose?
With Java, Python, or Perl, you've got functions that escape the data for
you.
What about C++ and PHP?
Koen
On Tue, 2008-02-12 at 21:14 -0600, Andy Colson wrote:
Having used large objects, I can tell you they do back up (pg_dump,
etc.) and they are not hard to use. There is even a contrib module that
helps you hook them up to a table so they get deleted at the appropriate
times (I have not used it, though, so I'm not 100% sure); they were easy
enough to use even without the contrib.
In my case, I do not really worry about cleaning them up, as I expect
not to delete the objects (or only very rarely) anyway. But it's good to
know (I hope I still remember it in a few months) that there is a
contrib that handles this.
The only reason I used lo was to avoid the escaping. I'm not sure how
slow escaping a 20 MB file would be, but it just sounded slow. And you
have to do it twice: once to send it, and once when you get it back.
I thought about that too, and didn't find conclusive arguments to choose
one or the other. So I decided to just ask, but the jury is still out...
Best,
Koen
In our case we were switching between databases, so what I have done in the
past was:
For inserting:
1. Create a TEXT column in my table. (In PG this can be up to 1 GB in size.)
2. Read the file contents into a buffer/string and Base64 encode that string.
3. Write the string into the db.
For reading:
1. Read the text column into a buffer/string and Base64 decode it to get your
original image/binary data.
The approach above worked for us. This was a PHP5 website and C# frontend.
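The steps above reduce to a pair of small helpers (Python here purely for illustration; the real code was PHP5/C#, and the function names are made up). Note that Base64 inflates the data by about 33%, which is the price paid for avoiding escaping:

```python
import base64

def encode_for_text_column(raw: bytes) -> str:
    """Base64-encode binary data so it can be stored safely in a TEXT column."""
    return base64.b64encode(raw).decode('ascii')

def decode_from_text_column(stored: str) -> bytes:
    """Recover the original binary data from the TEXT column's contents."""
    return base64.b64decode(stored)
```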
Regards,
Gevik Babakhani
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Koen Vermeer
Sent: Wednesday, February 13, 2008 9:43 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Storing images as BYTEA or large objects
On Wed, 2008-02-13 at 09:57 +0100, Gevik Babakhani wrote:
In our case we were switching between databases, so what I have done in the
past was:
For inserting:
1. Create a TEXT column in my table. (In PG this can be up to 1 GB in size.)
2. Read the file contents into a buffer/string and Base64 encode that string.
3. Write the string into the db.
For reading:
1. Read the text column into a buffer/string and Base64 decode it to get your
original image/binary data.
The approach above worked for us. This was a PHP5 website and C# frontend.
Instead of base64 encoding, I guess it would be easier to just escape
the required bytes and store them in a bytea.
But, just to be clear: I think I can get both methods (lo or bytea) to
work. All I was trying to find out is which option would be better in my
case.
Best,
Koen
Koen Vermeer wrote:
Hi,
I would like to store binary data in a PostgreSQL database. The size of
the data is about 2 to 20 MB and is always stored or retrieved as a
block (i.e., I do not need to get only part of the data). As I
understand, I have two options for storing this data: As BYTEA or as
large objects. As I understand, the disadvantage of the first method is
that I need to write a custom routine to escape some binary values - or
is there some code available to do this? The disadvantage of large
objects is their existence outside of the main database, which may be
problematic when backing up a database. In addition, I need special
routines to store/retrieve the data.
I've used both methods. The only real problem is that none of the
trigger-based replication schemes, such as Slony, can deal with large
objects.
Depending on what programming language you're using, you do *not* need
to escape the binary data for BYTEA. Using libpq from C/C++ you can pass
the binary data straight into the database. See PQexecParams:
http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN
We use server-side JavaScript here that makes use of that interface, so
I can take an image directly from the web server and move it into the
database with no escape overhead.
My preference: if I don't need the file-like interface to large objects,
I'd use BYTEA every time.
Pete
--
http://www.whitebeam.org - JavaScript web application server.
http://www.yellowhawk.co.uk
Koen Vermeer wrote:
Instead of base64 encoding, I guess it would be easier to just escape
the required bytes and store them in a bytea.
Actually, if you have access to the PQexecParams() call, you can pass
the bytes to a bytea column unescaped, which AFAIK saves some processing
on both the client and server.
To get the bytes back unescaped, I think you could use a binary cursor.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, 2008-02-13 at 09:35 +0000, Peter Wilson wrote:
I've used both methods. The only real problem is that none of the
trigger-based replication schemes, such as Slony, can deal with large
objects.
I can live with that for now. If the project ever gets that big, I
probably need to rethink some design choices anyway.
Depending on what programming language you're using, you do *not* need
to escape the binary data for BYTEA. Using libpq from C/C++ you can pass
the binary data straight into the database. See PQexecParams:
http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN
We use server-side JavaScript here that makes use of that interface, so
I can take an image directly from the web server and move it into the
database with no escape overhead.
Ah, I see, that seems to be a better way indeed. I can probably do
something like that with PHP as well.
My preference: if I don't need the file-like interface to large objects,
I'd use BYTEA every time.
Right, so that basically means that when the 'large objects' are files that
should be saved and restored as a whole, it may be more natural to
use large objects. I guess that applies to some uses of media
storage (music, photos, video).
The large-objects-are-actually-files thing applies to my situation, so
unless there is some 'large objects are / will be deprecated' argument,
I guess I stick with large objects.
Thanks!
Koen
Koen Vermeer wrote:
The large-objects-are-actually-files thing applies to my situation, so
unless there is some 'large objects are / will be deprecated' argument,
I guess I stick with large objects.
Certainly there is no such argument.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Koen Vermeer wrote:
On Wed, 2008-02-13 at 09:35 +0000, Peter Wilson wrote:
My preference: if I don't need the file-like interface to large objects,
I'd use BYTEA every time.
Right, so that basically means that when the 'large objects' are files that
should be saved and restored as a whole, it may be more natural to
use large objects. I guess that applies to some uses of media
storage (music, photos, video).
No - I meant the Postgres large object interface allows you to read and
write sections of a large object. It provides a seek/read/write interface.
If you're only going to read or write the whole contents as a single
block, then use BYTEA. In my case I store uploaded images as BYTEA - I
only ever need to read/write the image as a whole.
If you were dealing with very large images/music/video in a web
environment, then I could see a web application wanting to read a chunk,
write it to the web client, read the next chunk, and so on, and thus
avoid the overhead of the entire contents being in memory at one time.
That probably doesn't help with upload, though.
Pete
--
http://www.whitebeam.org - JavaScript web application server.
http://www.yellowhawk.co.uk
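The chunked read-and-forward pattern Peter describes can be sketched like this (Python, purely illustrative; a BytesIO stands in for an open large-object handle, and the callback name is made up):

```python
import io

def stream_in_chunks(lo, send_to_client, chunk_size=8192):
    """Read a large object piece by piece and forward each piece,
    so the whole value never has to sit in memory at once."""
    while True:
        chunk = lo.read(chunk_size)
        if not chunk:
            break
        send_to_client(chunk)

# Usage sketch: a 20 KB in-memory buffer plays the role of the
# large-object descriptor; each chunk is collected by the callback.
received = []
stream_in_chunks(io.BytesIO(b'x' * 20000), received.append)
```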
--
------------------------------------------------------------------------
Peter Wilson
T: 01707 891840
M: 07796 656566
http://www.yellowhawk.co.uk
On Wednesday 13-02-2008 at 15:21 [timezone +0000], Peter Wilson wrote:
Right, so that basically means that when the 'large objects' are files that
should be saved and restored as a whole, it may be more natural to
use large objects. I guess that applies to some uses of media
storage (music, photos, video).
No - I meant the Postgres large object interface allows you to read and
write sections of a large object. It provides a seek/read/write interface.
If you're only going to read or write the whole contents as a single
block, then use BYTEA. In my case I store uploaded images as BYTEA - I
only ever need to read/write the image as a whole.
If you were dealing with very large images/music/video in a web
environment, then I could see a web application wanting to read a chunk,
write it to the web client, read the next chunk, and so on, and thus
avoid the overhead of the entire contents being in memory at one time.
That probably doesn't help with upload, though.
In my case, the web application only downloads the data. Upload is done
through a custom, local application. So I guess BYTEA would work fine
for me. I'll do some more reading to see if any of its other properties
wouldn't match my needs and if both BYTEA and large objects seem to
suite me, I'll go for BYTEA.
Thanks for the help!
Koen
On Wednesday 13-02-2008 at 10:45 [timezone -0300], Alvaro Herrera wrote:
Instead of base64 encoding, I guess it would be easier to just escape
the required bytes and store them in a bytea.
Actually, if you have access to the PQexecParams() call, you can pass
the bytes to a bytea column unescaped, which AFAIK saves some processing
on both the client and server.
To get the bytes back unescaped, I think you could use a binary cursor.
I'll check to see what the options are for reading in the data in PHP.
Thanks for the help!
Best,
Koen
On Feb 13, 2008, at 2:53 PM, Koen Vermeer wrote:
I'll check to see what the options are for reading in the data in PHP.
Thanks for the help!
If you use prepared statements, you don't need to do anything special
at all for bytea with PHP. No worries about escaping and all that.
Using the schema below and a simple prepared statement API
(http://pgedit.com/resource/php/pgfuncall), I can insert/load documents
with a single line like:
$db->blob_insert($content);
$content = $db->blob_content($this->object_ref);
John DeSoi, Ph.D.
--
-- blobs
--
create table blob (
    dbid serial primary key,
    content bytea
);

create or replace function blob_insert(p_content bytea)
returns integer as $$
declare
    new_dbid integer := nextval(pg_get_serial_sequence('blob', 'dbid'));
begin
    insert into blob (dbid, content) values (new_dbid, p_content);
    return new_dbid;
end;
$$ language plpgsql;

create or replace function blob_update(p_dbid integer, p_content bytea)
returns integer as $$
begin
    update blob set content = p_content where dbid = p_dbid;
    if found then
        return 1;
    else
        return 0;
    end if;
end;
$$ language plpgsql;

create or replace function blob_content(p_dbid integer)
returns bytea as $$
declare
    v_content bytea;
begin
    select into v_content content from blob where dbid = p_dbid;
    return v_content;
end;
$$ language plpgsql;