Database versus filesystem for storing images

Started by Clodoaldoover 19 years ago45 messagesgeneral

clodoaldo.pinto.neto@gmail.com

over 19 years ago

A web application will store hundreds of thousands images.

The images meta data will be stored in postgresql but I'm in doubt
about where to store the images themselves, if in the database as byte
strings or in the file system.

Would one of them have a faster image search than the other?

In the database the images would be indexed by a serial id, that of
the file submitter, and an image number. They would always be searched
by the same key.

In the file system there would be a two or three level directory
structure and the directory names would be hexadecimal numbers like
'/images_directory/f/0/d' taken from the 2/3 first letters of the file
name, which would be a hash.

Whenever I want to know the hash name of an image so I can find it in
the file system I would have to search it first in the database by the
index. So I suppose the database will always be faster.

But there is an advantage in using the file system. It is that the
Apache's HTTP file caching management is there for free. If I store
the images in the database I will have to handle the request and
response HTTP headers myself if I want the images to be cached by the
clients.

What is the best practice in this situation? I mean not only the two
options as above but any.

Regards,
--
Clodoaldo Pinto Neto

Guy Rouillier

guyr-ml1@burntmail.com

over 19 years ago

In reply to: Clodoaldo (#1)

Re: Database versus filesystem for storing images

Clodoaldo wrote:

A web application will store hundreds of thousands images.

<snip>

What is the best practice in this situation? I mean not only the two
options as above but any.

This discussion comes up regularly. See the archives for a thread
titled "Storing images in PostgreSQL databases (again)" for the latest.

--
Guy Rouillier

Scott Ribe

scott_ribe@killerbytes.com

over 19 years ago

In reply to: Clodoaldo (#1)

Re: Database versus filesystem for storing images

Personally, I'd put them on the file system, because then backup software
can perform incremental backups. In the database, that becomes more of a
difficulty. One suggestion, don't use a file name from a hash to store the
image, just use the serial id, and break them up by hundreds or thousands,
iow image 1123 might be in images/000/000001/000001123.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

imageguy

imageguy1206@gmail.com

over 19 years ago

In reply to: Scott Ribe (#3)

Re: Database versus filesystem for storing images

Scott Ribe wrote:

Personally, I'd put them on the file system, because then backup software
can perform incremental backups. In the database, that becomes more of a
difficulty. One suggestion, don't use a file name from a hash to store the
image, just use the serial id, and break them up by hundreds or thousands,
iow image 1123 might be in images/000/000001/000001123.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

I think I know the answer, but if you don't have an "application
server" - ie a webserver, etc, and many of the workstations/clients
that need access to the images but may not have access to a network
share, isn't the database the only choice ?

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

Geoff.

Clodoaldo

clodoaldo.pinto.neto@gmail.com

over 19 years ago

In reply to: imageguy (#4)

Re: Database versus filesystem for storing images

5 Jan 2007 06:59:18 -0800, imageguy <imageguy1206@gmail.com>:

I think I know the answer,

If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.

but if you don't have an "application
server" - ie a webserver, etc,

Yes I have an application server, the Apache server.

and many of the workstations/clients
that need access to the images but may not have access to a network
share,

network share? I don't understand. The images will be loaded by html
pages with the img tag like in <img
src="http://domain.com/images/xxx.jpg">

isn't the database the only choice ?

No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,
--
Clodoaldo Pinto Neto

Jeremy Haile

jhaile@fastmail.fm

over 19 years ago

In reply to: Clodoaldo (#5)

Re: Database versus filesystem for storing images

It's almost always better to store the images on the file system and
just store the filename or relative path in the database.

This is more efficient, doesn't bloat the database by storing files in
it, and is easier to get proper browser caching behavior (depending on
how your app is setup). I try to avoid BLOBs whenever possible.

Cheers,
Jeremy Haile

On Fri, 5 Jan 2007 17:18:10 -0200, "Clodoaldo"
<clodoaldo.pinto.neto@gmail.com> said:

Show quoted text

5 Jan 2007 06:59:18 -0800, imageguy <imageguy1206@gmail.com>:

I think I know the answer,

If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.

but if you don't have an "application
server" - ie a webserver, etc,

Yes I have an application server, the Apache server.

and many of the workstations/clients
that need access to the images but may not have access to a network
share,

network share? I don't understand. The images will be loaded by html
pages with the img tag like in <img
src="http://domain.com/images/xxx.jpg">

isn't the database the only choice ?

No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,
--
Clodoaldo Pinto Neto

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

John McCawley

nospam@hardgeus.com

over 19 years ago

In reply to: Clodoaldo (#5)

Re: Database versus filesystem for storing images

Don't store your images in the database. Store them on the filesystem
and store their path in the database. Anyone that tells you otherwise
is a stark raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs.
Meanwhile our images/documents storage is well over a hundred gigs. I'd
hate to think that I'd have to dump and restore 100 gigs every time I
wanted to dump the newest data to the development database.

As far as how they actually get to the client machine, typically these
days people use web servers for this sort of thing.

Clodoaldo wrote:

Show quoted text

5 Jan 2007 06:59:18 -0800, imageguy <imageguy1206@gmail.com>:

I think I know the answer,

If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.

but if you don't have an "application
server" - ie a webserver, etc,

Yes I have an application server, the Apache server.

and many of the workstations/clients
that need access to the images but may not have access to a network
share,

network share? I don't understand. The images will be loaded by html
pages with the img tag like in <img
src="http://domain.com/images/xxx.jpg">

isn't the database the only choice ?

No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,

James Neff

jneff@tethyshealth.com

over 19 years ago

In reply to: John McCawley (#7)

Re: Database versus filesystem for storing images

"... and Moses said unto them, 'The eleventh commandment : thou shalt
store images in a database!'..."

What if you had another database where you stored just the images and
not back it up if you don't want to?

As an application developer, I like the idea of storing files and images
in the database because it makes it much easier to control access and
security from an application standpoint.

I think Microsoft SQL Server stores blobs in a separate file, and only
retains pointers in the actually database field for that blob. So when
you SELECT on that blob MS SQL reads the external file for you as if it
lived in the database. I don't know if Postgres does the same thing,
but if it did, you wouldn't have to worry about "bloating" database files.

Sounds like this is for an Apache web application. Think about how web
sites like Flickr and Webshots store their images in a database. You
could write a cool Apache mod so that the url:
"http://mycompany.com/images/01234.jpg" would go through this module,
pull the appropriate image from the database and send it back; all the
while the client is none-the-wiser. Just a thought.

I think its one of those things where there's not right or wrong
answer. Instead you just have to do the minimum of what your
application requires. If you don't need application-level control over
the files, then by all means store them on the file system. But if you
need to control security than you have to prevent physical access to the
file (which means no file system storage) and pull the image from the
database through the application.

My two cents,
James

John McCawley wrote:

Show quoted text

Don't store your images in the database. Store them on the filesystem
and store their path in the database. Anyone that tells you otherwise
is a stark raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs.
Meanwhile our images/documents storage is well over a hundred gigs.
I'd hate to think that I'd have to dump and restore 100 gigs every
time I wanted to dump the newest data to the development database.

As far as how they actually get to the client machine, typically these
days people use web servers for this sort of thing.
Clodoaldo wrote:

5 Jan 2007 06:59:18 -0800, imageguy <imageguy1206@gmail.com>:

I think I know the answer,

If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.

but if you don't have an "application
server" - ie a webserver, etc,

Yes I have an application server, the Apache server.

and many of the workstations/clients
that need access to the images but may not have access to a network
share,

network share? I don't understand. The images will be loaded by html
pages with the img tag like in <img
src="http://domain.com/images/xxx.jpg">

isn't the database the only choice ?

No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Jorge Godoy

jgodoy@gmail.com

over 19 years ago

In reply to: John McCawley (#7)

Re: Database versus filesystem for storing images

John McCawley <nospam@hardgeus.com> writes:

Don't store your images in the database. Store them on the filesystem and
store their path in the database. Anyone that tells you otherwise is a stark
raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs. Meanwhile
our images/documents storage is well over a hundred gigs. I'd hate to think
that I'd have to dump and restore 100 gigs every time I wanted to dump the
newest data to the development database.

How do you plan your backup routine and how do you guarantee that on a failure
all needed data is restored? I mean, how do you handle integrity with data
outside the database?

--
Jorge Godoy <jgodoy@gmail.com>

#10

Andrew Chernow

pg-job@esilo.com

over 19 years ago

In reply to: Jorge Godoy (#9)

Re: Database versus filesystem for storing images

Don't store your images in the database. Store them on the filesystem and
store their path in the database

I 100% agree. Use the database as a lookup into the filesystem. Don't load the
database up with terabytes of non-searchable binary data? not sure how that
would help you?

Here is one idea:

Have a 64-bit sequence that you use to generate an image_id (becomes file name).
Hash that id out over a 3-level deep directory structure that allows 4096
entries per directory. Should give you 64-bit worth of files.

CREATE OR REPLACE FUNCTION get_image_path(image_id BIGINT)
RETURNS TEXT AS $$
DECLARE
-- 40963, avoids "integer out of range"
dir1_val BIGINT := 68719476736;
dir1 BIGINT;
dir2 BIGINT;
dir3 BIGINT;
path TEXT;

BEGIN
dir1 := (image_id / dir1_val) % 4096;
dir2 := (image_id / (4096 * 4096)) % 4096;
dir3 := (image_id / 4096) % 4096;
RETURN '/BASE_PATH/' || dir1 || '/' || dir2 ||
'/' || dir3 || '/' || image_id;
END;
$$ LANGUAGE PLPGSQL;

test=# select get_image_path(200399322222);
get_image_path
-------------------------------------
/BASE_PATH/2/3752/2991/200399322222
(1 row)

I mean, how do you handle integrity with data
outside the database?

You don't, the file system handles integrity of the stored data. Although, one
must careful to avoid db and fs orphans. Meaning, a record with no
corresponding file or a file with no corresponging record. Always
write()/insert an image file to the system within a transaction, including
writing the image out to the fs. Make sure to unlink any paritally written
image files.

How do you plan your backup routine

In regards to backup, backup the files one-by-one. Grab the lastest image file
refs from the database and start backing up those images. Each successfully
backed up image should be followed by inserting that file's database record into
a remote db server. If anything fails, cleanup the partial image file (to avoid
orphaned data) and rollout the transaction.

just one idea. i'm sure there are other ways of doing it. point is, this is
completely possible to do reliably.

andrew

Jorge Godoy wrote:

Show quoted text

John McCawley <nospam@hardgeus.com> writes:

Don't store your images in the database. Store them on the filesystem and
store their path in the database. Anyone that tells you otherwise is a stark
raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs. Meanwhile
our images/documents storage is well over a hundred gigs. I'd hate to think
that I'd have to dump and restore 100 gigs every time I wanted to dump the
newest data to the development database.

How do you plan your backup routine and how do you guarantee that on a failure
all needed data is restored? I mean, how do you handle integrity with data
outside the database?

#11

John McCawley

nospam@hardgeus.com

over 19 years ago

In reply to: Jorge Godoy (#9)

Re: Database versus filesystem for storing images

This is a web app, so in my example all of the images live on a web
server, and our data lives on a separate database server. We have a
completely duplicated setup offsite, and mirror images of every server
at the backup site. Every night we use rsync to duplicate everything
offsite. Also, a cron job pg_dumps every night and copies the dump over
to the backup DB server.

And before anybody gives me any guff, our office is in New Orleans, and
we went through Katrina with less than an hour of downtime, and without
losing anything. So there ;)

Jorge Godoy wrote:

Show quoted text

John McCawley <nospam@hardgeus.com> writes:

Don't store your images in the database. Store them on the filesystem and
store their path in the database. Anyone that tells you otherwise is a stark
raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs. Meanwhile
our images/documents storage is well over a hundred gigs. I'd hate to think
that I'd have to dump and restore 100 gigs every time I wanted to dump the
newest data to the development database.

How do you plan your backup routine and how do you guarantee that on a failure
all needed data is restored? I mean, how do you handle integrity with data
outside the database?

#12

Jeanna Geier

jgeier@apt-cafm.com

over 19 years ago

In reply to: James Neff (#8)

Re: Database versus filesystem for storing images

We use WebDAV and Apache's Slide to store our images and, as someone pointed out earlier, store the links to the images in our database.

WebDAV has provided us with excellent access control and security...
http://www.webdav.org/
http://jakarta.apache.org/slide/index.html

Just my 1/2 cents,
-Jeanna

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of James Neff
Sent: Friday, January 05, 2007 2:27 PM
To: John McCawley
Cc: Clodoaldo; imageguy; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database versus filesystem for storing images

"... and Moses said unto them, 'The eleventh commandment : thou shalt
store images in a database!'..."

What if you had another database where you stored just the images and
not back it up if you don't want to?

As an application developer, I like the idea of storing files and images
in the database because it makes it much easier to control access and
security from an application standpoint.

My two cents,
James

John McCawley wrote:

Don't store your images in the database. Store them on the filesystem
and store their path in the database. Anyone that tells you otherwise
is a stark raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs.
Meanwhile our images/documents storage is well over a hundred gigs.
I'd hate to think that I'd have to dump and restore 100 gigs every
time I wanted to dump the newest data to the development database.

As far as how they actually get to the client machine, typically these
days people use web servers for this sort of thing.
Clodoaldo wrote:

5 Jan 2007 06:59:18 -0800, imageguy <imageguy1206@gmail.com>:

I think I know the answer,

If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.

but if you don't have an "application
server" - ie a webserver, etc,

Yes I have an application server, the Apache server.

and many of the workstations/clients
that need access to the images but may not have access to a network
share,

network share? I don't understand. The images will be loaded by html
pages with the img tag like in <img
src="http://domain.com/images/xxx.jpg">

isn't the database the only choice ?

No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#13

John McCawley

nospam@hardgeus.com

over 19 years ago

In reply to: James Neff (#8)

Re: Database versus filesystem for storing images

My comment about stark raving madmen was somewhat tongue-in-cheek.
There is certainly a case to be made for images in a database under some
circumstances. However, for the average Joe web developer, and
certainly someone new to the whole problem, I think storing them on the
filesystem is going to be by far the most painless and efficient approach.

James Neff wrote:

Show quoted text

"... and Moses said unto them, 'The eleventh commandment : thou shalt
store images in a database!'..."

What if you had another database where you stored just the images and
not back it up if you don't want to?

As an application developer, I like the idea of storing files and
images in the database because it makes it much easier to control
access and security from an application standpoint.

I think Microsoft SQL Server stores blobs in a separate file, and only
retains pointers in the actually database field for that blob. So
when you SELECT on that blob MS SQL reads the external file for you as
if it lived in the database. I don't know if Postgres does the same
thing, but if it did, you wouldn't have to worry about "bloating"
database files.

Sounds like this is for an Apache web application. Think about how
web sites like Flickr and Webshots store their images in a database.
You could write a cool Apache mod so that the url:
"http://mycompany.com/images/01234.jpg" would go through this module,
pull the appropriate image from the database and send it back; all the
while the client is none-the-wiser. Just a thought.

I think its one of those things where there's not right or wrong
answer. Instead you just have to do the minimum of what your
application requires. If you don't need application-level control
over the files, then by all means store them on the file system. But
if you need to control security than you have to prevent physical
access to the file (which means no file system storage) and pull the
image from the database through the application.

My two cents,
James

#14

Jeremy Haile

jhaile@fastmail.fm

over 19 years ago

In reply to: James Neff (#8)

Re: Database versus filesystem for storing images

How does it make it easier to control access and security? If your web
app makes a decision about allowing access to the database, it can just
as easily make a decision about allowing access to the filesystem.
Storing the images on the file system doesn't mean that there isn't a
piece of code that determines whether or not users can access a
particular file.

I see security and access as a non-issue in making this decision.
Either way, it's got to be controlled at the application level (if it is
even necessary).

On Fri, 05 Jan 2007 15:26:45 -0500, "James Neff"
<jneff@tethyshealth.com> said:

Show quoted text

"... and Moses said unto them, 'The eleventh commandment : thou shalt
store images in a database!'..."

What if you had another database where you stored just the images and
not back it up if you don't want to?

As an application developer, I like the idea of storing files and images
in the database because it makes it much easier to control access and
security from an application standpoint.

I think Microsoft SQL Server stores blobs in a separate file, and only
retains pointers in the actually database field for that blob. So when
you SELECT on that blob MS SQL reads the external file for you as if it
lived in the database. I don't know if Postgres does the same thing,
but if it did, you wouldn't have to worry about "bloating" database
files.

Sounds like this is for an Apache web application. Think about how web
sites like Flickr and Webshots store their images in a database. You
could write a cool Apache mod so that the url:
"http://mycompany.com/images/01234.jpg" would go through this module,
pull the appropriate image from the database and send it back; all the
while the client is none-the-wiser. Just a thought.

I think its one of those things where there's not right or wrong
answer. Instead you just have to do the minimum of what your
application requires. If you don't need application-level control over
the files, then by all means store them on the file system. But if you
need to control security than you have to prevent physical access to the
file (which means no file system storage) and pull the image from the
database through the application.

My two cents,
James

John McCawley wrote:

Don't store your images in the database. Store them on the filesystem
and store their path in the database. Anyone that tells you otherwise
is a stark raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs.
Meanwhile our images/documents storage is well over a hundred gigs.
I'd hate to think that I'd have to dump and restore 100 gigs every
time I wanted to dump the newest data to the development database.

As far as how they actually get to the client machine, typically these
days people use web servers for this sort of thing.
Clodoaldo wrote:

5 Jan 2007 06:59:18 -0800, imageguy <imageguy1206@gmail.com>:

I think I know the answer,

If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.

but if you don't have an "application
server" - ie a webserver, etc,

Yes I have an application server, the Apache server.

and many of the workstations/clients
that need access to the images but may not have access to a network
share,

network share? I don't understand. The images will be loaded by html
pages with the img tag like in <img
src="http://domain.com/images/xxx.jpg">

isn't the database the only choice ?

No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.

- or is there a postgresql function/utility that will "server" the
file from the file system based on the reference/link embeded in the
database ??

I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#15

Jorge Godoy

jgodoy@gmail.com

over 19 years ago

In reply to: John McCawley (#11)

Re: Database versus filesystem for storing images

John McCawley <nospam@hardgeus.com> writes:

This is a web app, so in my example all of the images live on a web server,
and our data lives on a separate database server. We have a completely
duplicated setup offsite, and mirror images of every server at the backup
site. Every night we use rsync to duplicate everything offsite. Also, a cron
job pg_dumps every night and copies the dump over to the backup DB server.

And before anybody gives me any guff, our office is in New Orleans, and we
went through Katrina with less than an hour of downtime, and without losing
anything. So there ;)

Anyway, you have no guarantee that all your images exist on file and that all
existing files have a corresponding entry in your database.

--
Jorge Godoy <jgodoy@gmail.com>

#16

Jorge Godoy

jgodoy@gmail.com

over 19 years ago

In reply to: Andrew Chernow (#10)

Re: Database versus filesystem for storing images

Andrew Chernow <pg-job@esilo.com> writes:

I mean, how do you handle integrity with data
outside the database?

You don't, the file system handles integrity of the stored data. Although,
one must careful to avoid db and fs orphans. Meaning, a record with no
corresponding file or a file with no corresponging record. Always
write()/insert an image file to the system within a transaction, including
writing the image out to the fs. Make sure to unlink any paritally written
image files.

And how do you guarantee that after a failure? You're restoring two
different sets of data here:

- backup from your database
- backup from your files

How do you link them together on that specific operation? Or even on a daily
basis, if you get corrupted data...

How do you plan your backup routine

In regards to backup, backup the files one-by-one. Grab the lastest image
file refs from the database and start backing up those images. Each
successfully backed up image should be followed by inserting that file's
database record into a remote db server. If anything fails, cleanup the
partial image file (to avoid orphaned data) and rollout the transaction.

just one idea. i'm sure there are other ways of doing it. point is, this is
completely possible to do reliably.

Wouldn't replication with, e.g., Slony be easier? And wouldn't letting the
database handle all the integrity be easier? I mean, create an "images" table
and then make your record depends on this table, so if there's no record with
the image, you won't have any references to it left.

It would also make the backup plan easier: backup the database.

Not counting that depending on your choice of filesystem and image size you
might get a very poor performance.

--
Jorge Godoy <jgodoy@gmail.com>

#17

Andrew Chernow

pg-job@esilo.com

over 19 years ago

In reply to: Jorge Godoy (#16)

Re: Database versus filesystem for storing images

And how do you guarantee that after a failure? You're restoring two
different sets of data here:

How do you link them together on that specific operation? Or even on a daily
basis, if you get corrupted data...

I answered that already.

Not counting that depending on your choice of filesystem and image size you
might get a very poor performance.

apache has very good page and image caching. You could take advantage of that
using this technique.

Another nice feature is the database and images can be handled spearately. Some
people have seen this as a disadvantage on this thread, I personally don't see
it that why.

I guess it depends on access needs, many files and how much data you have. What
if you had 3 billion files across a few hundred terabytes? Can you say with
experience how the database would hold up in this situation?

andrew

Jorge Godoy wrote:

Show quoted text

Andrew Chernow <pg-job@esilo.com> writes:

I mean, how do you handle integrity with data
outside the database?

You don't, the file system handles integrity of the stored data. Although,
one must careful to avoid db and fs orphans. Meaning, a record with no
corresponding file or a file with no corresponging record. Always
write()/insert an image file to the system within a transaction, including
writing the image out to the fs. Make sure to unlink any paritally written
image files.

And how do you guarantee that after a failure? You're restoring two
different sets of data here:

- backup from your database
- backup from your files

How do you link them together on that specific operation? Or even on a daily
basis, if you get corrupted data...

How do you plan your backup routine

In regards to backup, backup the files one-by-one. Grab the lastest image
file refs from the database and start backing up those images. Each
successfully backed up image should be followed by inserting that file's
database record into a remote db server. If anything fails, cleanup the
partial image file (to avoid orphaned data) and rollout the transaction.

just one idea. i'm sure there are other ways of doing it. point is, this is
completely possible to do reliably.

Wouldn't replication with, e.g., Slony be easier? And wouldn't letting the
database handle all the integrity be easier? I mean, create an "images" table
and then make your record depends on this table, so if there's no record with
the image, you won't have any references to it left.

It would also make the backup plan easier: backup the database.

Not counting that depending on your choice of filesystem and image size you
might get a very poor performance.

#18

Bruno Wolff III

bruno@wolff.to

over 19 years ago

In reply to: James Neff (#8)

Re: Database versus filesystem for storing images

On Fri, Jan 05, 2007 at 15:26:45 -0500,
James Neff <jneff@tethyshealth.com> wrote:

"... and Moses said unto them, 'The eleventh commandment : thou shalt
store images in a database!'..."

What if you had another database where you stored just the images and
not back it up if you don't want to?

I think the main reason to keep images in the database is if you need
transactional semantics. If you are updating images and transactions that
started before the update, need to see the old version you are going to
want them in the database. I suspect this need isn't very common though.

#19

Jeremy Haile

jhaile@fastmail.fm

over 19 years ago

In reply to: Bruno Wolff III (#18)

Re: Database versus filesystem for storing images

Yeah - it can make it easier to implement transactional semantics by
storing them in the database, although for simple operations it wouldn't
be hard to replicate this manually. And you are going to incur a
performance penalty by storing them in the database.

Another thing to consider is that storing them in the file system makes
it much easier to browse the images using third-party tools, update
them, archive them (by gzipping or whatever). This is much more
difficult if they are stored in the database.

On Fri, 5 Jan 2007 15:51:59 -0600, "Bruno Wolff III" <bruno@wolff.to>
said:

Show quoted text

On Fri, Jan 05, 2007 at 15:26:45 -0500,
James Neff <jneff@tethyshealth.com> wrote:

"... and Moses said unto them, 'The eleventh commandment : thou shalt
store images in a database!'..."

What if you had another database where you stored just the images and
not back it up if you don't want to?

I think the main reason to keep images in the database is if you need
transactional semantics. If you are updating images and transactions that
started before the update, need to see the old version you are going to
want them in the database. I suspect this need isn't very common though.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#20

Scott Marlowe

smarlowe@g2switchworks.com

over 19 years ago

In reply to: Jeremy Haile (#19)

Re: Database versus filesystem for storing images

On Fri, 2007-01-05 at 15:54, Jeremy Haile wrote:

Yeah - it can make it easier to implement transactional semantics by
storing them in the database, although for simple operations it wouldn't
be hard to replicate this manually. And you are going to incur a
performance penalty by storing them in the database.

Another thing to consider is that storing them in the file system makes
it much easier to browse the images using third-party tools, update
them, archive them (by gzipping or whatever). This is much more
difficult if they are stored in the database.

The biggest performance penalty from them being in the same database as
your data is that they're going to be the majority of your kernel /
database buffers. So, as mentioned earlier, it's almost a necessity
(for performance reasons) to put them into their own db server.

#21

Jorge Godoy

jgodoy@gmail.com

over 19 years ago

In reply to: Andrew Chernow (#17)

#22

Jorge Godoy

jgodoy@gmail.com

over 19 years ago

In reply to: Jeremy Haile (#19)

#23

John McCawley