Best practice for file storage?

Started by Joe Kramerabout 16 years ago5 messagesgeneral
Jump to latest
#1Joe Kramer
cckramer@gmail.com

Hi,

I need to store a lot of large files (thousands of 10-100 MB files)
uploaded through my web application and I find that storing them in
database as bytea field is not practical for backup purposes.
My database has full backup performed every 12 hours and backup is
encrypted and copied to server on another continent. Having all the
heavy binary data in database will make backups impossible.

So I am thinking of having a table just for metadata and file id
sequence and storing the file on file system:

CREATE TABLE business_logo
(
file_id bigserial NOT NULL,
file_name varchar,
file_date timestamp,
);

Storing file in path composed from serial id, e.g. file with id 2345
will be stored in
/webapp/files/2/3/4/5/2345

So I can backup files separately and database backup is still quick
and painless.

This is very simplistic and straightforward method.
I suppose there are better ways of doing it, using some virtual file system?
Anyone had a similar issue with avoiding of storing large files in
database, how did you solve it?

Thanks.

#2Mark Morgan Lloyd
markMLl.pgsql-general@telemetry.co.uk
In reply to: Joe Kramer (#1)
Re: Best practice for file storage?

Joe Kramer wrote:

Hi,

I need to store a lot of large files (thousands of 10-100 MB files)
uploaded through my web application and I find that storing them in
database as bytea field is not practical for backup purposes.
My database has full backup performed every 12 hours and backup is
encrypted and copied to server on another continent. Having all the
heavy binary data in database will make backups impossible.

So I am thinking of having a table just for metadata and file id
sequence and storing the file on file system:

Wasn't one of the particle accelerator establishments using PostgreSQL
in this way for their raw data?

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Joe Kramer (#1)
Re: Best practice for file storage?

On 31/01/2010 6:46 PM, Joe Kramer wrote:

Hi,

I need to store a lot of large files (thousands of 10-100 MB files)
uploaded through my web application and I find that storing them in
database as bytea field is not practical for backup purposes.
My database has full backup performed every 12 hours and backup is
encrypted and copied to server on another continent. Having all the
heavy binary data in database will make backups impossible.

So I am thinking of having a table just for metadata and file id
sequence and storing the file on file system:

CREATE TABLE business_logo
(
file_id bigserial NOT NULL,
file_name varchar,
file_date timestamp,
);

Storing file in path composed from serial id, e.g. file with id 2345
will be stored in
/webapp/files/2/3/4/5/2345

Consider going a bit easier on your path component splitting. A given
directory only containing 10 subdirs gets expensive in dir traversal
(and is a nightmare to work with for admins); it'll probably be much
better to use 23/45/2345 or even bigger chunks.

This depends a lot on the file system, so testing it is probably best.
ext3 with dir_index shouldn't need much in the way of such manipulation
at all - I regularly store tens of thousands of files in a single
directory without issues.

--
Craig Ringer

#4Andy Colson
andy@squeakycode.net
In reply to: Joe Kramer (#1)
Re: Best practice for file storage?

On 01/31/2010 04:46 AM, Joe Kramer wrote:

Hi,

I need to store a lot of large files (thousands of 10-100 MB files)
uploaded through my web application and I find that storing them in
database as bytea field is not practical for backup purposes.
My database has full backup performed every 12 hours and backup is
encrypted and copied to server on another continent. Having all the
heavy binary data in database will make backups impossible.

So I am thinking of having a table just for metadata and file id
sequence and storing the file on file system:

CREATE TABLE business_logo
(
file_id bigserial NOT NULL,
file_name varchar,
file_date timestamp,
);

Storing file in path composed from serial id, e.g. file with id 2345
will be stored in
/webapp/files/2/3/4/5/2345

So I can backup files separately and database backup is still quick
and painless.

This is very simplistic and straightforward method.
I suppose there are better ways of doing it, using some virtual file system?
Anyone had a similar issue with avoiding of storing large files in
database, how did you solve it?

Thanks.

A bonus you get from using a standard file system is rsync'able backups. I'd bet most of those thousands of files dont change that often? Your backup times with rsync will be outstanding. If you use dark magic, you may be limiting your backup options.

And +1 with Craig, I've also stored thousands of files in the same dir (using XFS) and it was not slower than splitting them into smaller subdir's. (it wasnt faster, but it wasnt slower either)

-Andy

#5Steve Atkins
steve@blighty.com
In reply to: Joe Kramer (#1)
Re: Best practice for file storage?

On Jan 31, 2010, at 2:46 AM, Joe Kramer wrote:

Hi,

I need to store a lot of large files (thousands of 10-100 MB files)
uploaded through my web application and I find that storing them in
database as bytea field is not practical for backup purposes.
My database has full backup performed every 12 hours and backup is
encrypted and copied to server on another continent. Having all the
heavy binary data in database will make backups impossible.

So I am thinking of having a table just for metadata and file id
sequence and storing the file on file system:

CREATE TABLE business_logo
(
file_id bigserial NOT NULL,
file_name varchar,
file_date timestamp,
);

Storing file in path composed from serial id, e.g. file with id 2345
will be stored in
/webapp/files/2/3/4/5/2345

So I can backup files separately and database backup is still quick
and painless.

This is very simplistic and straightforward method.
I suppose there are better ways of doing it, using some virtual file system?
Anyone had a similar issue with avoiding of storing large files in
database, how did you solve it?

Works fine, though you probably want more than 10 entries in each directory,
depending on which filesystem you use.

If you're going to delete the files ever, you can do that transactionally by
having a trigger on the table that queues the filenames in a table for
deletion by an external process.

If you can store some information about the file contents - size at least,
maybe a cheap hash (md5) - then you'll find it easier to sanity check
filesystem vs database when something goes wrong. It also lets you
find duplicate files more easily.

Cheers,
Steve