Storing large files in multiple schemas: BLOB or BYTEA

Started by Noname over 13 years ago (9 messages, general)
#1 Noname
tigran2-postgres@riatest.com

Hi,

I need to store large files (from several MB to 1GB) in Postgres database.
The database has multiple schemas. It looks like Postgres has 2 options to
store large objects: LOB and BYTEA. However we seem to hit problems with
each of these options.

1. LOB. This works almost ideally: it can store up to 2GB and allows
streaming, so we do not hit memory limits in our PHP backend when reading
the LOB. However, all blobs are stored in pg_catalog and are not part of
any schema. This leads to a big problem when you try to use pg_dump with
options -n and -b to dump just one schema with its blobs. It dumps the
schema data correctly, but then it includes ALL blobs in the database, not
just the blobs that belong to the particular schema.

Is there a way to dump the single schema with its blobs using pg_dump or
some other utility?

2. BYTEA. These are correctly stored per schema, so pg_dump -n works
correctly; however, I cannot seem to find a way to stream the data. This
means there is no way to access the data from PHP if it is larger than the
memory limit.

Is there any other way to store large data in Postgres that allows streaming
and correctly works with multiple schemas per database?

Thanks.

(Sorry if this double-posts on pgsql-php, I did not know which is the best
list for this question).

#2 Shaun Thomas
sthomas@optionshouse.com
In reply to: Noname (#1)
Re: Storing large files in multiple schemas: BLOB or BYTEA

On 10/10/2012 05:16 AM, tigran2-postgres@riatest.com wrote:

I need to store large files (from several MB to 1GB) in Postgres
database. The database has multiple schemas. It looks like Postgres
has 2 options to store large objects: LOB and BYTEA. However we seem
to hit problems with each of these options.

I believe the general consensus around here is to not do that, if you
can avoid it. File systems are much better equipped to handle files of
that magnitude, especially when it comes to retrieving them, scanning
through their contents, or really, any access pattern aside from simple
storage.

You're better off storing the blob on disk somewhere and storing a row
that refers to its location: either the key pieces for a naming scheme, or
the full path.

This is especially true if you mean to later access that data with PHP.
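The "key pieces for a naming scheme" idea might be sketched like this (a hypothetical illustration; the tenant/file-id keys and the hash fan-out layout are made up, not a prescribed convention):

```python
import hashlib
import os

def blob_path(root, tenant, file_id):
    """Derive a deterministic on-disk path for a blob from its
    database keys. Two levels of hash-prefix directories keep any
    single directory from accumulating too many entries."""
    digest = hashlib.sha256(f"{tenant}/{file_id}".encode()).hexdigest()
    return os.path.join(root, tenant, digest[:2], digest[2:4], digest)
```

The database row then stores only the keys (tenant, file_id); the path is recomputed on every access, so nothing breaks if the storage root moves.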

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

#3 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Noname (#1)
Re: Storing large files in multiple schemas: BLOB or BYTEA

tigran2-postgres@riatest.com wrote:

Is there any other way to store large data in Postgres that allows
streaming and correctly works with multiple schemas per database?

Large Objects and bytea are the only ways.

If you want to pg_dump only certain large objects, that won't work
as far as I know (maybe using permissions and a non-superuser can help).

Do you absolutely need to pg_dump parts of the database regularly?

Yours,
Laurenz Albe

#4 Tigran Najaryan
tigran.najaryan@riatest.com
In reply to: Laurenz Albe (#3)
Re: Storing large files in multiple schemas: BLOB or BYTEA

I believe the general consensus around here is to not do that, if you can
avoid it. File systems are much better equipped to handle files of that
magnitude, especially when it comes to retrieving them, scanning through
their contents, or really, any access pattern aside from simple storage.

You're better off storing the blob on disk somewhere and storing a row
that refers to its location. Either key pieces for a naming scheme or the
full path.

This is especially true if you mean to later access that data with PHP.

--

Shaun Thomas

Using files stored outside the database creates all sorts of problems. For
starters, you lose ACID guarantees. I would prefer to keep them in the
database. We did a lot of experiments with Large Objects and they really
worked fine (we stored hundreds of LOBs ranging from a few MB up to 1GB).
Postgres does a really good job with Large Objects. If it were not for the
pg_dump problem, I would not hesitate to use LOBs.

#5 Noname
tigran2-postgres@riatest.com
In reply to: Tigran Najaryan (#4)
Re: Storing large files in multiple schemas: BLOB or BYTEA

Large Objects and bytea are the only ways.

If you want to pg_dump only certain large objects, that won't work as far

as I know (maybe using permissions and a non-superuser can help).

You absolutely need to pg_dump parts of the database regularly?

Yours,

Laurenz Albe

It is not an absolute requirement, but it would be really nice to have. We
have a multi-tenant database with each tenant's data stored in a separate
schema. Using pg_dump seems to be the ideal way to migrate tenant data from
one database to another when we need to balance the load.

#6 Noname
tigran2-postgres@riatest.com
In reply to: Noname (#5)
Re: Storing large files in multiple schemas: BLOB or BYTEA

I believe the general consensus around here is to not do that, if you can
avoid it. File systems are much better equipped to handle files of that
magnitude, especially when it comes to retrieving them, scanning through
their contents, or really, any access pattern aside from simple storage.

You're better off storing the blob on disk somewhere and storing a row
that refers to its location. Either key pieces for a naming scheme or the
full path.

This is especially true if you mean to later access that data with PHP.

--

Shaun Thomas

Using files stored outside the database creates all sorts of problems. For
starters, you lose ACID guarantees. I would prefer to keep them in the
database. We did a lot of experiments with Large Objects and they really
worked fine (we stored hundreds of LOBs ranging from a few MB up to 1GB).
Postgres does a really good job with Large Objects. If it were not for the
pg_dump problem, I would not hesitate to use LOBs.

#7 Craig Ringer
craig@2ndquadrant.com
In reply to: Noname (#6)
Re: Storing large files in multiple schemas: BLOB or BYTEA

On 10/11/2012 01:35 PM, tigran2-postgres@riatest.com wrote:

Using files stored outside the database creates all sorts of problems.
For starters you lose ACID guaranties. I would prefer to keep them in
database. We did a lot of experiments with Large Objects and they really
worked fine (stored hundreds of LOBs ranging from a few MB up to 1GB).
Postgres does a really good job with Large Objects. If it was not the
pg_dump problem I would not hesitate to use LOBs.

Yeah, a pg_dump mode that dumped everything but large objects would be
nice.

Right now I find storing large objects in the DB such a pain from a
backup management point of view that I avoid it where possible.

I'm now wondering about the idea of implementing a pg_dump option that
dumped large objects into a directory tree like
lobs/[loid]/[lob_md5]
and wrote out a restore script that loaded them using `lo_import`.

During dumping, temporary copies could be written to something like
lobs/[loid]/.tmp.nnnn, with the md5 being calculated on the fly as the
byte stream is read. If the dumped file had the same md5 as the existing
one, it'd just delete the tempfile; otherwise the tempfile would be
renamed to the calculated md5.

That way incremental backup systems could manage the dumped LOB tree
without quite the same horrible degree of duplication as is currently
faced when using lo in the database with pg_dump.

A last_modified timestamp on `pg_largeobject_metadata` would be even
better, allowing the cost of reading and discarding rarely-changed large
objects to be avoided.
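The tempfile-plus-md5 dedup step described above could be sketched as follows (a rough illustration only; it assumes the large object's bytes arrive as an iterable of chunks, and the function and directory names are invented for the example):

```python
import hashlib
import os
import tempfile

def dump_lob(chunks, lob_dir):
    """Write a LOB byte stream into lob_dir, named by its md5.
    If an identical dump already exists, the tempfile is discarded,
    so incremental backup tools see an unchanged file."""
    os.makedirs(lob_dir, exist_ok=True)
    digest = hashlib.md5()
    # Write to a hidden tempfile while hashing the stream on the fly.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp.", dir=lob_dir)
    with os.fdopen(fd, "wb") as tmp:
        for chunk in chunks:
            digest.update(chunk)
            tmp.write(chunk)
    final_path = os.path.join(lob_dir, digest.hexdigest())
    if os.path.exists(final_path):
        os.remove(tmp_path)  # identical dump already present
    else:
        os.rename(tmp_path, final_path)
    return final_path
```

Re-dumping an unchanged object leaves the existing file's mtime untouched, which is what lets rsync-style incremental backups skip it.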

--
Craig Ringer

#8 Chris Travers
chris.travers@gmail.com
In reply to: Craig Ringer (#7)
Re: Storing large files in multiple schemas: BLOB or BYTEA

On Wed, Oct 10, 2012 at 10:56 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:

On 10/11/2012 01:35 PM, tigran2-postgres@riatest.com wrote:

Using files stored outside the database creates all sorts of problems.
For starters you lose ACID guaranties. I would prefer to keep them in
database. We did a lot of experiments with Large Objects and they really
worked fine (stored hundreds of LOBs ranging from a few MB up to 1GB).
Postgres does a really good job with Large Objects. If it was not the
pg_dump problem I would not hesitate to use LOBs.

Yeah, a pg_dump mode that dumped everything but large objects would be
nice.

I remember when that was the only way pg_dump worked, and it caused plenty
of problems.

But yeah, --only-lobs and --no-lobs might be nice switches.

Right now I find storing large objects in the DB such a pain from a backup
management point of view that I avoid it where possible.

I'm now wondering about the idea of implementing a pg_dump option that
dumped large objects into a directory tree like
lobs/[loid]/[lob_md5]
and wrote out a restore script that loaded them using `lo_import`.

Thinking of the problems that occurred when we used to require lobs to be
backed up to binary archive formats....

During dumping temporary copies could be written to something like
lobs/[loid]/.tmp.nnnn with the md5 being calculated on the fly as the byte
stream is read. If the dumped file had the same md5 as the existing one
it'd just delete the tempfile; otherwise the tempfile would be renamed to
the calculated md5.

That way incremental backup systems could manage the dumped LOB tree
without quite the same horrible degree of duplication as is currently faced
when using lo in the database with pg_dump.

How do incremental backup systems work with lots of data anyway with
pg_dump? I would think that's not the approach I would take to incremental
backups with PostgreSQL....

A last_modified timestamp on `pg_largeobject_metadata` would be even
better, allowing the cost of reading and discarding rarely-changed large
objects to be avoided.

It might be interesting to look at the issue of large objects from a total
backup perspective. I do wonder, though, where the end would be. You could
have 500MB text fields, and those might pose backup issues as well. I
suppose better LOB support in backups would give additional options for
management.

Best Wishes,
Chris Travers

#9 Wolfgang Keller
feliphil@gmx.net
In reply to: Shaun Thomas (#2)
SQL DATALINK (Re: Storing large files in multiple schemas: BLOB or BYTEA)

I need to store large files (from several MB to 1GB) in Postgres
database. The database has multiple schemas. It looks like Postgres
has 2 options to store large objects: LOB and BYTEA. However we seem
to hit problems with each of these options.

I believe the general consensus around here is to not do that, if you
can avoid it. File systems are much better equipped to handle files
of that magnitude, especially when it comes to retrieving them,
scanning through their contents, or really, any access pattern aside
from simple storage.

You're better off storing the blob on disk somewhere and storing a
row that refers to its location. Either key pieces for a naming
scheme or the full path.

What's the state of implementation (or planning) for this?

http://wiki.postgresql.org/wiki/DATALINK

Sincerely,

Wolfgang