Storing videos or video files in the DB

Started by VENKTESH GUTTEDAR · over 11 years ago · 14 messages · general
#1 VENKTESH GUTTEDAR
venkteshguttedar@gmail.com

Hello,

I am using PostgreSQL 9.3.5, Django 1.7.5, and Python 3.3.

I am working on an application where I will get video files from a mobile
app, and I have to store them on the server.
So my question is: how do I store videos or video files in the DB, or do I
need to store only a link to the video file? If so, where do I store the
file, and how do I access it to display it in an HTML page?

If anyone is helping me out, please give me all the details related to storing
videos in the DB, not just hints, like: what data type do I have to use? How do
I encode and decode? How do I accept files through HTTP POST?

Help would be appreciated.

Thank you.

--
Regards :
Venktesh Guttedar.

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: VENKTESH GUTTEDAR (#1)
Re: Storing videos or video files in the DB

VENKTESH GUTTEDAR wrote:

I am using PostgreSQL 9.3.5, Django 1.7.5, and Python 3.3.

I am working on an application where I will get video files from a mobile app, and I have to store them on the server.
So my question is: how do I store videos or video files in the DB, or do I need to store only a link
to the video file? If so, where do I store the file, and how do I access it to display it
in an HTML page?

If anyone is helping me out, please give me all the details related to storing videos in the DB, not just
hints, like: what data type do I have to use? How do I encode and decode? How do I accept files through
HTTP POST?

I cannot help you with the web end of things (wrong forum), but
on the database side you have, as you said, the choice between
binary data in the database or files on the file system with links to them in the database.

I'll tell you the pros and cons.
If you store data in the database, you don't have to worry about consistency,
which will make development simpler.
The downside is that you might end up with a huge database that you will have
to back up and maintain, and performance will be worse than reading files
from the file system.

I'd consider:
- how big the files are
- how many of them there are
- whether performance is critical
- how much time you can invest in development
- how critical consistency is

The data type to use would be "bytea", unless the files are very large or you need
support for streaming in the database, in which case "large objects" would have
advantages. The big downsides of large objects are that they are not as simple as
bytea and you'll have to maintain consistency between large objects and references
to them in tables.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Vincent Veyron
vv.lists@wanadoo.fr
In reply to: VENKTESH GUTTEDAR (#1)
Re: Storing videos or video files in the DB

On Wed, 17 Dec 2014 15:09:40 +0530
VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:

so my question is: how do I store videos or video files in the DB, or do I
need to store only a link to the video file? If so, where do I store the
file, and how do I access it to display it in an HTML page?

If you store your file in the file system, all you need to store is the path to it. Display a link to the file in your HTML, and the server will deliver the file to the user's web browser, which in turn will open it with the proper program on the user's machine.

I do it with all sorts of extensions (docx, xlsx, pdf...), and it works very well.

--
Salutations, Vincent Veyron

https://marica.fr/
Gestion des contentieux, des dossiers de sinistres assurance et des contrats pour le service juridique


#4 Thomas Kellerer
spam_eater@gmx.net
In reply to: Laurenz Albe (#2)
Re: Storing videos or video files in the DB

Albe Laurenz wrote on 17.12.2014 at 11:07:

and the performance will be worse than reading files from the file system.

There is a Microsoft research paper [1] (from 2006) which tested this "myth" using SQL Server.
It showed that the database might actually be faster than the file system.

As this topic comes up at my workplace every now and then as well, I created a little web application (Java/JDBC) to test this on Postgres and possibly other DBMSs.

It turns out that Postgres isn't really slower at this than the file system either.

For small files of around 50 kB both perform similarly: the average time to read the blob from a bytea column was around 2 ms, whereas the average time to read the blob from the filesystem was around 1 ms. The test uses 50 threads to read the blobs using the PK of the table.

"Reading from the filesystem" means looking up the path for the file in the database table and then reading the file from the filesystem.

For larger files of around 250 kB, Postgres was actually faster in my tests: 130 ms reading the bytea column vs. 260 ms reading the file from disk.

The tests were done locally on my Windows laptop.
I haven't had time yet to do this on a Linux server. I expect the filesystem to have some impact on the figures, and NTFS is not known for being blazingly fast, so those figures may change.

My tests, however, do not take into account the actual time it takes to send the binary data from the server to the client (i.e. the browser). It might well be that serving the file directly through an Apache web server is faster than serving it through a JEE servlet. My intention was to measure the raw read speed of the binary data from the medium where it is stored.

The downside is that you might end up with a huge database
that you will have to backup and maintain

I don't really buy the backup argument: the amount of data to be backed up is essentially the same.
With both solutions you can have incremental backups.

Another downside you didn't mention is that you have to distribute the files in the filesystem properly.
Having thousands or even millions of files in a single directory is not going to be maintenance-friendly either.
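One common mitigation for the flat-directory problem, sketched here with illustrative names: derive subdirectory names from a hash of the file's identifier, so files spread evenly and no single directory grows too large.

```python
import hashlib
import os

def sharded_path(root, file_id, levels=2, width=2):
    """Map an id to root/xx/yy/<id> using prefixes of its SHA-1 hex digest.

    Two levels of two hex chars each give 65,536 buckets, so even millions
    of files stay at a few dozen entries per directory.
    """
    digest = hashlib.sha1(str(file_id).encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, str(file_id))
```

Because the path is a pure function of the id, the database only needs to store the id; the path never has to be recorded or migrated.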

Regards
Thomas

[1]: http://research.microsoft.com/apps/pubs/default.aspx?id=64525


#5 Alban Hertroys
haramrae@gmail.com
In reply to: Thomas Kellerer (#4)
Re: Storing videos or video files in the DB

On 17 December 2014 at 13:55, Thomas Kellerer <spam_eater@gmx.net> wrote:

Albe Laurenz wrote on 17.12.2014 at 11:07:

and the performance will be worse than reading files from the file system.

There is a Microsoft research paper [1] (from 2006) which tested this "myth" using SQL Server.
It showed that the database might actually be faster than the file system.

As this topic comes up at my workplace every now and then as well, I created a little web application (Java/JDBC) to test this on Postgres and possibly other DBMS.

It turns out that Postgres isn't really slower at this than the file system either.

For small files of around 50 kB both perform similarly: the average time to read the blob from a bytea column was around 2 ms, whereas the average time to read the blob from the filesystem was around 1 ms. The test uses 50 threads to read the blobs using the PK of the table.

"Reading from the filesystem" means looking up the path for the file in the database table and then reading the file from the filesystem.

With how many blobs/files did you test this? I'm asking because PG
stores all blobs in a single table. On a file system, if all files are
stored in a single directory, the situation is similar.
However, a file system can store files in several
directories instead of just one, which is often claimed to improve
file-locating performance.

Seeing as the read performance of a file (once it's been located) from
the file-system versus a blob appears similar, the difference in time
for locating the file might well be relevant here.

Interesting to see this was tested with MS SQL and therefore limited
to NTFS. It would probably be useful to test this with other file systems,
such as ZFS or UFS (with DIRHASH!), etc.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


#6 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Thomas Kellerer (#4)
Re: Storing videos or video files in the DB

Thomas Kellerer wrote:

Albe Laurenz wrote on 17.12.2014 at 11:07:

and the performance will be worse than reading files from the file system.

There is a Microsoft research [1] (from 2006) which tested this "myth" using SQL Server.
It showed that the database might actually be faster than the file system.

As this topic comes up at my workplace every now and then as well, I created a little web application
(Java/JDBC) to test this on Postgres and possibly other DBMS.

It turns out that Postgres isn't really slower at this than the file system either.

For small files of around 50 kB both perform similarly: the average time to read the blob from a bytea column
was around 2 ms, whereas the average time to read the blob from the filesystem was around 1 ms. The test
uses 50 threads to read the blobs using the PK of the table.

"Reading from the filesystem" means looking up the path for the file in the database table and then
reading the file from the filesystem.

For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column
vs. 260ms reading the file from disk.

The tests were done locally on my Windows laptop.
I didn't have time yet to do this on a Linux server. I expect the filesystem to have some impact on
the figures and NTFS is not known for being blazingly fast. So maybe those figures will change.

That must be some strangeness of the web application, no?
PostgreSQL must do everything that a direct file access does, right?
Plus some extra processing (loading the data into shared_buffers, ...).
Given that, do you have any explanation for what you observed?

My tests however do not take into account the actual time it takes to send the binary data from the
server to the client (=browser). It might well be possible that serving the file through an Apache Web
Server directly is faster than serving the file through a JEE Servlet. My intention was to measure the
raw read speed of the binary data from the medium where it is stored.

Why not compare a 'SELECT ...' via psql (or libpq) with a 'cat ...' to measure the difference?
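A rough client-side harness for that kind of comparison might look like the following; the file path, table name, and cursor are placeholders, and the database half is left commented out because it needs a live connection:

```python
import time

def avg_seconds(fn, repeat=100):
    """Average wall-clock time of fn() over `repeat` runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat

# filesystem side:
#   avg_fs = avg_seconds(lambda: open("/path/to/blob", "rb").read())
# database side (psycopg2-style cursor assumed):
#   avg_db = avg_seconds(lambda: (cur.execute(
#       "SELECT data FROM videos WHERE id = %s", (1,)), cur.fetchone()))
```

Timing at this level removes the web-application layer from the measurement, which is what Laurenz's psql-vs-cat suggestion is getting at.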

The downside is that you might end up with a huge database
that you will have to backup and maintain

I don't really buy the argument with the backup: the amount of data to be backed up is essentially the
same.
With both solutions you can have incremental backups.

The amount to back up will stay roughly the same, granted.
But isn't backup/restore of a large database more cumbersome than backup/restore of a file system?
And a major upgrade of a large database is more painful, right?

Another downside you didn't mention is that you have to distribute the files in the filesystem properly.
Having thousands or even millions of files in a single directory is not going to be maintenance-friendly either.

That's right, you have to spend some thought on how to store the files as well.

Yours,
Laurenz Albe


#7 Vincent Veyron
vv.lists@wanadoo.fr
In reply to: Thomas Kellerer (#4)
Re: Storing videos or video files in the DB

On Wed, 17 Dec 2014 13:55:46 +0100
Thomas Kellerer <spam_eater@gmx.net> wrote:

Another downside you didn't mention is that you have to distribute the files in the filesystem properly.
Having thousands or even millions of files in a single directory is not going to be maintenance-friendly either.

That's not a very hard problem to solve: just write a script that splits the directory into chunks of 1000 files. I did it in Perl from scratch in a few hours.

Now my database dumps in seconds, and rsync takes care of new files just as fast.

--
Salutations, Vincent Veyron

https://libremen.com/
Legal case, contract and insurance claim management software


#8 Vincent Veyron
vv.lists@wanadoo.fr
In reply to: VENKTESH GUTTEDAR (#1)
Re: Storing videos or video files in the DB

On Wed, 17 Dec 2014 16:51:10 +0530
VENKTESH GUTTEDAR <venkteshguttedar@gmail.com> wrote:

Hi Venktesh,

[you should not reply directly to me; hit reply all to post your messages to the list]

Can you help me achieve this? I mean, how do I store it in a file
system, and how do I fetch it from there?

Nope. It's not PostgreSQL-related; it really depends on what tools you use to build your web site (in your case Django, which I know nothing about).

And will storing it in a file system and giving out the path be secure?

That's part of your web server's configuration; you'll have to study its documentation/mailing lists.

--
Salutations, Vincent Veyron

https://libremen.com/
Legal case, contract and insurance claim management software


#9 Alan Hodgson
ahodgson@simkin.ca
In reply to: Thomas Kellerer (#4)
Re: Storing videos or video files in the DB

On Wednesday, December 17, 2014 01:55:46 PM Thomas Kellerer wrote:

Albe Laurenz wrote on 17.12.2014 at 11:07:

and the performance will be worse than reading files from the file system.

There is a Microsoft research [1] (from 2006) which tested this "myth" using
SQL Server. It showed that the database might actually be faster than the
file system.

As this topic comes up at my workplace every now and then as well, I created
a little web application (Java/JDBC) to test this on Postgres and possibly
other DBMS.

It turns out that Postgres isn't really slower at this than the file
system either.

For small files of around 50 kB both perform similarly: the average time to
read the blob from a bytea column was around 2 ms, whereas the average time
to read the blob from the filesystem was around 1 ms. The test uses 50 threads
to read the blobs using the PK of the table.

"Reading from the filesystem" means looking up the path for the file in the
database table and then reading the file from the filesystem.

For larger files around 250k Postgres was actually faster in my tests: 130ms
reading the bytea column vs. 260ms reading the file from disk.

The tests were done locally on my Windows laptop.
I didn't have time yet to do this on a Linux server. I expect the filesystem
to have some impact on the figures and NTFS is not known for being
blazingly fast. So maybe those figures will change.

My tests however do not take into account the actual time it takes to send
the binary data from the server to the client (=browser). It might well be
possible that serving the file through an Apache Web Server directly is
faster than serving the file through a JEE Servlet. My intention was to
measure the raw read speed of the binary data from the medium where it is
stored.

You can get the data from disk about as fast, but actually serving it results
in a large CPU hit that isn't present when serving files.

And if you're using bytea, your app server has to allocate memory to hold at
least one full copy of the file (I seem to recall that it works out to 2
copies, actually, but it's been a while since I tried it). Most languages
aren't good about releasing that memory, so that hit stays around until the
process gets recycled.

For a low volume app, both might be acceptable - any modern CPU can swamp most
outbound bandwidth even while decoding bytea. But it is a large amount of
overhead compared to a web server just dumping files into a network buffer
straight from disk cache.

Also, maintaining large tables still sucks. You can partition them to make
things friendlier. pg_upgrade makes things nicer, but it can't always be used,
so major version upgrades can still be a problem.

On the plus side, all your data is in one place, which makes it cluster-
friendly and easy to delete files when needed, and makes taking consistent
backups much simpler.


#10 John R Pierce
pierce@hogranch.com
In reply to: Thomas Kellerer (#4)
Re: Storing videos or video files in the DB

On 12/17/2014 4:55 AM, Thomas Kellerer wrote:

It turns out that Postgres isn't really slower at this than the file system either.

For small files of around 50 kB both perform similarly: the average time to read the blob from a bytea column was around 2 ms, whereas the average time to read the blob from the filesystem was around 1 ms. The test uses 50 threads to read the blobs using the PK of the table.

"Reading from the filesystem" means looking up the path for the file in the database table and then reading the file from the filesystem.

For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column vs. 260ms reading the file from disk.

With videos, we're likely looking at file sizes in the 100 MB to
multi-gigabyte range, unless these are just short snippets. I'm not
very familiar with Django and Python; do they have facilities to stream
a very large record, or do they always transfer the whole thing as a
chunk in memory? Do they have PostgreSQL large object support?

Also, serving video via a web server is generally done with an HTML5
or Flash streaming server, where the web application generates the
embedded link to the video, but the video itself comes from that
streaming service. Those streaming servers are less likely to be able
to read an object out of Postgres than they are to stream from the file
system.

--
john r pierce 37N 122W
somewhere on the middle of the left coast


#11 Merlin Moncure
mmoncure@gmail.com
In reply to: VENKTESH GUTTEDAR (#1)
Re: Storing videos or video files in the DB

On Wed, Dec 17, 2014 at 3:39 AM, VENKTESH GUTTEDAR
<venkteshguttedar@gmail.com> wrote:

Hello,

I am using PostgreSQL 9.3.5, Django 1.7.5, and Python 3.3.

I am working on an application where I will get video files from a mobile
app, and I have to store them on the server.
So my question is: how do I store videos or video files in the DB, or do I
need to store only a link to the video file? If so, where do I store the
file, and how do I access it to display it in an HTML page?

If anyone is helping me out, please give me all the details related to storing
videos in the DB, not just hints, like: what data type do I have to use? How do
I encode and decode? How do I accept files through HTTP POST?

Storing files in the db:
*) 1 GB limit (2 GB for large objects)
*) how you read and write the data really matters; to do this
seriously, writing the client in C and using the binary wire format is
suggested
*) you can manage binary transfers in the context of a transaction, which is
nice (no db/fs sync issues)
*) backups will become a real headache using standard pg_dump methods
*) the database will do some compression for you; however, be advised pg
compression is a CPU cruncher
*) memory consumption can be multiples of the largest file size

Storing files in the fs:
*) you have to manage database records as pointers; the synchronization issues suck
*) backups will still suck
*) performance will be somewhat faster, depending on how exactly the data is read
*) memory consumption is minimal

Personally, having done it both ways, I don't recommend storing files in
the database if they are going to be very large, which I would estimate
at around 10 MB or so, assuming you have lots of data to store. There are
just too many cases where pg will open up the whole file in memory, whereas
a well-written streaming interface against the O/S will send it right off
the disk. Also, in my opinion, to have anything approximating good
performance in pg with binary transfers we are definitely talking about a
C client.

File systems have been aggressively enhanced over the years to try to
keep fragmentation low in the face of high write activity. The database
doesn't do this, and if you are dealing with a large number of files in
the face of concurrent updating and deleting, I expect you could start
running into severe fragmentation issues over time.
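For completeness: large objects are the one server-side option that supports seeking and partial reads, so they can be streamed rather than materialized whole. With psycopg2 (assumed driver here) a streaming read avoids the memory multiples Merlin mentions:

```python
def iter_large_object(conn, oid, chunk_size=64 * 1024):
    """Stream a PostgreSQL large object in chunks via psycopg2's lobject API.

    Only one chunk is held in client memory at a time; must run inside a
    transaction, as large object reads require one.
    """
    lob = conn.lobject(oid, "rb")
    try:
        while True:
            chunk = lob.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        lob.close()
```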

merlin


#12 Jonathan Vanasco
postgres@2xlp.com
In reply to: VENKTESH GUTTEDAR (#1)
Re: Storing videos or video files in the DB

I wouldn't even store it on the filesystem if I could avoid that.
Most people I know will assign the video a unique identifier (which is stored in the database) and then store the video file with a 3rd party (e.g. Amazon S3).

1. This is often cheaper. Videos take up a lot of disk space. Having to ensure 2-3 copies of a file as a failover is not fun.
2. It offloads work from internal servers. Why deal with connections that are serving a static file if you can avoid it?

In terms of FS vs DB (aside from the open vs streaming which was already brought up)

I think the big issue with storing large files in the database is the input/output connection.
Postgres has a fixed number of max connections available, and each one has some overhead to operate. Meanwhile, a server like nginx can handle 10k connections easily, with little or no overhead. While the speed is comparable to the OS, you end up using a resource from a limited database connection pool, and you run the risk of a slow or dropped client tying up the connection.
Why allocate a resource to these operations when there are more lightweight alternatives that won't tie up a database connection?
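A sketch of the pattern Jonathan describes: the database row stores only a generated key, and the bytes move through the object store's API. The boto3 calls are shown as comments since they need credentials and a real bucket; the bucket name is a placeholder.

```python
import uuid

def new_video_key():
    """Generate the unique identifier that is stored in the database row."""
    return "videos/%s.mp4" % uuid.uuid4().hex

# Upload and link generation with boto3 (assumed; not run here):
#   import boto3
#   s3 = boto3.client("s3")
#   key = new_video_key()
#   s3.upload_file(local_path, "my-bucket", key)
#   url = s3.generate_presigned_url(
#       "get_object",
#       Params={"Bucket": "my-bucket", "Key": key},
#       ExpiresIn=3600,
#   )
```

The presigned URL goes straight into the HTML page, so neither Postgres nor the application server ever handles the video bytes on the read path.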


#13 Arthur Silva
arthurprs@gmail.com
In reply to: Jonathan Vanasco (#12)
Re: Storing videos or video files in the DB

This! I'm surprised it took so long for somebody to suggest an object store.
On Dec 17, 2014 9:22 PM, "Jonathan Vanasco" <postgres@2xlp.com> wrote:


#14 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Arthur Silva (#13)
Re: Storing videos or video files in the DB

On 12/17/2014 07:37 PM, Arthur Silva wrote:

This! I'm surprised it took so long for somebody to suggest an object store.

I thought they did: a file system. :)


--
Adrian Klaver
adrian.klaver@aklaver.com
