How to store text files in the postgresql?

Started by DimitryASuplatovalmost 17 years ago22 messagesgeneral
Jump to latest
#1DimitryASuplatov
genesup@gmail.com

Hello,

I am very new to postgresql database. I`ve used a little of MySql
previously.

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

That means that I need two functions. First - grab file from the
directory, store it in the database and delete from the disk; second -
recreate in back to the disk.

1/ Is it possible?
2/ Could you give me some quick tips on how to manage it from the start
so that I knew what to look for in the manual?

Thank you for your time.
Dimitry

#2Leif B. Kristensen
leif@solumslekt.org
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

On Saturday 6. June 2009, DimitryASuplatov wrote:

Hello,

I am very new to postgresql database. I`ve used a little of MySql
previously.

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

That means that I need two functions. First - grab file from the
directory, store it in the database and delete from the disk; second -
recreate in back to the disk.

1/ Is it possible?
2/ Could you give me some quick tips on how to manage it from the
start so that I knew what to look for in the manual?

Thank you for your time.
Dimitry

About 15 lines of Perl code should do that.
--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/

In reply to: Leif B. Kristensen (#2)
Re: How to store text files in the postgresql?

On 06/06/2009 11:41, DimitryASuplatov wrote:

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

That means that I need two functions. First - grab file from the
directory, store it in the database and delete from the disk; second -
recreate in back to the disk.

1/ Is it possible?

Yes. You can do this in your favourite scripting language - PHP, Perl,
etc etc.

2/ Could you give me some quick tips on how to manage it from the start
so that I knew what to look for in the manual?

You haven't given much detail on your needs, but something like this
should do you:

create table the_files (
file_id serial,
file_data text,
...[any other metatdata you need]...
constraint files_pk primary key (file_id)
);

The "serial" type is a pseudo-type which creates a sequence, which in
turn supplies incrementing integers, while "text" allows you to store
strings of arbitrary length. - Look them up in the docs under "Data types".

Really, though, I think your question is more about how to write the
script which reads the files from the filesystem and restores them there
- this depends on what language you use.

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

#4DimitryASuplatov
genesup@gmail.com
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

Thank you for answering.

I`ve read a manual and now I have a more advanced question.

1/ I`ve created a table in the database

mypdb=# create table pdb(
index int,
filename text,
filecontent text
);

2/ Then I want to read a file into it

First I insert metainfo

mypdb=# insert into pdb (index , filename) values (1,
'/home/sda/Documents/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb');
INSERT 0 1

But then comes the problem because the only command I found to read in
the file content is COPY but the following command would not work

mypdb=# copy pdb (filecontent) from
'/home/sda/Documents/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb'
where index=1;
ERROR: syntax error at or near "where"
LINE 1: ...ts/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb' where
inde...

QUESTION: what is the command to read the content of a plain text file
into a SPECIFIED table entry?

Thank you for your time.
SDA

In reply to: DimitryASuplatov (#4)
Re: How to store text files in the postgresql?

On 06/06/2009 14:37, DimitryASuplatov wrote:

But then comes the problem because the only command I found to read in
the file content is COPY but the following command would not work

Yes, this wont work here - COPY is intended for reading an entire table
to or from a disk file, not a single column. It's used for bulk
loading/exporting of data - pg_dump uses it, for example, when backing
up a database.

BTW, "index" is a reserved word, so you shouldn't use it as a column
name. If you *really* have to use it, you'll need to double-quote it
every time, which is a PITA.

mypdb=# copy pdb (filecontent) from
'/home/sda/Documents/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb'
where index=1;
ERROR: syntax error at or near "where"
LINE 1: ...ts/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb' where
inde...

QUESTION: what is the command to read the content of a plain text file
into a SPECIFIED table entry?

I don't think you can, directly in the database. Instead, write a short
script which reads the file contents into a string and then issues an
SQL query to INSERT a new row or UPDATE an existing one. In PHP, for
example (untested):

// Get the file contents.
$filename = 'my_file';
$h = fopen($filename, 'r');
$lines = fread($h, filesize($filename));
fclose($h);

// Write the contents to the database.
$sql = 'insert into pdb("index", filename, filecontents) values ($1, $2,
$3)';
$conn = pg_connect([..your connection string..]);
pg_query_params($sql, array(1, $filename, $lines));

HTH,

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

#6DimitryASuplatov
genesup@gmail.com
In reply to: Raymond O'Donnell (#5)
Re: How to store text files in the postgresql?

Thank you very much.

I`ve also worked out how to do this simply from bash

./bin/psql mypdb <<EOF
insert into pdb values ('`cat /file/name`');
EOF

SDA

Show quoted text

On Sat, 2009-06-06 at 16:32 +0100, Raymond O'Donnell wrote:

On 06/06/2009 14:37, DimitryASuplatov wrote:

But then comes the problem because the only command I found to read in
the file content is COPY but the following command would not work

Yes, this wont work here - COPY is intended for reading an entire table
to or from a disk file, not a single column. It's used for bulk
loading/exporting of data - pg_dump uses it, for example, when backing
up a database.

BTW, "index" is a reserved word, so you shouldn't use it as a column
name. If you *really* have to use it, you'll need to double-quote it
every time, which is a PITA.

mypdb=# copy pdb (filecontent) from
'/home/sda/Documents/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb'
where index=1;
ERROR: syntax error at or near "where"
LINE 1: ...ts/Work/PVA_India/PGA_test_modeling/pdb/1gm9.pdb' where
inde...

QUESTION: what is the command to read the content of a plain text file
into a SPECIFIED table entry?

I don't think you can, directly in the database. Instead, write a short
script which reads the file contents into a string and then issues an
SQL query to INSERT a new row or UPDATE an existing one. In PHP, for
example (untested):

// Get the file contents.
$filename = 'my_file';
$h = fopen($filename, 'r');
$lines = fread($h, filesize($filename));
fclose($h);

// Write the contents to the database.
$sql = 'insert into pdb("index", filename, filecontents) values ($1, $2,
$3)';
$conn = pg_connect([..your connection string..]);
pg_query_params($sql, array(1, $filename, $lines));

HTH,

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

#7Florian Weimer
fw@deneb.enyo.de
In reply to: DimitryASuplatov (#6)
Re: How to store text files in the postgresql?

* DimitryASuplatov:

I`ve also worked out how to do this simply from bash

./bin/psql mypdb <<EOF
insert into pdb values ('`cat /file/name`');
EOF

This doesn't work if the file contains embedded "'" characters (and
backslashes and NULs are also problematic). You will also get errors
if the file encoding does not match the database encoding.

You probably should use a BYTEA column and a little Perl script which
uses bind_param to specify a type of PG_BYTEA for the parameter.

#8Johan Nel
johan555.nel555@xsinet555.co.za
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

1/ Is it possible?
2/ Could you give me some quick tips on how to manage it from the start
so that I knew what to look for in the manual?

Not sure how much you know about programming, but easiest will probably
be to have a small application. Here is some code in the Npgsql library
documentation that shows how to do it in C#:

using System;
using System.Data;
using Npgsql;
using System.IO;
public class t
{
public static void Main(String[] args)
{
NpgsqlConnection conn = new NpgsqlConnection(
"server=localhost;user id=npgsql_tests;password=npgsql_tests");
conn.Open();
FileStream fs = new FileStream(args[0], FileMode.Open,
FileAccess.Read);
BinaryReader br = new BinaryReader(new BufferedStream(fs));
Byte[] bytes = br.ReadBytes((Int32)fs.Length);
Console.WriteLine(fs.Length);
br.Close();
fs.Close();
NpgsqlCommand command = new NpgsqlCommand(
"insert into tableBytea(field_bytea) values(:bytesData)", conn);
NpgsqlParameter param = new NpgsqlParameter(
":bytesData", DbType.Binary);
param.Value = bytes;
command.Parameters.Add(param);
command.ExecuteNonQuery();
command = new NpgsqlCommand(
"select field_bytea from tableBytea " +
"where field_serial = (select max(select field_serial) from " +
"tableBytea);", conn);
Byte[] result = (Byte[])command.ExecuteScalar();
fs = new FileStream(args[0] + "database", FileMode.Create,
FileAccess.Write);
BinaryWriter bw = new BinaryWriter(new BufferedStream(fs));
bw.Write(result);
bw.Flush();
fs.Close();
bw.Close();
conn.Close();
}
}

HTH,

Johan Nel
Pretoria, South Africa.

#9Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

Hi,

Le 6 juin 09 à 12:41, DimitryASuplatov a écrit :

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

The following article deals specifically with files containing XML but
goes as far as explaining exactly what to change to have it more
general:

http://blog.rhodiumtoad.org.uk/2009/02/05/reading-xml-files-into-the-database/

Regards,
--
dim

#10Steve Crawford
scrawford@pinpointresearch.com
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

DimitryASuplatov wrote:

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

That means that I need two functions. First - grab file from the
directory, store it in the database and delete from the disk; second -
recreate in back to the disk....

2/ Could you give me some quick tips on how to manage it from the start
so that I knew what to look for in the manual?

Others have pointed out some of the scripting methods to use. Some other
things to consider:

1. Give some thought to character-sets. It's likely that UTF8 on the
backend will be fine but there exists the possibility that you will have
to set the appropriate encoding on the client-side depending on the
nature of the files you will be reading.

2. Similarly, think about language. I don't know if you need to do
searches based on the contents of your text-fields but you should be
aware that PostgreSQL has some very nice text-search functions. Take a
moment to scan those docs to see if there is anything that may be of
use: http://www.postgresql.org/docs/8.3/interactive/textsearch.html

3. Tell us more about the app. How does data flow in/out of the system?
Are records updated or static? Are they retained indefinitely or purged?
If purged, on what basis? Lets say, for example, that you are putting
the data in for archival purposes and the records should be purged
monthly after 5 years. In that case, you should consider including some
sort of time indicator in your data and partitioning your table into
month-sized tables. Partitioning uses PostgreSQL's inheritance feature
allowing you to have a single parent table with no data and multiple
child tables that contain chunks of your data so every month you could
create a new child table to take in the new files for the month and drop
the table containing the data you want to purge. Or you could have the
table partitioned by users, projects or whatever is appropriate for your
app. Read up on inheritance and partitioning at
http://www.postgresql.org/docs/8.3/interactive/ddl-inherit.html and
http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html. If
you are able to partition into static chunks and changing chunks, you
may be able to create a much more efficient backup configuration wherein
you only regularly backup the changing piece of data.

Cheers,
Steve

#11Yaroslav Tykhiy
yar@barnet.com.au
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

DimitryASuplatov wrote:

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

I cannot but ask the community a related question here: Can such
design, that is, storing quite large objects of varying size in a
PostgreSQL database, be a good idea in the first place? I used to
believe that what RDBMS were really good at was storing a huge number
of relations, each of a small and mostly uniform size if expressed in
bytes; but today people tend to put big things, e.g., email or files,
in relational databases because it's convenient to them. That's
absolutely normal as typical data objects we have to deal with keep
growing in size, but how well can databases stand the pressure? And
can't it still be better to store large things as plain files and put
just their names in the database? File systems were designed for such
kind of job after all, unlike RDBMS.

Thanks!

Yar

#12Andy Colson
andy@squeakycode.net
In reply to: Yaroslav Tykhiy (#11)
Re: How to store text files in the postgresql?

Yaroslav Tykhiy wrote:

DimitryASuplatov wrote:

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

I cannot but ask the community a related question here: Can such design,
that is, storing quite large objects of varying size in a PostgreSQL
database, be a good idea in the first place? I used to believe that
what RDBMS were really good at was storing a huge number of relations,
each of a small and mostly uniform size if expressed in bytes; but today
people tend to put big things, e.g., email or files, in relational
databases because it's convenient to them. That's absolutely normal as
typical data objects we have to deal with keep growing in size, but how
well can databases stand the pressure? And can't it still be better to
store large things as plain files and put just their names in the
database? File systems were designed for such kind of job after all,
unlike RDBMS.

Thanks!

Yar

I'd have to vote yes, its a good idea. It offers consistency that you
cant get with the file system. We store assessor information in a
database along with a photo of the house. If the photos were on the
file system whats to prevent someone from blowing away the folder? (our
network admin loves to find big folders taking up lots of disk space and
blow them away)

Sure, you say, they can "delete from photos", but that's different (and
because our network admin does not know sql). If they delete a file,
I'll still have a photo record and know they used to have a photo, and I
know the path to the photo, but there is no photo there. But with the
photo in the database, if there is no photo, there is no record either.

If you are looking for speed, yea, I'd say store it on the file system.
But we are looking for convenience and consistency.

-Andy

#13Emanuel Calvo Franco
postgres.arg@gmail.com
In reply to: DimitryASuplatov (#1)
Re: How to store text files in the postgresql?

2009/6/6 DimitryASuplatov <genesup@gmail.com>:

Hello,

I am very new to postgresql database. I`ve used a little of MySql
previously.

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

That means that I need two functions. First - grab file from the
directory, store it in the database and delete from the disk; second -
recreate in back to the disk.

1/ Is it possible?
2/ Could you give me some quick tips on how to manage it from the start
so that I knew what to look for in the manual?

Thank you for your time.
Dimitry

You can use the function pg_read_file(text,bigint,bigint)?

like:
inser into table foo values(1,2,3 [...], pg_read_file(filename,1,[?]);

The problem is to write it on the disc, there is not a function for that AFAIK.

--
Emanuel Calvo Franco
ArPUG [www.arpug.com.ar] / AOSUG Member
www.emanuelcalvofranco.com.ar

#14Scott Ribe
scott_ribe@killerbytes.com
In reply to: Andy Colson (#12)
Re: How to store text files in the postgresql?

If I had an admin roaming through my document server deleting document files
out from under my database, that's a problem I would solve very
quickly--with a completely non-technical "solution".

After all, what's to prevent such a person from deleting pgsql data files???

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

#15Bruce Momjian
bruce@momjian.us
In reply to: Emanuel Calvo Franco (#13)
Re: How to store text files in the postgresql?

This is a recurring debate and there are pros and cons for both sides.
It usually comes down to whether you need transactional guarantees for
these large objects.

There are also practical concerns. Transfering these large objects
over a single database tcp connection limits the application
performance a lot. And the database is often more heavyweight than you
really want to keep tied up to serve up images.

Also, it makes backups a pain since it's a lot easier to back up a
file system than a database. But that gets back to whether you need
transactional guarantees. The reason it's a pain to back up a database
is precisely because it needs to make those guarantees.

#16Andy Colson
andy@squeakycode.net
In reply to: Scott Ribe (#14)
Re: How to store text files in the postgresql?

Scott Ribe wrote:

If I had an admin roaming through my document server deleting document files
out from under my database, that's a problem I would solve very
quickly--with a completely non-technical "solution".

After all, what's to prevent such a person from deleting pgsql data files???

Yea, true. I don't actually have a rogue admin cleaning up too much.
My point was its "less accessible" and "feels safer" when the photos are
in the database.

-Andy

#17Alan Hodgson
ahodgson@simkin.ca
In reply to: Bruce Momjian (#15)
Re: How to store text files in the postgresql?

On Friday 12 June 2009, Greg Stark <gsstark@mit.edu> wrote:

Also, it makes backups a pain since it's a lot easier to back up a
file system than a database. But that gets back to whether you need
transactional guarantees. The reason it's a pain to back up a database
is precisely because it needs to make those guarantees.

It's far easier to backup and restore a database than millions of small
files. Small files = random disk I/O. The real downside is the CPU time
involved in storing and retrieving the files. If it isn't a show stopper,
then putting them in the database makes all kinds of sense.

--
WARNING: Do not look into laser with remaining eye.

#18Scott Ribe
scott_ribe@killerbytes.com
In reply to: Alan Hodgson (#17)
Re: How to store text files in the postgresql?

It's far easier to backup and restore a database than millions of small
files. Small files = random disk I/O. The real downside is the CPU time
involved in storing and retrieving the files. If it isn't a show stopper,
then putting them in the database makes all kinds of sense.

On the contrary, I think backup is one of the primary reasons to move files
*out* of the database. Decent incremental backup software greatly reduces
the I/O & time needed for backup of files as compared to a pg dump. (Of
course this assumes the managed files are long-lived.)

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

#19Alan Hodgson
ahodgson@simkin.ca
In reply to: Scott Ribe (#18)
Re: How to store text files in the postgresql?

On Friday 12 June 2009, Scott Ribe <scott_ribe@killerbytes.com> wrote:

It's far easier to backup and restore a database than millions of small
files. Small files = random disk I/O. The real downside is the CPU time
involved in storing and retrieving the files. If it isn't a show
stopper, then putting them in the database makes all kinds of sense.

On the contrary, I think backup is one of the primary reasons to move
files *out* of the database. Decent incremental backup software greatly
reduces the I/O & time needed for backup of files as compared to a pg
dump. (Of course this assumes the managed files are long-lived.)

We'll have to just disagree on that. You still have to do level 0 backups
occasionally. Scanning a directory tree of millions of files to decide what
to backup for an incremental can take forever. And restoring millions of
small files can take days.

But I concede there are good arguments for the filesystem approach;
certainly it's not a one size fits all problem. If your files are mostly
bigger than a few MB each, then the filesystem approach is probably better.

And of course big database tables get unwieldy too, for indexing and
vacuuming - I wouldn't necessarily put most files into the large object
interface, just the ones too big to want to fetch all in one piece.

--
WARNING: Do not look into laser with remaining eye.

#20Craig Ringer
craig@2ndquadrant.com
In reply to: Yaroslav Tykhiy (#11)
Re: How to store text files in the postgresql?

On Fri, 2009-06-12 at 19:53 +1000, Yaroslav Tykhiy wrote:

DimitryASuplatov wrote:

My task is to store a lot (10^5) of small ( <10 MB) text files in the
database with the ability to restore them back to the hard drive on
demand.

I cannot but ask the community a related question here: Can such
design, that is, storing quite large objects of varying size in a
PostgreSQL database, be a good idea in the first place? I used to
believe that what RDBMS were really good at was storing a huge number
of relations, each of a small and mostly uniform size if expressed in
bytes; but today people tend to put big things, e.g., email or files,
in relational databases because it's convenient to them. That's
absolutely normal as typical data objects we have to deal with keep
growing in size, but how well can databases stand the pressure? And
can't it still be better to store large things as plain files and put
just their names in the database? File systems were designed for such
kind of job after all, unlike RDBMS.

It depends a great deal on what you need.

Using the DB allows you to access that data using the same tools,
methods, connections, and security credentials you use for your other
data. It also allows you to manage it in the same transactional
environment, and verify its consistency.

Using the file system can be faster and offers a wider variety of
methods for accessing and manipulating the data. It can be easier to
back up efficiently (differential/incremental backups etc) may take up
less space, and more.

To me, a nearly ideal option would be a file system that supported
transactional operations and two phase commit. You could work with your
data objects normally in the FS (at least for read access), but you
could _ALSO_ modify them in tandem with the DB:

- Begin DB trans
- Begin FS trans
... do your work...
- prepare db trans for commit
- prepare fs trans for commit
- commit db trans
- commit fs trans

A transaction manager could hide those from you, or the DB could take
care of the FS 2PC as part of its own LOB and transaction management. In
fact, I think that's how Microsoft have done it with transactional NTFS
integration in MS SQL Server, which I must say sounds awfully nice.

--
Craig Ringer

#21Craig Ringer
craig@2ndquadrant.com
In reply to: Alan Hodgson (#19)
#22Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Yaroslav Tykhiy (#11)