database files

Started by Gail Zachariasover 22 years ago9 messagesgeneral
Jump to latest
#1Gail Zacharias
gz@clozure.com

I am investigating the possibility of using pgsql as the database in an application. I have some unusual requirements that I'd like to ask you all about. I apologize in advance if my terminology is a little "off", I'm not familiar with pgsql (yet).

I need to be able to move the database files, as normal user-visible files, between machines. I.e. given a database on machine A, I want to be able to copy either a single file (ideally) or a single directory (less ideal but still ok) to, say, a zip drive, bring it over to another machine (with pgsql also installed), start up my application and have it access the copied database through pgsql.

Is this sort of thing possible? Is a database stored in a single file or multiple files? Can the location of the file(s) be controlled? Are the files accessible and consistent while pgsql is running? I assume not all the time, but is there a reliable way to make them accessible (i.e. copyable) and consistent short of shutting down pgsql?

Is the file format of the pgsql database files compatible between OS's? E.g. could I take some database files from Linux and use them on Windows?

Thanks in advance for any advice,

Gail Zacharias

#2Dann Corbit
DCorbit@connx.com
In reply to: Gail Zacharias (#1)
Re: database files

-----Original Message-----
From: Gail Zacharias [mailto:gz@clozure.com]
Sent: Wednesday, October 22, 2003 12:42 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] database files

I am investigating the possibility of using pgsql as the
database in an application. I have some unusual requirements
that I'd like to ask you all about. I apologize in advance
if my terminology is a little "off", I'm not familiar with
pgsql (yet).

I need to be able to move the database files, as normal
user-visible files, between machines. I.e. given a database
on machine A, I want to be able to copy either a single file
(ideally) or a single directory (less ideal but still ok) to,
say, a zip drive, bring it over to another machine (with
pgsql also installed), start up my application and have it
access the copied database through pgsql.

Is this sort of thing possible? Is a database stored in a
single file or multiple files? Can the location of the
file(s) be controlled? Are the files accessible and
consistent while pgsql is running? I assume not all the time,
but is there a reliable way to make them accessible (i.e.
copyable) and consistent short of shutting down pgsql?

Is the file format of the pgsql database files compatible
between OS's? E.g. could I take some database files from
Linux and use them on Windows?

The generic way to accomplish what you want is with the COPY command.
http://developer.postgresql.org/docs/postgres/sql-copy.html

#3Chris Browne
cbbrowne@acm.org
In reply to: Gail Zacharias (#1)
Re: database files

gz@clozure.com (Gail Zacharias) wrote:

I am investigating the possibility of using pgsql as the database in
an application. I have some unusual requirements that I'd like to
ask you all about. I apologize in advance if my terminology is a
little "off", I'm not familiar with pgsql (yet).

I need to be able to move the database files, as normal user-visible
files, between machines. I.e. given a database on machine A, I want
to be able to copy either a single file (ideally) or a single
directory (less ideal but still ok) to, say, a zip drive, bring it
over to another machine (with pgsql also installed), start up my
application and have it access the copied database through pgsql.

Is this sort of thing possible?

Many things are possible. Not all are sensible. The approach you
seem to want to take appears to fit into the "not sensible" category.

Is a database stored in a single file or multiple files?

Lots of files.

Can the location of the file(s) be controlled?

Yes, to a degree, either by fancy footwork when the database is shut
down, or, in the case of specific data files, via how you create them.

Are the files accessible and consistent while pgsql is running?

Only if you have some sort of logical volume manager around that can
copy a whole filesystem around atomically.

I assume not all the time, but is there a reliable way to make them
accessible (i.e. copyable) and consistent short of shutting down
pgsql?

If you are storing all of the data atop some logical volume manager
system such as Veritas or Tru64 "AdvFs" or Linux LVM, then there is
probably a way, but I'm not sure there is any equivalent on Windows,
so it seems unlikely that this could be practical.

Is the file format of the pgsql database files compatible between
OS's? E.g. could I take some database files from Linux and use them
on Windows?

Not generally, no.

It sounds as though the things you are trying to do are more or less
the exact opposite of what is generally considered "reasonable usage."

If you're at clozure, you're doubtless aware of the notion of taking
different approaches with different languages. Good Common Lisp code
isn't written the same way as colloquial Scheme which doesn't look at
all like colloquial C++ or Java.

There is an approach to doing this that _would_ provide consistent
copies, dumped into one file, of all of the data, that could indeed be
loaded onto another system without need to shut the database down.

Look at the documentation for pg_dump; that does what you _actually
want_, albeit not in the way you are asking to do it.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','ntlug.org').
http://www.ntlug.org/~cbbrowne/spiritual.html
"... the most important thing in the programming language is the name. A
language will not succeed without a good name. I have recently
invented a very good name and now I am looking for a suitable
language." -- D. E. Knuth, 1967

#4Noname
jearl@bullysports.com
In reply to: Gail Zacharias (#1)
Re: database files

Gail Zacharias <gz@clozure.com> writes:

I am investigating the possibility of using pgsql as the database in
an application. I have some unusual requirements that I'd like to
ask you all about. I apologize in advance if my terminology is a
little "off", I'm not familiar with pgsql (yet).

I think your terminology is fine.

I need to be able to move the database files, as normal user-visible
files, between machines. I.e. given a database on machine A, I want
to be able to copy either a single file (ideally) or a single
directory (less ideal but still ok) to, say, a zip drive, bring it
over to another machine (with pgsql also installed), start up my
application and have it access the copied database through pgsql.

The way to do this with PostgreSQL is to make a backup of the database
and then load it into the other machine. For example on the master
database you would do:

pg_dumpall --clean --verbose > backup.sql

You would then put that backup.sql file on your zip disk or whatever
and carry it to your new machine where you would do something like:

psql -U postgres template1 -f backup.sql

Is this sort of thing possible? Is a database stored in a single
file or multiple files? Can the location of the file(s) be
controlled? Are the files accessible and consistent while pgsql is
running? I assume not all the time, but is there a reliable way to
make them accessible (i.e. copyable) and consistent short of
shutting down pgsql?

Databases are stored in multiple files in a directory plus the log
files and whatnot are stored in another part of the directory
structure. It is theoretically possible to shutdown your postmaster
and then copy the files to another location, but I wouldn't recommend
it. pg_dumpall works well, and it is far more fullproof.

Is the file format of the pgsql database files compatible between
OS's? E.g. could I take some database files from Linux and use them
on Windows?

I don't know the answer to that, but I would be interested in finding
out. My theory is that file formats and other arcana are far better
left to Tom Lane and the rest of the PostgreSQL hackers. This is
especially true considering the fact that on disk formats change
between versions.

Did I mention that pg_dumpall will solve your problem handily?

Thanks in advance for any advice,

Jason

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#4)
Re: database files

jearl@bullysports.com writes:

Gail Zacharias <gz@clozure.com> writes:

Is the file format of the pgsql database files compatible between
OS's? E.g. could I take some database files from Linux and use them
on Windows?

I don't know the answer to that, but I would be interested in finding
out.

I don't think there are any OS dependencies per se, but there are
certainly hardware dependencies (forget moving between Intel and PPC
without a dump/reload, for example). And there are compiler
dependencies, so theoretically you could run into problems even for
two different systems on the same hardware platform.

I concur with the recommendation to use pg_dump scripts as the vehicle
for moving data.

regards, tom lane

#6Paul Thomas
paul@tmsl.demon.co.uk
In reply to: Gail Zacharias (#1)
Re: database files

On 22/10/2003 20:41 Gail Zacharias wrote:

I am investigating the possibility of using pgsql as the database in an
application. I have some unusual requirements that I'd like to ask you
all about. I apologize in advance if my terminology is a little "off",
I'm not familiar with pgsql (yet).

I need to be able to move the database files, as normal user-visible
files, between machines. I.e. given a database on machine A, I want to be
able to copy either a single file (ideally) or a single directory (less
ideal but still ok) to, say, a zip drive, bring it over to another
machine (with pgsql also installed), start up my application and have it
access the copied database through pgsql.

Is this sort of thing possible? Is a database stored in a single file or
multiple files? Can the location of the file(s) be controlled? Are the
files accessible and consistent while pgsql is running? I assume not all
the time, but is there a reliable way to make them accessible (i.e.
copyable) and consistent short of shutting down pgsql?

AFAIK, each database has its own directory and each table or index has its
own file but that won't help you much as they're given numeric names names
on disk (I thinks they use the objects OID) and I doubt that anyone
outside of the core developers would have the knowledge to find out which
files to copy, copy them and then manually edit the system catalogs on the
target machine so that the data can be read. Plus of course, on a *nix
machine, the data and directories are accessible only to the postgres user!

The correct way to do this is using pg_dump which can dump either a whole
database or a single file and then restore onto the other machine using
either psql or pg_restore (which one you use depends on the options you
supply to pg_dump). pg_dump runs inside a transaction which ensures a
consistent view of the dumped data.

Is the file format of the pgsql database files compatible between OS's?
E.g. could I take some database files from Linux and use them on Windows?

The only issue there _might_ be is the newline character if you dump to
ascii files but you can simply run unix2dos on the dump file in that case.

-- 
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller 
Business |
| Computer Consultants         | 
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+
#7Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Paul Thomas (#6)
Re: database files

On 22/10/2003 20:41 Gail Zacharias wrote:

I need to be able to move the database files, as normal user-visible
files, between machines. I.e. given a database on machine A, I want to be
able to copy either a single file (ideally) or a single directory (less
ideal but still ok) to, say, a zip drive, bring it over to another
machine (with pgsql also installed), start up my application and have it
access the copied database through pgsql.

Is this sort of thing possible? Is a database stored in a single file or
multiple files? Can the location of the file(s) be controlled? Are the
files accessible and consistent while pgsql is running? I assume not all
the time, but is there a reliable way to make them accessible (i.e.
copyable) and consistent short of shutting down pgsql?

Yes, multiple, yes up to a point, no, no.

A possible mechanism would be:
- suppose you want to copy data from server A to server B
- server A is running
- server B is stopped
- checkpoint server A
- rsync the files from server A to server B
- stop the postmaster at A
- rsync again (should not take much time)
- start both postmasters

Note that between both rsyncs the data in server B is not usable (i.e.
it is corrupt). You _have_ to do the last rsync with A's postmaster
stopped to make sure the files are right.

Note that you have to copy the whole PGDATA, including pg_clog and
pg_xlog. This means server B cannot have anything beyond what is on
server A.

You should probably discard the pg_dump route and erServer before trying
to do this ...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Major Fambrough: You wish to see the frontier?
John Dunbar: Yes sir, before it's gone.

#8scott.marlowe
scott.marlowe@ihs.com
In reply to: Gail Zacharias (#1)
Re: database files

On Wed, 22 Oct 2003, Gail Zacharias wrote:

I need to be able to move the database files, as normal user-visible
files, between machines. I.e. given a database on machine A, I want to
be able to copy either a single file (ideally) or a single directory
(less ideal but still ok) to, say, a zip drive, bring it over to another
machine (with pgsql also installed), start up my application and have it
access the copied database through pgsql.

While you think this is the preferred method, for postgresql is most
certainly is not. what you need to do is read up a bit on pg_dump and how
to use it to accomplish your goals. For instance, suppose I have two
machines, A and B, and I want to copy the table accounts from the test
database on A to B. Assuming that the test database exists, but the table
accounts doesn't, I can do this (Note these are all command line
programs, not psql):

pg_dump -h A test -t accounts |psql -h B test

Or, if I want to move a whole single database over:

createdb -h B dbname
pg_dump -h A dbname |psql -h B dbname

(This assumes the database dbname didn't exist.)

or, the biggie, assuming B is a freshly initdb'd database, and I want to
move ALL the databases from A to B:

pg_dumpall -h A|psql -h B

Moving individual database files around is a certifiably Bad idea.

#9Ken Godee
ken@perfect-image.com
In reply to: Paul Thomas (#6)
Re: database files

On 22/10/2003 20:41 Gail Zacharias wrote:

I am investigating the possibility of using pgsql as the database in an
application. I have some unusual requirements that I'd like to ask you
all about. I apologize in advance if my terminology is a little "off",
I'm not familiar with pgsql (yet).

I need to be able to move the database files, as normal user-visible
files, between machines. I.e. given a database on machine A, I want to be
able to copy either a single file (ideally) or a single directory (less
ideal but still ok) to, say, a zip drive, bring it over to another
machine (with pgsql also installed), start up my application and have it
access the copied database through pgsql.

Is this sort of thing possible? Is a database stored in a single file or
multiple files? Can the location of the file(s) be controlled? Are the
files accessible and consistent while pgsql is running? I assume not all
the time, but is there a reliable way to make them accessible (i.e.
copyable) and consistent short of shutting down pgsql?

Couldn't one ...
export PGDATA2 = /usr/local/database
create the original database in the PGDATA2
and then when you want to copy it, stop postgres,
recursively copy the database directory, start postgres.

On new machine...
export PGDATA2 = /usr/local/database
copy original database to new machine
start postrges

Would not postgres be able to connect to this
copied database on the new machine?