Proposal: More flexible backup/restore via pg_dump
I am considering making some changes to pg_dump, and would appreciate any
suggestions/ideas from the list.
The outline is as follows:
- Add a flag to modify the output of pg_dump so that it is more amenable to
retrieving parts of the extracted data. This may involve writing the data
to a tar file/dbm file/custom-format file, rather than just a simple SQL
script file (see below).
- Add flags to allow selective import of information stored in the custom
dump file: eg. load the schema (no data), load only one table, define all
indexes or triggers for a given table etc. This would eventually allow for
overriding of tablespace settings.
- Add options to dump selected information in a readable format (ie.
probably SQL).
The broad approach would be to modify the existing pg_dump as little as
possible; I am inclined to write the data as SQL (as currently done), and
append an 'index' to the output, specifying the offset in the file at which
each piece of extractable data can be found. The 'restore' option would
just go to the relevant section(s), and pipe the data to psql.
I am also considering the possibility of writing the data to separate files
in a tar archive, since this may be a lot cleaner in the long run, although
the text file with the index at the end has the advantage that the index
could be written as a series of SQL comments, effectively making the dump
file compatible with existing dump file formats.
Any comments/suggestions would be appreciated.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
The broad approach would be to modify the existing pg_dump as little as
possible; I am inclined to write the data as SQL (as currently done), and
append an 'index' to the output, specifying the offset in the file at which
each piece of extractable data can be found. The 'restore' option would
just go to the relevant section(s), and pipe the data to psql.
A problem I see with an index at file end is that you will need to read the
file twice, and that may be very undesirable if e.g. the backup is on tape
or a compressed file.
I like your idea of uniquely formatted comments heading separate sections
of the dump file, if the "create table ..." is not already enough.
Andreas
At 10:17 26/06/00 +0200, Zeugswetter Andreas SB wrote:
A problem I see with an index at file end is that you will need to read the
file twice, and that may be very undesirable if e.g. the backup is on tape
or a compressed file.
The proposal has actually come a fairly long way after extensive
discussions with Tom Lane, and I have added the current plans at the end of
this message. The TOC-at-end problem is an issue that I am trying to deal
with; I am planning a 'custom' format that has the large parts (data dumps)
compressed, to avoid the need to compress the entire file. This means
that you would not need to uncompress the entire file to get to the TOC, or
to restore just the schema. It also allows good random access to defns and
data. I'm also considering putting the dumped data at the end of the file,
but this has issues when you want to restore table data before defining
indexes, for example.
I must admit that I've been working on the assumption that people using
PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
theory) a restore can be loaded onto disk from tape before being used. I
know this is pretty evil, but it will cover 95% of users. For those people
with huge backups, they will have to suffer tapes that go backward and
forwards a bit. From the details below, you will see that this is unavoidable.
Sanity Check: does fseek work on tapes? If not, what is the correct way to
read a particular block/byte from a file on a tape?
-----------------------------------------------------------
Updated Proposal:
-------------------------
For the sake of argument, call the new utilities pg_backup and pg_restore.
pg_backup
---------
Dump schema [and data] in OID order (to try to make restores sequential,
for when tar/tape storage is used). Each dumped item has a TOC entry which
includes the OID and description, and for those items for which we know
some dependencies (functions for types & aggregates; types for tables;
superclasses for classes - any more?), it will also dump the dependency OIDs.
Each object (table defn, table data, function defn, type defn etc) is
dumped to a separate file/thing in the output file. The TOC entries go into
a separate file/thing (probably only one file/thing for the whole TOC).
The output scheme will be encapsulated, and in the initial version will be
a custom format (since I can't see an API for tar files), and a
dump-to-a-directory format. Future use of tar, DB, PostgreSQL or even a
Makefile should not be excluded in the IO design. This last goal *may* not
be achieved, but I don't see why it can't be at this stage. Hopefully
someone with appropriate skills & motivation can do a tar archive 8-}.
The result of a pg_backup should be a single file with metadata and
optional data, along with whatever dependency and extra data is available
to pg_backup, or provided by the DBA.
pg_restore
----------
Reads a backup file and dumps SQL suitable for sending to psql.
Options will include:
- No Data (--no-data? -nd? -s?)
- No metadata (--no-schema? -ns? -d?)
- Specification of items to dump from an input file; this allows custom
ordering AND custom selection of multiple items. Basically, I will allow
the user to dump part of the TOC, edit it, and tell pg_restore to use the
edited partial TOC. (--item-list=<file>? -l=<file>?)
- Dump TOC (--toc-only? -c?)
[Wish List]
- Data for a single table (--table=<name>? -t=<name>)
- Defn/Data for a single OID; (--oid=<oid>? -o=<oid>?)
- User-defined dependencies. Allow the DB developer to specify once for
their DB what the dependencies are, then use that file as a guide to the
restore process. (--deps=<file> -D=<file>)
pg_restore will use the same custom IO routines to allow IO to
tar/directory/custom files. In the first pass, I will do custom file IO.
If a user chooses to restore the entire metadata, then it will be dumped
according to the default policy (OID order). If they choose to specify the
items from an input file, then the file ordering is used.
-------
Typical backup procedure:
pg_backup mydb mydb.bkp
or *maybe*
pg_backup mydb > mydb.bkp
BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
will use fseek.
Typical restore procedure:
pg_restore mydb mydb.bkp | psql
A user will be able to extract only the schema (-s), only the data (-d), a
specific table (-t=name), or even edit the object order and selection via:
pg_restore --dump-toc mydb.bkp > mytoc.txt
vi mytoc.txt {ie. reorder TOC elements as per known dependency problems}
pg_restore --item-list=mytoc.txt mydb.bkp | psql
FWIW, I envisage the --dump-toc output to look like:
ID; FUNCTION FRED(INT4)
ID; TYPE MY_TYPE
ID; TABLE MY_TABLE
ID; DATA MY_TABLE
ID; INDEX MY_TABLE_IX1
...etc.
so editing and reordering the dump plan should not be too onerous.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
Sanity Check: does fseek work on tapes? If not, what is the correct way to
read a particular block/byte from a file on a tape?
This is usually not possible, meaning that to read backwards you have to
rewind to the beginning of the tape, then seek forward to your position. The
same is usually true if you use a pipe to a storage manager.
Andreas
I must admit that I've been working on the assumption that people using
PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
theory) a restore can be loaded onto disk from tape before being
used.
Are you also assuming that a backup fits in a single file,
i.e. that anyone with >2GB of backup has some sort of large file
support?
Sanity Check: does fseek work on tapes? If not, what is the correct way to
read a particular block/byte from a file on a tape?
As someone else answered: no. You can't portably assume random access
to tape blocks.
The output scheme will be encapsulated, and in the initial version will be
a custom format (since I can't see an API for tar files)
You can use a standard format without there being a standard API.
Using either tar or cpio format as defined for POSIX would allow a lot
of us to understand your on-tape format with a very low burden on you
for documentation. (If you do go this route you might want to think
about cpio format; it is less restrictive about filename length than
tar.)
There is also plenty of code lying around for reading and writing tar
and cpio formats that you could steal^H^H^H^H^H reuse. The BSD pax
code should have a suitable license.
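For illustration, a minimal C sketch of emitting a POSIX ustar member header;
the member naming is hypothetical, and production code would more likely be
adapted from pax as suggested above rather than written from scratch.

#include <stdio.h>
#include <string.h>

/* Minimal POSIX ustar header: each member is this 512-byte block
 * followed by the member data, padded to a 512-byte boundary. */
static void
write_ustar_header(FILE *out, const char *name, unsigned long size)
{
    char         block[512];
    unsigned int sum = 0;
    int          i;

    memset(block, 0, sizeof(block));
    strncpy(block, name, 100);                      /* member name */
    memcpy(block + 100, "0000644", 8);              /* mode */
    memcpy(block + 108, "0000000", 8);              /* uid */
    memcpy(block + 116, "0000000", 8);              /* gid */
    snprintf(block + 124, 12, "%011lo", size);      /* size, octal */
    memcpy(block + 136, "00000000000", 12);         /* mtime */
    block[156] = '0';                               /* typeflag: regular file */
    memcpy(block + 257, "ustar", 6);                /* magic */
    memcpy(block + 263, "00", 2);                   /* version */

    /* The checksum is computed with the chksum field treated as spaces. */
    memset(block + 148, ' ', 8);
    for (i = 0; i < 512; i++)
        sum += (unsigned char) block[i];
    snprintf(block + 148, 8, "%06o", sum);

    fwrite(block, 1, sizeof(block), out);
}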
pg_restore will use the same custom IO routines to allow IO to
tar/directory/custom files. In the first pass, I will do custom file
IO.
Presumably you'd expect this file I/O to be through some standard API
that other backends would also use? I'd be interested to see this;
I've got code for an experimental libtar somewhere around here, so I
could offer comments at least.
BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
will use fseek.
Whether fseek works on standard output depends on what it is connected to.
If it's a pipe, no. If it's a file, yes. If it's a tape, no. If
it's a ...
Not using fseek() would be a win if you can see a way to do it.
Regards,
Giles
At 07:00 27/06/00 +1000, Giles Lean wrote:
Are you also assuming that a backup fits in a single file,
i.e. that anyone with >2GB of backup has some sort of large file
support?
That's up to the format used to save the database; in the case of the
'custom' format, yes. But that is the size after compression. This is not
substantially different to pg_dump's behaviour, except that pg_dump can be
piped to a tape drive...
The objective of the API components is to (a) make it very easy to add new
metadata to dump (eg. tablespaces), and (b) make it easy to add new output
formats (eg. tar archives). Basically the metadata dumping side makes one
call to register the thing to be saved, passing an optional function
pointer to dump data (eg. table contents) - this *could* even be used to
implement dumping of BLOBs.
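As a purely hypothetical sketch (none of these names are a real API), that
registration call might look something like this: one TOC entry per object,
with an optional callback that streams the data when the archiver asks for it.

/* Hypothetical data-dumper callback, invoked by the archiver when (and if)
 * the data for this TOC entry actually needs to be written. */
typedef int (*DataDumperPtr) (void *archiveHandle, void *dumperArg);

/* Illustrative registration call; all names here are assumptions. */
extern void RegisterArchiveEntry(void *archiveHandle,
                                 const char *oid,         /* for dependency ordering */
                                 const char *name,        /* e.g. "my_table" */
                                 const char *description, /* "TABLE", "TABLE DATA", ... */
                                 const char *createSql,   /* definition to replay */
                                 const char *dropSql,     /* how to remove it first */
                                 DataDumperPtr dumper,    /* NULL for pure metadata */
                                 void *dumperArg);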
The 'archiver' format provider must have some basic IO routines
(Read/WriteBuf and Read/WriteByte) and a number of hook functions which
it can use to output the data. It needs to provide at least one function
that actually writes data somewhere. It also has to provide the associated
function to read the data.
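Again as a hypothetical sketch, the provider interface could be a table of
function pointers along these lines; the names and the exact hook set are
illustrative only, not a settled API.

#include <stddef.h>

/*
 * Hypothetical format-provider interface: each output format (custom file,
 * directory, later perhaps tar) fills in these hooks, and the archiver core
 * does all of its IO through them.
 */
typedef struct ArchiveFormatOps
{
    /* Basic byte/buffer IO, as described above. */
    int    (*WriteByte) (void *ctx, int byteval);
    int    (*ReadByte)  (void *ctx);
    size_t (*WriteBuf)  (void *ctx, const void *buf, size_t len);
    size_t (*ReadBuf)   (void *ctx, void *buf, size_t len);

    /* Hooks the core calls around each TOC entry and each data member. */
    void   (*StartEntry) (void *ctx, const char *oid, const char *desc);
    void   (*EndEntry)   (void *ctx);
    void   (*StartData)  (void *ctx, const char *oid);
    void   (*EndData)    (void *ctx);
} ArchiveFormatOps;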
As someone else answered: no. You can't portably assume random access
to tape blocks.
This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and
arbitrarily ordered restores. But I may have a solution:
write the schema and TOC out at the start of the file/tape, then compressed
data with headers for each indicating which TOC item they correspond to.
This metadata can be loaded into /tmp, so fseek is possible. The actual
data restoration (assuming constraints are not defined [THIS IS A PROBLEM])
can be done by scanning the rest of the tape in its own order since RI
will not be an issue. I think I'm happy with this.
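A minimal C sketch of the single forward pass this implies; the member header
and the toc_item_wanted()/restore_member() helpers are hypothetical stubs, and
unwanted members are skipped by reading rather than seeking, since fseek
cannot be assumed on a tape or pipe.

#include <stdio.h>

/* Hypothetical on-tape member header: which TOC item this is, and how long.
 * A real format would serialise these fields explicitly. */
typedef struct MemberHeader
{
    char          tocId[32];
    unsigned long length;
} MemberHeader;

/* Stub: nonzero if this TOC id was selected for restore. */
extern int  toc_item_wanted(const char *tocId);
/* Stub: restores 'length' bytes of member data read from 'in'. */
extern void restore_member(FILE *in, unsigned long length);

static void
scan_and_restore(FILE *in)
{
    MemberHeader hdr;
    char         scratch[8192];

    /* Single forward pass: no fseek, so skipping means reading and discarding. */
    while (fread(&hdr, sizeof(hdr), 1, in) == 1)
    {
        if (toc_item_wanted(hdr.tocId))
            restore_member(in, hdr.length);
        else
        {
            unsigned long remaining = hdr.length;

            while (remaining > 0)
            {
                size_t chunk = remaining < sizeof(scratch) ? remaining : sizeof(scratch);
                size_t got = fread(scratch, 1, chunk, in);

                if (got == 0)
                    return;     /* unexpected end of archive */
                remaining -= got;
            }
        }
    }
}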
But the problem is the constraints: AFAIK there is no 'ALTER TABLE ADD
CONSTRAINT...' so PK, FK, Not Null constraints have to be applied before
data load (*please* tell me I'm wrong). This also means that for large
databases, I should apply indexes to make PK/FK checks fast, but they will
slow data load.
Any ideas?
The output scheme will be encapsulated, and in the initial version will be
a custom format (since I can't see an API for tar files)
You can use a standard format without there being a standard API.
Being a relatively lazy person, I was hoping to leave that as an exercise
for the reader...
Using either tar or cpio format as defined for POSIX would allow a lot
of us to understand your on-tape format with a very low burden on you
for documentation. (If you do go this route you might want to think
about cpio format; it is less restrictive about filename length than
tar.)
Tom Lane was also very favorably disposed to tar format. As I said above,
the archive interfaces should be pretty amenable to adding tar support -
it's just I'd like to get a version working with custom and directory based
formats to ensure the flexibility is there. As I see it, the 'backup to
directory' format should be easy to use as a basis for the 'backup to tar'
code.
The problem I have with tar is that it does not support random access to
the associated data. For reordering large backups, or (ultimately) single
BLOB extraction, this is a performance problem.
If you have a tar spec (or suitably licenced code), please mail it to me,
and I'll be able to make more informed comments.
Presumably you'd expect this file I/O to be through some standard API
that other backends would also use? I'd be interested to see this;
I've got code for an experimental libtar somewhere around here, so I
could offer comments at least.
No problem: I should have a working version pretty soon. The API is
strictly purpose-built; it would be adaptable to a more general archive
format, but as you say, tar is fine for most purposes.
BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
will use fseek.
Not using fseek() would be a win if you can see a way to do it.
I think I probably can if I can work my way around RI problems.
Unfortunately the most effective solution will be to allow reordering of the
table data restoration order, but that requires multiple passes through the
file to find the table data...
Bye for now,
Philip
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
If you have a tar spec (or suitably licenced code), please mail it to me,
and I'll be able to make more informed comments.
The POSIX tar format is documented as part of the "Single Unix
Specification, version 2" which is available from:
http://www.opengroup.org
http://www.opengroup.org/publications/catalog/t912.htm
You can download the standard as HTML. They keep moving the location
around so if the second URL breaks start from the top. They do want an
email address from you and they will spam this address with
invitations to conferences. There's no such thing as a free lunch, I
guess.
For source code, any FreeBSD, NetBSD, or OpenBSD mirror will have pax
which understands both cpio and tar format and is BSD licensed:
ftp://ftp.au.netbsd.org/pub/NetBSD/NetBSD-current/src/bin/pax/
Regards,
Giles
comments prefixed with PM...
--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council
-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 10:07 AM
To: Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal: More flexible backup/restore via
pg_dump
At 07:00 27/06/00 +1000, Giles Lean wrote:
Are you also assuming that a backup fits in a single file,
i.e. that anyone with >2GB of backup has some sort of large file
support?
That's up to the format used to save the database; in the case of the
'custom' format, yes. But that is the size after compression. This is not
substantially different to pg_dump's behaviour, except that pg_dump can be
piped to a tape drive...
PM: So can most other Unix based formats. On the intranet server here, I
pg_dump into /tmp, then include them in a tar piped to the tape drive.
The objective of the API components is to (a) make it very easy to add new
metadata to dump (eg. tablespaces), and (b) make it easy to add new output
formats (eg. tar archives). Basically the metadata dumping side makes one
call to register the thing to be saved, passing an optional function
pointer to dump data (eg. table contents) - this *could* even be used to
implement dumping of BLOBs.
PM: The problem with blobs hasn't been with dumping them (I have some Java
code that does it into a compressed zip file), but restoring them - you
can't create a blob with a specific OID, so any references in existing
tables will break. I currently get round it by updating the tables after the
restore - but it's ugly and easy to break :(
The 'archiver' format provider must have some basic IO routines
(Read/WriteBuf and Read/WriteByte) and a number of hook functions which
it can use to output the data. It needs to provide at least one function
that actually writes data somewhere. It also has to provide the associated
function to read the data.
PM: Having a set of api's (either accessible directly into the backend,
and/or via some fastpath call) would be useful indeed.
As someone else answered: no. You can't portably assume random access
to tape blocks.
This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and
arbitrarily ordered restores. But I may have a solution:
PM: That would be useful. I don't know about CPIO, but tar stores the TOC at
the start of each file (so you can actually join two tar files together and
still read all the files). In this way, you could put the table name as the
"filename" in the header, so partial restores could be done.
write the schema and TOC out at the start of the file/tape, then compressed
data with headers for each indicating which TOC item they correspond to.
This metadata can be loaded into /tmp, so fseek is possible. The actual
data restoration (assuming constraints are not defined [THIS IS A PROBLEM])
can be done by scanning the rest of the tape in its own order since RI
will not be an issue. I think I'm happy with this.
PM: How about IOCTL's? I know that ArcServe on both NT & Unixware can seek
through the tape, so there must be a way of doing it.
[snip]
Using either tar or cpio format as defined for POSIX would allow a lot
of us to understand your on-tape format with a very low burden on you
for documentation. (If you do go this route you might want to think
about cpio format; it is less restrictive about filename length than
tar.)
Tom Lane was also very favorably disposed to tar format. As I said above,
the archive interfaces should be pretty amenable to adding tar support -
it's just I'd like to get a version working with custom and directory based
formats to ensure the flexibility is there. As I see it, the 'backup to
directory' format should be easy to use as a basis for the 'backup to tar'
code.
PM: I don't see a problem there. The Java stuff I have uses the standard
java.util.zip package which has a simple API for zip files. Tar or most other
formats could be implemented in a similar compatible fashion.
The problem I have with tar is that it does not support random access to
the associated data. For reordering large backups, or (ultimately) single
BLOB extraction, this is a performance problem.
PM: Tar can do this sequentially, which I've had to do many times over the
years - restoring just one file from a tape, sequential access is probably
the only way.
If you have a tar spec (or suitably licenced code), please mail it to me,
and I'll be able to make more informed comments.
PM: The tar spec should be around somewhere - just be careful, the obvious
source I was thinking of would be GPL'd, and we don't want to be polluted :-)
[snip]
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
At 10:55 27/06/00 +0100, Peter Mount wrote:
comments prefixed with PM...
PM: So can most other Unix based formats. On the intranet server here, I
pg_dump into /tmp, then include them in a tar piped to the tape drive.
Since you are using an intermediate file, there would be no change. fseek
is only an issue on tape drives and pipes, not files on disk. I suspect
that most people can afford the overhead of restoring the backup file to
disk before restoring the database, but it'd be nice to start out as
flexible as possible. In the back of my mind is the fact that when the WAL
and storage manager are going, raw data backups should be possible (and
fast) - but then again, maybe it's a pipe dream.
PM: The problem with blobs hasn't been with dumping them (I have some Java
code that does it into a compressed zip file), but restoring them - you
can't create a blob with a specific OID, so any references in existing
tables will break. I currently get round it by updating the tables after the
restore - but it's ugly and easy to break :(
I assumed this would have to happen - hence why it will not be in the first
version. With all the TOAST stuff coming, and talk about the storage
manager, I still live in hope of a better BLOB system...
PM: Having a set of api's (either accessible directly into the backend,
and/or via some fastpath call) would be useful indeed.
By this I assume you mean APIs to get to database data, not backup data.
This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and
arbitrarily ordered restores. But I may have a solution:
PM: That would be useful. I don't know about CPIO, but tar stores the TOC at
the start of each file (so you can actually join two tar files together and
still read all the files). In this way, you could put the table name as the
"filename" in the header, so partial restores could be done.
Well, the way my plans work, I'll use either a munged OID, or an arbitrary
unique ID as the file name. All meaningful access has to go via the TOC.
But that's just a detail; the basic idea is what I'm implementing.
It's very tempting to say tape restores are only possible in the order in
which the backup file was written ('pg_restore --stream' ?), and that
reordering/selection is only possible if you put the file on disk.
PM: How about IOCTL's? I know that ArcServe on both NT & Unixware can seek
through the tape, so there must be a way of doing it.
Maybe; I know BackupExec also does some kind of seek to update the TOC at
end of a backup (which is what I need to do). Then again, maybe that's just
a rewind. I don't want to get into custom tape formats...
Do we have any tape experts out there?
PM: Tar can do this sequentially, which I've had to do many times over the
years - restoring just one file from a tape, sequential access is probably
the only way.
It's just nasty when the user reorders the restoration of tables and
metadata. In the worst case it might be hundreds of scans of the tape. I'd
hate to have my name associated with something so unfriendly (even if it is
the operator's fault).
PM: The tar spec should be around somewhere - just be careful, the obvious
source I was thinking of would be GPL'd, and we don't want to be polluted :-)
That was my problem. I've got some references now, and I'll look at them.
At least everything I've written so far can be put in PG.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council
-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 11:31 AM
To: Peter Mount; Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] Proposal: More flexible backup/restore via
pg_dump
At 10:55 27/06/00 +0100, Peter Mount wrote:
comments prefixed with PM...
PM: So can most other Unix based formats. On the intranet server here, I
pg_dump into /tmp, then include them in a tar piped to the tape drive.
Since you are using an intermediate file, there would be no change. fseek
is only an issue on tape drives and pipes, not files on disk. I suspect
that most people can afford the overhead of restoring the backup file to
disk before restoring the database, but it'd be nice to start out as
flexible as possible. In the back of my mind is the fact that when the WAL
and storage manager are going, raw data backups should be possible (and
fast) - but then again, maybe it's a pipe dream.
PM: Actually, with MS-SQL here, I use its "Backup Device" (M$'s name for a
file ;-) ). It then has a scheduled job that backs up into that file (a full
backup empties the file first, then the incrementals append to it). Another
server then just backs up this single file.
PM: Anyhow, when I need to do a restore, I'm presented with a list of what
backups are stored on that "device", and can restore from there.
PM: The problem with blobs hasn't been with dumping them (I have some Java
code that does it into a compressed zip file), but restoring them - you
can't create a blob with a specific OID, so any references in existing
tables will break. I currently get round it by updating the tables after the
restore - but it's ugly and easy to break :(
I assumed this would have to happen - hence why it will not be in the first
version. With all the TOAST stuff coming, and talk about the storage
manager, I still live in hope of a better BLOB system...
PM: I do as well. I've got a project coming up soon where the database will
be storing thumbnails of photographs (with details of where the full images
are stored, what film/negative# the original is from (for 35mm), notes, etc.).
Anyhow, the current BLOB system may not cope with it, and the kludges I've
used in the past probably won't help.
PM: Having a set of api's (either accessible directly into the backend,
and/or via some fastpath call) would be useful indeed.
By this I assume you mean APIs to get to database data, not backup data.
PM: I was thinking of the backup data (in a similar way you can with SQL7).
A user could (if they have permissions) trigger a backup or a restore.
eg, my log database here is backed up using:
BACKUP DATABASE eventlog TO eventdb WITH INIT, NAME=N'Eventlog Backup',
DESCRIPTION=N'The NT & Exchange Event Logs'
Here, eventlog is the database name, eventdb the "device", ie the file it
writes to, INIT erases the "device" and NAME/DESCRIPTION are written to the
device to help locate the backup when restoring.
Anyhow, this is possible future stuff ;-)
This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and
arbitrarily ordered restores. But I may have a solution:
PM: That would be useful. I don't know about CPIO, but tar stores the TOC at
the start of each file (so you can actually join two tar files together and
still read all the files). In this way, you could put the table name as the
"filename" in the header, so partial restores could be done.
Well, the way my plans work, I'll use either a munged OID, or an arbitrary
unique ID as the file name. All meaningful access has to go via the TOC.
But that's just a detail; the basic idea is what I'm implementing.
It's very tempting to say tape restores are only possible in the order in
which the backup file was written ('pg_restore --stream' ?), and that
reordering/selection is only possible if you put the file on disk.
PM: It's an idea for those with large (>=2Gb) databases. Most filesystems
don't like an individual file to be larger than (or equal to) 2GB. You may want
to keep this in mind when writing to a file (ie, break at 1Gb as we do with
tables).
PM: How about IOCTL's? I know that ArcServe on both NT & Unixware can seek
through the tape, so there must be a way of doing it.
Maybe; I know BackupExec also does some kind of seek to update the TOC at
end of a backup (which is what I need to do). Then again, maybe that's just
a rewind. I don't want to get into custom tape formats...
PM: Some tape drives support a "mark", but I've not seen/used one. When
watching ArcServe, it seems to rewind to the beginning, then "seek" to the
beginning of the backup. But once it's done that, it knows where it is, and
seeks forward from there.
Do we have any tape experts out there?
PM: Tar can do this sequentially, which I've had to do many times over the
years - restoring just one file from a tape, sequential access is probably
the only way.
It's just nasty when the user reorders the restoration of tables and
metadata. In the worst case it might be hundreds of scans of the tape. I'd
hate to have my name associated with something so unfriendly (even if it is
the operator's fault).
PM: That's true, but sometimes (and it happens here a lot), it's
unavoidable.
PM: The tar spec should be around somewhere - just be careful, the obvious
source I was thinking of would be GPL'd, and we don't want to be polluted
:-)
That was my problem. I've got some references now, and I'll look at them.
At least everything I've written so far can be put in PG.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
Do we have any tape experts out there?
Don't even try. The guaranteed portable subset of tape drive,
interface, device driver, and OS semantics is pretty limited.
I'm confident about streaming one (tape) file of less than one tape capacity
to a drive and reading it back sequentially. These days you can probably
expect reliable end of media handling as well, but don't be too sure
what value errno will have when you do hit the end of a tape.
As soon as you start looking to deal with more advanced facilities you
will discover portability problems:
- autochanger interface differences
- head positioning on tapes with multiple files (SysV v. BSD, anyone?)
- random access (yup, supported on some drives ... probably all obsolete)
- "fast search marks" and similar
Some of these things can vary on the one OS if a tape drive is
connected to different interfaces, since different drivers may be
used.
BTW I'm not a tape expert. The problems in practice may be greater or
lesser than I've suggested.
I would be trying really hard to work out a backup format that allows
a one pass restore. Rummaging around in the database when making the
backup and using some scratch space at that time isn't too bad. Using
scratch space upon restore is more problematic; restore problems are
traditionally found at the worst possible moment!
Regards,
Giles
Maybe; I know BackupExec also does some kind of seek to update the TOC at
end of a backup (which is what I need to do). Then again, maybe that's just
a rewind. I don't want to get into custom tape formats...
Do we have any tape experts out there?
Don't lock yourself in on the tape issue; it is the pipes that
actually add value to the utility, and those can't rewind, seek
or whatever.
pipes can:
compress, split output, write to storage managers, stripe output,
.....
I guess we would want two formats, one for pipe, and one for a standard
directory.
Andreas
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
pipes can:
compress, split output, write to storage managers, stripe output,
.....
Right, the thing we *really* want is to preserve the fact that pg_dump
can write its output to a pipeline ... and that a restore can read from
one. If you can improve performance when you find you do have a
seekable source/destination file, fine, but the utilities must NOT
require it.
I guess we would want two formats, one for pipe, and one for a standard
directory.
At the risk of becoming tiresome, "tar" format is eminently pipeable...
regards, tom lane
At 10:48 27/06/00 -0400, Tom Lane wrote:
Right, the thing we *really* want is to preserve the fact that pg_dump
can write its output to a pipeline ... and that a restore can read from
one. If you can improve performance when you find you do have a
seekable source/destination file, fine, but the utilities must NOT
require it.
OK, the limitation will have to be that reordering of *data* loads (as
opposed to metadata) will not be possible in piped data. This is only a
problem if RI constraints are loaded.
I *could* dump the compressed data to /tmp, but I would guess that in most
cases when the archive file is being piped it's because the file won't fit
on a local disk.
Does this sound reasonable?
I guess we would want two formats, one for pipe, and one for a standard
directory.
At the risk of becoming tiresome, "tar" format is eminently pipeable...
No, it's good...I'll never feel guilty about asking for optimizer hints again.
More seriously, though, if I pipe a tar file, I still can't reorder the
*data* files without saving them to disk, which is what I want to avoid.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes:
More seriously, though, if I pipe a tar file, I still can't reorder the
*data* files without saving them to disk, which is what I want to avoid.
True. This is not an issue on the dump side, of course, since you can
choose what order you're going to write the tables in. On the restore
side, you have no alternative but to restore the tables in the order
they appear on tape. Of course the DBA can run the restore utility
more than once and extract a subset of tables each time, but I don't
see how the restore utility can be expected to do that for him.
(Except if it finds it does have the ability to seek in its input file,
but I dunno if it's a good idea to use that case for anything except
under-the-hood performance improvement, ie quickly skipping over the
data you don't need. Features that don't work all the time are not
good in my book.)
Basically I think we want to assume that pg_dump will write the tables
in an order that's OK for restoring. If we can arrange for RI checks
not to be installed until after all the data is loaded, this shouldn't
be a big problem, seems like.
regards, tom lane
At 11:23 27/06/00 -0400, Tom Lane wrote:
Of course the DBA can run the restore utility
more than once and extract a subset of tables each time, but I don't
see how the restore utility can be expected to do that for him.
Only works with seek (ie. a file).
Features that don't work all the time are not
good in my book.)
The *only* bit that won't work is being able to select the table data load
order, and I can fix that by writing tables that are wanted later to /tmp
if seek is unavailable. This *may* not be a problem, and probably should be
presented as an option to the user if restoring from non-seekable media.
Assuming that the backup was originally written to seekable media, I will
be able to tell the user how much space will be required, which should help.
I don't suppose anyone knows of a way of telling if a file handle supports
seek?
Basically I think we want to assume that pg_dump will write the tables
in an order that's OK for restoring. If we can arrange for RI checks
not to be installed until after all the data is loaded, this shouldn't
be a big problem, seems like.
Part of the motivation for this utility was to allow DBAs to fix the
ordering at restore time, but otherwise I totally agree. Unfortunately I
don't think the RI checks can be delayed at this stage - can they?
I don't suppose there is a 'disable constraints' command? Or the ability to
set all constraints as deferred until commit-time?
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
On Tue, 27 Jun 2000, Philip Warner wrote:
But the problem is the constraints: AFAIK there is no 'ALTER TABLE ADD
CONSTRAINT...' so PK, FK, Not Null constraints have to be applied before
data load (*please* tell me I'm wrong). This also means that for large
databases, I should apply indexes to make PK/FK checks fast, but they will
slow data load.
Actually, there is an ALTER TABLE ADD CONSTRAINT for foreign key
constraints. Of course, if the existing data fails the constraint, the
constraint doesn't get made. And if you're in a transaction, it'll force
a rollback.
In fact, you really can't always apply foreign key constraints at
schema reload time because you can have tables with circular
dependencies. Those would have to be created after data load.
I don't suppose anyone knows of a way of telling if a file handle supports
seek?
The traditional method is to call lseek() and see what happens.
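For example, a zero-offset lseek() is a harmless probe: it fails with ESPIPE
on pipes, though tape devices vary, so this is only a first-order check.

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

/* Returns 1 if the descriptor supports seeking, 0 if it does not. */
static int
fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t) -1;
}

int
main(void)
{
    printf("stdout seekable: %s\n", fd_is_seekable(1) ? "yes" : "no");
    return 0;
}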
Part of the motivation for this utility was to allow DBAs to fix the
ordering at restore time, but otherwise I totally agree. Unfortunately I
don't think the RI checks can be delayed at this stage - can they?
The current pg_dump handles the data and then adds the constraints.
Otherwise there are "chicken and egg" problems where two tables have
mutual RI constraints. Even at the tuple level two tuples can be
mutually dependent.
Regards,
Giles
At 13:10 27/06/00 -0700, Stephan Szabo wrote:
Actually, there is an ALTER TABLE ADD CONSTRAINT for foreign key
constraints.
This is good to know; presumably at some stage in the future the rest will
be added, and the backup/restore can be amended to apply constraints after
data load. In the mean time, I suppose people with tape drives who need to
reorder the data load will have to make multiple passes (or copy the file
locally).
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
At 08:48 28/06/00 +1000, Giles Lean wrote:
Part of the motivation for this utility was to allow DBAs to fix the
ordering at restore time, but otherwise I totally agree. Unfortunately I
don't think the RI checks can be delayed at this stage - can they?
The current pg_dump handles the data and then adds the constraints.
Not as far as I can see; that's what I want to do, but there is no
implemented syntax for doing it. pg_dump simply dumps the table definition
with constraints (at least on 7.0.2).
Otherwise there are "chicken and egg" problems where two tables have
mutual RI constraints. Even at the tuple level two tuples can be
mutually dependent.
Absolutely. And AFAICT, these happen with pg_dump.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
At 08:48 28/06/00 +1000, Giles Lean wrote:
Otherwise there are "chicken and egg" problems where two tables have
mutual RI constraints. Even at the tuple level two tuples can be
mutually dependent.
Absolutely. And AFAICT, these happen with pg_dump.
This will happen for check constraints, but not for foreign key
constraints...
It actually adds the fk constraints later with CREATE CONSTRAINT TRIGGER
after the data dump is finished. And, if you do separate schema and data
dumps, the weird statements at the top and bottom of the data dump turn
off triggers and then turn them on again (in the most painful way possible).
However those cases do not actually guarantee the validity of the data in
between.
At 12:06 28/06/00 -0700, Stephan Szabo wrote:
This will happen for check constraints, but not for foreign key
constraints...
It actually adds the fk constraints later with CREATE CONSTRAINT TRIGGER
after the data dump is finished. And, if you do separate schema and data
dumps, the weird statements at the top and bottom of the data dump turn
off triggers and then turn them on again (in the most painful way possible).
Thanks for this information!
I had not seen those statements before; I have been mistakenly modifying
6.5.3 sources, not 7.0.2. I will incorporate them in my work. Is there any
way of also disabling all constraint checking while loading the data?
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
I think my previous message went dead...
This will happen for check constraints, but not for foreign key
constraints...
It actually adds the fk constraints later with CREATE CONSTRAINT TRIGGER
after the data dump is finished. And, if you do separate schema and data
dumps, the weird statements at the top and bottom of the data dump turn
off triggers and then turn them on again (in the most painful way
possible).
Thanks for this information!
I had not seen those statements before; I have been mistakenly modifying
6.5.3 sources, not 7.0.2. I will incorporate them in my work. Is there any
way of also disabling all constraint checking while loading the data?
Well, for unique you could remove/recreate the unique index. NOT NULL is
probably not worth bothering with. Check constraints might be able to be
turned off for a table by setting relchecks to 0 in the pg_class row and
then resetting it after data is loaded (sort of like what we do on data-only
dumps for triggers).
The problem is that the create constraint trigger, playing with reltriggers
and playing with relchecks doesn't guarantee that the data being loaded is
correct. And if you remove and recreate a unique index, you might not get
the index back at the end, and then you've lost the information that there
was supposed to be a unique or primary key on the table.
It might be a good idea to have some sort of pg_constraint (or whatever)
that holds this data, since that would also make it easier to make the
constraint naming SQL compliant (no duplicate constraint names within
schema - that includes automatically generated ones), and it might help if
we ever try to make deferrable check/primary key/etc...
"Stephan Szabo" <sszabo@kick.com> writes:
I had not seen those statements before; I have been mistakenly modifying
6.5.3 sources, not 7.0.2. I will incorporate them in my work. Is there any
way of also disabling all constraint checking while loading the data?
Well, for unique you could remove/recreate the unique index. NOT NULL
is probably not worth bothering with. Check constraints might be able
to be turned off for a table by setting relchecks to 0 in the pg_class
row and the resetting it after data is loaded (sort of like what we do
on data only dumps for triggers).
There's no need to disable NOT NULL, nor unique constraints either,
since those are purely local to a table --- if they're going to fail,
altering load order doesn't prevent it. The things you need to worry
about are constraint expressions that cause references to other tables
(perhaps indirectly via a function).
If we had ALTER TABLE ADD CONSTRAINT then the problem would be largely
solved, I believe. This should be a minor exercise --- the heavy
lifting is already done, because heap.c's AddRelationRawConstraints()
is already set up to be invokable on a pre-existing relation. Also
the parser knows how to parse ALTER TABLE ADD CONSTRAINT ... I think
all that's missing is a few lines of glue code in command.c.
regards, tom lane
There's no need to disable NOT NULL, nor unique constraints either,
since those are purely local to a table --- if they're going to fail,
altering load order doesn't prevent it. The things you need to worry
Is there a speed difference with doing a copy on a table with an index
versus creating the index at the end? I've been assuming that the latter was
faster (and that that was part of what he wanted).
about are constraint expressions that cause references to other tables
(perhaps indirectly via a function).
Yeah, that's actually a big problem, since that's actually also a constraint
on the other table as well, and as far as I know, we aren't yet constraining
the other table.
If we had ALTER TABLE ADD CONSTRAINT then the problem would be largely
solved, I believe. This should be a minor exercise --- the heavy
lifting is already done, because heap.c's AddRelationRawConstraints()
is already set up to be invokable on a pre-existing relation. Also
the parser knows how to parse ALTER TABLE ADD CONSTRAINT ... I think
all that's missing is a few lines of glue code in command.c.
Does the AddRelationRawConstraints() check that the constraint is satisfied
as well when you add it? It didn't look like it did, but I could be missing
something. That's another requirement of ALTER TABLE ADD CONSTRAINT. That
was the bit I wasn't sure how to do for other generic constraints when I
added the foreign key one.
"Stephan Szabo" <sszabo@kick.com> writes:
If we had ALTER TABLE ADD CONSTRAINT then the problem would be largely
solved, I believe. This should be a minor exercise --- the heavy
lifting is already done, because heap.c's AddRelationRawConstraints()
is already set up to be invokable on a pre-existing relation.
Does the AddRelationRawConstraints() check that the constraint is
satisfied as well when you add it? It didn't look like it did, but I
could be missing something.
Oh, you're right, it does not. So you'd first have to run through the
table and verify that the constraint holds for each existing tuple.
Doesn't seem like a big deal though...
regards, tom lane
At 20:30 29/06/00 -0400, Tom Lane wrote:
Oh, you're right, it does not. So you'd first have to run through the
table and verify that the constraint holds for each existing tuple.
Doesn't seem like a big deal though...
Does this mean somebody is likely to do it? It'd certainly make
backup/restore more reliable.
I'm almost at the point of asking for testers with the revised
pg_dump/pg_restore, so I'll go with what I have for now, but it would make
life a lot less messy. Since the new version *allows* table restoration
intermixed with metadata, and in any order, I need to update pg_class
repeatedly (I assume there may be system triggers that need to execute when
metadata is changed).
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
At 20:30 29/06/00 -0400, Tom Lane wrote:
Oh, you're right, it does not. So you'd first have to run through the
table and verify that the constraint holds for each existing tuple.
Doesn't seem like a big deal though...
Does this mean somebody is likely to do it? It'd certainly make
backup/restore more reliable.
I'll take a stab at it. It might take me a while to get stuff working
but it shouldn't take too long before the beginnings are there.
Can I ask for a status?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
At 20:49 10/10/00 -0400, Bruce Momjian wrote:
Can I ask for a status?
Done.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/