backup and restore questions
Thank you all for replying. I appreciate the tips. Apologies to those who
were offended by the HTML formatting.
Scott, quick question. The reason I assumed insert would be "safer" than
copy is because the docs say that in the case of copy it fails on a single
corrupted row whereas insert won't?
Sally
On Fri, 20 Feb 2004, Sally Sally wrote:
right, but that's not exactly "safer" from a data integrity standpoint.
After all, if you insert 100M rows and one fails, you now have 99.999999%
of your data in your database. If you don't check the import for errors,
you now have an incomplete load, and may not know it.
Wrapping the inserts inside of a begin;commit; pair is a neat trick to
both speed things up (postgresql is MUCH faster inserting 100M rows in one
big transaction than individually) AND it makes sure you don't get a
partial load.
Partial loads are a SERIOUS issue when you're importing 10k rows a day
into a database with millions of rows. You usually want all or nothing.
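The BEGIN/COMMIT wrapping described above can be sketched in shell. This is a minimal illustration, not a command from the thread: the dump filename and its contents are assumed, and the final psql step (commented out) would need a live server.

```shell
# a minimal sketch: wrap a plain-text data dump in one transaction
# (school_class_dump.sql and its single row are assumed for illustration)
printf 'INSERT INTO class VALUES (1);\n' > school_class_dump.sql

# prepend BEGIN; and append COMMIT; so the load is all-or-nothing
{ echo 'BEGIN;'; cat school_class_dump.sql; echo 'COMMIT;'; } > wrapped.sql

# against a live server this would be replayed with:
#   psql school -f wrapped.sql
cat wrapped.sql
```

If any row in wrapped.sql errors out, the whole transaction aborts, so a partial load cannot slip through unnoticed.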
And don't worry about me being offended by html formatting, my email
client, pine, automatically strips that stuff out of your emails. Of
course, others, with their primitive email clients (haha, like there's
anything more primitive than pine :-) will appreciate the change.
What's funny is that when they reply to your email, then I see the html in
the stuff they quoted, but not in your original email.
On Fri, 20 Feb 2004, Sally Sally wrote:
Right. What that means in plain terms, though, is that a single bad row
causes an entire copy import of a table to fail, while individual inserts,
each handled in its own transaction, can fail individually.
You can, however, edit the dump / extract a portion of it and wrap it in a
begin / commit pair. Note that postgresql will not commit any transaction
with an error, so you don't have to worry about it accidentally committing
if the data errors out.
Also, collecting as many inserts as possible into a transaction will
generally make postgresql faster, up to a point. While there's no great
gain in inserting more than a few thousand rows at a time, there's no
real harm in inserting many more (millions). Unlike Oracle, which uses
rollback segments, postgresql uses free disk space to just add new
tuples, so there's no real limit to the size of your transactions,
except for the practical issue that a transaction taking that long to
insert rows may be a problem if you need to see the data from other
clients as it comes in.
In terms of importing, it may often be that you just want the good rows,
dump the bad ones, and move on. If that's the case, individual inserts are
the best choice. They are also fairly slow, because postgresql must build
up and tear down a transaction for each row.
You may also have data where every row must go in, or you don't want any
of it. If that's the case, either the copy command or a series of inserts
inside the same transaction is the best choice. They are also the fastest,
with copy slightly outperforming the inserts, at least in the past. I
haven't really tested one against the other lately, because with 7.4 it's
all so damned fast it only takes about 15 minutes to back up or restore
our whole database.
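The copy-versus-insert distinction above shows up directly in the files pg_dump produces: the default data-only dump uses COPY blocks, while `pg_dump --inserts` emits one INSERT per row. The fragments below are a hand-written sketch of the two shapes (no real server or pg_dump output is assumed; table and column names are made up):

```shell
# representative fragment of a COPY-style dump
# (real command: pg_dump --data-only school)
cat > copy_style.sql <<'EOF'
COPY class (id, name) FROM stdin;
1	algebra
\.
EOF

# representative fragment of an INSERT-style dump
# (real command: pg_dump --data-only --inserts school)
cat > insert_style.sql <<'EOF'
INSERT INTO class (id, name) VALUES (1, 'algebra');
INSERT INTO class (id, name) VALUES (2, 'biology');
EOF

# one bad line aborts the entire COPY block; a bad INSERT aborts only
# its own row (when each INSERT runs outside an explicit BEGIN/COMMIT)
```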
I was trying to restore a zipped plain text dump created like this:
pg_dump --table=class --data-only school | gzip > school_class_dump.gz
I tried all the following combinations with pg_restore and none of them worked:
cat school_class_dump.gz | gunzip | pg_restore -d school --format=c
error: [archiver] did not find magic string in file header
cat school_class_dump.gz | gunzip | pg_restore -d school
pg_restore: [archiver] input file does not appear to be a valid archive
I gunzipped and did
pg_restore -d school school_class_dump
error: [archiver] input file does not appear to be a valid archive
Does pg_restore really recognize plain text (version 7.4)?
Sally
From: "V i s h a l Kashyap @ [Sai Hertz And Control Systems]"
<sank89@sancharnet.in>
Reply-To: aspire420@hotpop.com
To: Sally Sally <dedeb17@hotmail.com>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] backup and restore questions
Date: Fri, 20 Feb 2004 00:16:56 +0530

Dear Sally Sally,
I had a few questions concerning the backup/restore process for pg.
1) Is it possible to dump data onto an existing database that contains
data (assuming the schemas of both are the same)? Has anyone done this? I
am thinking of this in order to expedite the data load process.

This can work provided the databases are on the same system and have the
same schema; I haven't tried it, though.

2) I read that when dumping and restoring data the insert option is
safer but slower than copy? Does anyone know from experience how much
slower (especially for a database containing millions of records)?

If you are really serious about your data, the best way AFAIK is insert,
because with insert statements you can move data around, for example when
you upgrade your database or add a new column to a table but are trying
to restore old data for the same table.

On a Celeron 900, PostgreSQL 7.3.4, RH 9.0, a 151 KB tarred backup takes
about 5 minutes. Data restore depends about 99% on disk throughput and 1%
on CPU for a plain insert file, and about 90% on disk throughput and 10%
on CPU for a tarred file.

3) Can pg_restore accept a file that is not archived, like a zipped file
or a plain text file (file.gz or file)?

You can use both zipped and plain. Newer versions of pg_restore (7.3 and
later) identify the file format automatically.

4) Is the general practice to have one whole dump of a database or
several separate dumps (by table etc...)?

One dump for data and another dump for schema will always help.
--
Regards,
Vishal Kashyap
Director / Lead Developer,
Sai Hertz And Control Systems Pvt Ltd,
http://saihertz.rediffblogs.com
On Mon, Feb 23, 2004 at 04:28:48PM +0000, Sally Sally wrote:
I was trying to restore a zipped plain text dump created like this:
pg_dump --table=class --data-only school | gzip > school_class_dump.gz
<snip>
Does pg_restore really recognize plain text (version 7.4)?
Nope, never has. For plain text dumps, just use psql.
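So the pipeline from the question becomes a psql replay rather than a pg_restore call. A sketch, with the gzipped dump recreated locally since no database or real dump file is assumed here:

```shell
# sketch: plain-text dumps are replayed with psql, not pg_restore
# (the dump contents below are assumed for illustration)
printf 'INSERT INTO class VALUES (1);\n' | gzip > school_class_dump.gz

# decompress; against a live server this would be piped straight in:
#   gunzip -c school_class_dump.gz | psql school
gunzip -c school_class_dump.gz > restored.sql
cat restored.sql
```

pg_restore is only for the custom and tar archive formats (pg_dump -Fc / -Ft); a plain-text dump is just SQL, so psql executes it directly.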
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
If the Catholic church can survive the printing press, science fiction
will certainly weather the advent of bookwarez.
http://craphound.com/ebooksneitherenorbooks.txt - Cory Doctorow