Space for pg_dump
Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?
Thanks
SHARMILA JOTHIRAJAH wrote:
Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?
That will depend on how many indexes etc. make up that 600GB, and also on
how compressible your data is if you are using -Fc. Certainly less than a
"live" database, but by how much is difficult to say.
--
Richard Huxton
Archonet Ltd
On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?
Note you can find out by doing:
pg_dump dbname | wc
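The pipeline above streams the dump straight into wc, so no disk space is consumed while measuring. A minimal sketch of the idea (`mydb` is a stand-in database name; the runnable part uses a generated stream of SQL text so it works without a server):

```shell
# The real measurement: pg_dump writes the dump to stdout, wc -c counts
# the bytes, and nothing is ever stored on disk.
#   pg_dump mydb | wc -c
# The same pipeline shape, demonstrated with generated SQL text:
bytes=$(printf 'CREATE TABLE t (id int);\n%.0s' 1 2 3 | wc -c)
echo $((bytes))
```

The `$((bytes))` arithmetic expansion also normalizes the leading whitespace some wc implementations emit.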
--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
From: Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Space for pg_dump
To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
Date: Tuesday, March 31, 2009, 11:49 AM
On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?
Note you can find out by doing:
pg_dump dbname | wc
Yes...I could find the space used after creating the dump.
But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate.
Do you have a rough estimate for pg_dump in general... like 1/4th of the database size or something like that? I just need a rough estimate for now.
Thanks
Sharmila
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
Yes...I could find the space used after creating the dump.
But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate.
Do you have a rough estimate for pg_dump in general... like 1/4th of the database size or something like that? I just need a rough estimate for now.
It's hard to say. Why can't you RUN the example command on each db
and see for yourself? It doesn't store a backup anywhere; it just
streams one through wc so you can see how big it is.
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate.
Do you have a rough estimate for pg_dump in general... like 1/4th of the database size or something like that? I just need a rough estimate for now.
Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size. You can't really tell without
running pg_dump. The advantage of doing pg_dump|wc -l is that the db
doesn't have to be stored somewhere.
--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:
From: Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Space for pg_dump
To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
Date: Tuesday, March 31, 2009, 12:07 PM
Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size. You can't really tell without
running pg_dump. The advantage of doing pg_dump|wc -l is that the db
doesn't have to be stored somewhere.
Thanks...I started pg_dump|wc -l and it's running now
Another question: wc -l gives you the number of lines, right?
What is the size of each line... or how do you get the size from that?
On Tue, Mar 31, 2009 at 10:31 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:
Thanks...I started pg_dump|wc -l and it's running now
Another question: wc -l gives you the number of lines, right?
What is the size of each line... or how do you get the size from that?
Whoops, pretty sure my first reply was just wc. The -l is a habit
from using it to count lines. Do it again without the -l... sorry.
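For reference, plain wc prints three counters (lines, words, bytes), and it is the byte count that corresponds to dump size; wc -l alone only tells you how many lines the dump has. A quick illustration with generated SQL text:

```shell
# Plain wc: three columns -- lines, words, bytes.
printf 'INSERT INTO t VALUES (1);\n' | wc
# wc -c: just the byte count, which is what matters for dump size.
printf 'INSERT INTO t VALUES (1);\n' | wc -c
```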
On Tue, Mar 31, 2009 at 08:57:28AM -0700, SHARMILA JOTHIRAJAH wrote:
Note you can find out by doing:
pg_dump dbname | wc
Yes...I could find the space used after creating the dump.
But I need to pre-allocate some space for storing these dumps
I'm not sure if you realize that you don't need any space for the
command that Scott showed (pg_dump dbname | wc). It will not write
anything to disk.
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Tue, 31 Mar 2009, Scott Marlowe wrote:
Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size.
Since running an entire pgdump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage
That lets you better see index vs. table usage. Then, for the bigger
tables, I do something like this:
psql -c "COPY (select * from bigtable limit 100000) to stdout" | gzip > bigtable.gz
gzip -l bigtable.gz
That lets you sample a decent-sized chunk of the table to figure out
what compression ratio you're likely to get on the data in there. Given
all the table sizes and a compression ratio estimate, from there you can
make a fairly accurate guess of what the whole dump is going to take up,
presuming your data is fairly evenly distributed such that the first
records that come back are typical of the whole thing.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
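The arithmetic behind Greg's estimate can be sketched in a few lines. All the numbers below are hypothetical stand-ins: the sample byte counts stand in for what `gzip -l bigtable.gz` reports, the total stands in for the disk-usage query's result, and the on-disk table size is treated as a proxy for the uncompressed COPY text (the "fairly evenly distributed" caveat above applies):

```shell
# gzip -l reports compressed and uncompressed byte counts for the sample;
# their ratio approximates the compression ratio for the whole table.
sample_compressed=12000000      # hypothetical, from: gzip -l bigtable.gz
sample_uncompressed=100000000   # hypothetical, same source
table_total=600000000000        # hypothetical, from the disk-usage query

# Estimated compressed dump contribution of this table:
echo $(( table_total * sample_compressed / sample_uncompressed ))
```

Summing this estimate over the big tables gives the rough pre-allocation figure the thread is after.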
Greg Smith wrote:
On Tue, 31 Mar 2009, Scott Marlowe wrote:
Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size.
Since running an entire pgdump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage
Interesting. However, the query gives an error if the table name contains
upper case characters, like in my case "tblConnections":
ERROR: relation "public.tblconnections" does not exist.
Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the
error.
Rainer
Rainer Bauer <usenet@munnin.com> writes:
Greg Smith wrote:
Since running an entire pgdump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage
Interesting. However, the query gives an error if the table name contains
upper case characters, like in my case "tblConnections":
ERROR: relation "public.tblconnections" does not exist.
Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the
error.
That still fails if the table name contains double quotes. A proper
solution is to use the table OID --- I've corrected the example.
regards, tom lane
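For ad-hoc queries, `quote_ident()` is another way to build a safely quoted name, though as Tom notes the OID form sidesteps quoting entirely. A sketch (not the wiki's corrected query; the table name is Rainer's example):

```sql
-- quote_ident() preserves case and doubles embedded quotes, so the
-- regclass cast resolves mixed-case names like "tblConnections".
SELECT pg_size_pretty(
         pg_relation_size(
           (quote_ident(N.nspname) || '.' || quote_ident(C.relname))::regclass))
FROM pg_class C
JOIN pg_namespace N ON N.oid = C.relnamespace
WHERE C.relname = 'tblConnections';
```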
Hi,
Rainer Bauer <usenet@munnin.com> writes:
Greg Smith wrote:
Since running an entire pgdump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage
Interesting. However, the query gives an error if the table name contains
upper case characters, like in my case "tblConnections":
ERROR: relation "public.tblconnections" does not exist.
Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the error.
That still fails if the table name contains double quotes. A proper
solution is to use the table OID --- I've corrected the example.
If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it does
not take the toasted values into account.
Simple example (take a look at the first row -> public.media):
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND nspname !~ '^pg_toast'
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
relation | size
---------------------------+------------
public.media | 727 MB
public.identifier_idx | 342 MB
public.media_pk | 190 MB
public.mediateypes_pk | 16 kB
public.mediaformats_uk | 16 kB
public.contentsizes_pk | 16 kB
public.contenttype_pk | 16 kB
public.mediaformats_pk | 16 kB
public.contenttypes | 8192 bytes
public.media_media_id_seq | 8192 bytes
public.contentsizes | 8192 bytes
public.mediaformats | 8192 bytes
public.mediatypes | 8192 bytes
public.vmedia2 | 0 bytes
public.vmedia | 0 bytes
(15 rows)
Now a fixed query which gets the sizes of the related pg_toast_oid and
pg_toast_oid_index too:
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)
         + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                     WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
         + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                     WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
       ) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND nspname !~ '^pg_toast'
ORDER BY pg_relation_size(C.oid)
         + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                     WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
         + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                     WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
         DESC
LIMIT 20;
relation | size
---------------------------+------------
public.media | 164 GB
public.identifier_idx | 342 MB
public.media_pk | 190 MB
public.contenttype_pk | 16 kB
public.contenttypes | 16 kB
public.contentsizes | 16 kB
public.contentsizes_pk | 16 kB
public.mediateypes_pk | 16 kB
public.mediaformats | 16 kB
public.mediatypes | 16 kB
public.mediaformats_pk | 16 kB
public.mediaformats_uk | 16 kB
public.media_media_id_seq | 8192 bytes
public.vmedia | 0 bytes
public.vmedia2 | 0 bytes
(15 rows)
There is a difference of about 163 GB (which comes from the toast of
public.media):
relation | size
---------------------------------+------------
pg_toast.pg_toast_6366088 | 162 GB
pg_toast.pg_toast_6366088_index | 1832 MB
public.media | 727 MB
If you have only small or no toast tables the query from the wiki will
work for you.
regards, jan
Jan Otto wrote:
If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it does
not take the toasted values into account.
Now a fixed query which gets the sizes of the related pg_toast_oid and
pg_toast_oid_index too:
Note that there is also the pg_total_relation_size function which will
report the total table size, including toast table and indexes.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
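With pg_total_relation_size, the per-table report collapses to roughly this sketch (the function covers the heap plus its TOAST table and all indexes; schema filter and LIMIT mirror the earlier queries):

```sql
SELECT N.nspname || '.' || C.relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "size"
FROM pg_class C
JOIN pg_namespace N ON N.oid = C.relnamespace
WHERE N.nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind = 'r'   -- ordinary tables only
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```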
On Apr 20, 2009, at 7:35 PM, Alvaro Herrera wrote:
Jan Otto wrote:
If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it does
not take the toasted values into account.
Now a fixed query which gets the sizes of the related pg_toast_oid and
pg_toast_oid_index too:
Note that there is also the pg_total_relation_size function which will
report the total table size, including toast table and indexes.
Ahhh, it was a long day... sometimes I think too complicated ;-)
regards, jan
On Mon, 20 Apr 2009, Jan Otto wrote:
If you have big toast tables you get wrong results with the query suggested
at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the
toasted values into account.
I can't recall why I wrote that query to filter out things in the pg_toast
namespace in the first place. I just took the easy way out here--the
query has been updated so it no longer excludes relations in that namespace,
and I link to the TOAST docs for more details.
I consider combining all the values together, as you did in your example
code and as pg_total_relation_size does, to be a different type of report.
Accordingly, I just updated the page with examples of both types, as well as
something that works against pre-8.1 databases.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD