Space for pg_dump

Started by SHARMILA JOTHIRAJAH about 17 years ago · 16 messages · general
#1SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com

Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?
Thanks

#2Richard Huxton
dev@archonet.com
In reply to: SHARMILA JOTHIRAJAH (#1)
Re: Space for pg_dump

SHARMILA JOTHIRAJAH wrote:

Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?

That will depend on how many indexes etc. make up that 600GB. Also how
compressible your data is if you are using -Fc. Certainly less than a
"live" database, but by how much it's difficult to say.

--
Richard Huxton
Archonet Ltd

#3Scott Marlowe
scott.marlowe@gmail.com
In reply to: SHARMILA JOTHIRAJAH (#1)
Re: Space for pg_dump

On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?

Note you can find out by doing:

pg_dump dbname | wc

#4SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com
In reply to: Scott Marlowe (#3)
Re: Space for pg_dump
--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:

From: Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Space for pg_dump
To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
Date: Tuesday, March 31, 2009, 11:49 AM
On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?

Note you can find out by doing:

pg_dump dbname | wc

Yes...I could find the space used after creating the dump.

But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now

Thanks
Sharmila

#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: SHARMILA JOTHIRAJAH (#4)
Re: Space for pg_dump

On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:

From: Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Space for pg_dump
To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
Date: Tuesday, March 31, 2009, 11:49 AM
On Tue, Mar 31, 2009 at 7:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

Hi,
How much space does a pg_dump usually take?
One of my databases is 600GB.... How much space do I need to dump this?

Note you can find out by doing:

pg_dump dbname | wc

Yes...I could find the space used after creating the dump.

But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now

It's hard to say. Why can't you RUN the example command on each db
and see for yourself? It doesn't create a backup file; it just
generates the dump and hands it to wc to see how big it would be.

#6Scott Marlowe
scott.marlowe@gmail.com
In reply to: SHARMILA JOTHIRAJAH (#4)
Re: Space for pg_dump

On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

But I need to pre-allocate some space for storing these dumps (there are other databases too that need to be dumped). So I'm trying to find a space estimate....
Do you have a rough estimate of pg_dump in general... like 1/4th of the database size or something like that... I just need a rough estimate for now

Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size. You can't really tell without
running pg_dump. The advantage of doing pg_dump|wc -l is that the dump
doesn't have to be stored anywhere.

#7SHARMILA JOTHIRAJAH
sharmi_jo@yahoo.com
In reply to: Scott Marlowe (#6)
Re: Space for pg_dump
--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:

From: Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Space for pg_dump
To: "SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com>
Cc: "General postgres mailing list" <pgsql-general@postgresql.org>
Date: Tuesday, March 31, 2009, 12:07 PM
On Tue, Mar 31, 2009 at 9:57 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

But I need to pre-allocate some space for storing these dumps (there
are other databases too that need to be dumped). So I'm trying to find
a space estimate....

Do you have a rough estimate of pg_dump in general... like 1/4th of
the database size or something like that... I just need a rough
estimate for now

Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size. You can't really tell without
running pg_dump. The advantage of doing pg_dump|wc -l is that the db
doesn't have to be stored somewhere.

Thanks... I started pg_dump|wc -l and it's running now.
Another question: wc -l gives you the number of lines, right?
What is the size of each line, or how do you get the size from that?

#8Scott Marlowe
scott.marlowe@gmail.com
In reply to: SHARMILA JOTHIRAJAH (#7)
Re: Space for pg_dump

On Tue, Mar 31, 2009 at 10:31 AM, SHARMILA JOTHIRAJAH
<sharmi_jo@yahoo.com> wrote:

--- On Tue, 3/31/09, Scott Marlowe <scott.marlowe@gmail.com> wrote:


But I need to pre-allocate some space for storing these dumps (there
are other databases too that need to be dumped). So I'm trying to find
a space estimate....

Do you have a rough estimate of pg_dump in general... like 1/4th of
the database size or something like that... I just need a rough
estimate for now

Sadly, there is no exact maths for such things.  If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup.  If it's all compressible text with few
indexes, it might be a 1:1 or so size.  You can't really tell without
running pg_dump.  The advantage of doing pg_dump|wc -l is that the db
doesn't have to be stored somewhere.

Thanks... I started pg_dump|wc -l and it's running now.
Another question: wc -l gives you the number of lines, right?
What is the size of each line, or how do you get the size from that?

Whoops, pretty sure my first reply was just wc. The -l is a habit
from using it to count lines. Do it again without the -l... sorry.
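The distinction matters: plain wc prints lines, words, and bytes, while wc -c prints just the byte count, which is the dump size. A quick sketch, demonstrated on a fixed string rather than a live dump ("mydb" below is a placeholder database name):

```shell
# wc -c counts bytes, which is what you want for a size estimate.
printf 'COPY t FROM stdin;\n1\n2\n' | wc -c    # 23 bytes in the sample

# Against a real database (placeholder name), without writing the
# dump to disk:
# pg_dump mydb | wc -c
```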

#9 hubert depesz lubaczewski
depesz@depesz.com
In reply to: SHARMILA JOTHIRAJAH (#4)
Re: Space for pg_dump

On Tue, Mar 31, 2009 at 08:57:28AM -0700, SHARMILA JOTHIRAJAH wrote:

Note you can find out by doing:
pg_dump dbname | wc

Yes...I could find the space used after creating the dump.
But I need to pre-allocate some space for storing these dumps

I'm not sure if you realize that you don't need any disk space for the
command that Scott showed (pg_dump dbname | wc). It will not write
anything to disk.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

#10Greg Smith
gsmith@gregsmith.com
In reply to: Scott Marlowe (#6)
Re: Space for pg_dump

On Tue, 31 Mar 2009, Scott Marlowe wrote:

Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size.

Since running an entire pg_dump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage

That lets you better see index vs. table usage. Then, for the bigger
tables, I do something like this:

psql -c "COPY (select * from bigtable limit 100000) to stdout" | gzip > bigtable.gz
gzip -l bigtable.gz

That lets you sample a decent-sized chunk of the table to figure out
what compression ratio you're likely to get on the data in there. Given
all the table sizes and a compression ratio estimate, from there you can
make a fairly accurate guess of what the whole dump is going to take up,
presuming your data is fairly evenly distributed such that the first
records that come back are typical of the whole thing.
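The extrapolation step above can be sketched in shell arithmetic. The byte counts here are made-up placeholders, not measurements from this thread; note the divide-before-multiply ordering to keep the intermediate value inside 64-bit integer range:

```shell
sample_raw=100000000       # uncompressed bytes of the sampled COPY output
sample_gz=25000000         # compressed bytes reported by gzip -l
table_total=600000000000   # total table bytes from the Disk_Usage query

# Integer estimate of the compressed dump size for the whole table
# (dividing first avoids overflow; precision loss is fine for a rough guess).
est=$(( table_total / sample_raw * sample_gz ))
echo "$est bytes"
```

Repeat per big table and sum the results for the whole-database estimate.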

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#11Rainer Bauer
usenet@munnin.com
In reply to: SHARMILA JOTHIRAJAH (#4)
Re: Space for pg_dump

Greg Smith wrote:

On Tue, 31 Mar 2009, Scott Marlowe wrote:

Sadly, there is no exact maths for such things. If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup. If it's all compressible text with few
indexes, it might be a 1:1 or so size.

Since running an entire pg_dump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage

Interesting. However, the query gives an error if the table name contains
upper-case characters, like in my case "tblConnections":

ERROR: relation "public.tblconnections" does not exist.

Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the
error.

Rainer

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rainer Bauer (#11)
Re: Space for pg_dump

Rainer Bauer <usenet@munnin.com> writes:

Greg Smith wrote:

Since running an entire pg_dump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage

Interesting. However, the query gives an error if the table name contains
upper case characters, like in my case "tblConnections":

ERROR: relation "public.tblconnections" does not exist.

Replacing all occurrences of <relname> by <'"' || relname || '"'> fixes the
error.

That still fails if the table name contains double quotes. A proper
solution is to use the table OID --- I've corrected the example.

regards, tom lane

#13Jan Otto
asche@me.com
In reply to: Tom Lane (#12)
Re: Space for pg_dump

Hi,

Rainer Bauer <usenet@munnin.com> writes:

Greg Smith wrote:

Since running an entire pg_dump can take forever on a big database,
what I usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage

Interesting. However, the query gives an error if the table name
contains upper-case characters, like in my case "tblConnections":

ERROR: relation "public.tblconnections" does not exist.

Replacing all occurrences of <relname> by <'"' || relname || '"'>
fixes the error.

That still fails if the table name contains double quotes. A proper
solution is to use the table OID --- I've corrected the example.

If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it
does not take the toasted values into account.

Simple example (take a look at the first row -> public.media):

SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND nspname !~ '^pg_toast'
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
          relation          |    size
----------------------------+------------
 public.media               | 727 MB
 public.identifier_idx      | 342 MB
 public.media_pk            | 190 MB
 public.mediateypes_pk      | 16 kB
 public.mediaformats_uk     | 16 kB
 public.contentsizes_pk     | 16 kB
 public.contenttype_pk      | 16 kB
 public.mediaformats_pk     | 16 kB
 public.contenttypes        | 8192 bytes
 public.media_media_id_seq  | 8192 bytes
 public.contentsizes        | 8192 bytes
 public.mediaformats        | 8192 bytes
 public.mediatypes          | 8192 bytes
 public.vmedia2             | 0 bytes
 public.vmedia              | 0 bytes
(15 rows)

Now a fixed query which gets the sizes of the related pg_toast_<oid> and
pg_toast_<oid>_index too:

SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)
           + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                       WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
           + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                       WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
       ) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND nspname !~ '^pg_toast'
ORDER BY pg_relation_size(C.oid)
           + COALESCE((SELECT pg_relation_size(C2.oid) FROM pg_class C2
                       WHERE C2.relname = 'pg_toast_' || C.oid), 0::bigint)
           + COALESCE((SELECT pg_relation_size(C3.oid) FROM pg_class C3
                       WHERE C3.relname = 'pg_toast_' || C.oid || '_index'), 0::bigint)
         DESC
LIMIT 20;
          relation          |    size
---------------------------+------------
  public.media              | 164 GB
  public.identifier_idx     | 342 MB
  public.media_pk           | 190 MB
  public.contenttype_pk     | 16 kB
  public.contenttypes       | 16 kB
  public.contentsizes       | 16 kB
  public.contentsizes_pk    | 16 kB
  public.mediateypes_pk     | 16 kB
  public.mediaformats       | 16 kB
  public.mediatypes         | 16 kB
  public.mediaformats_pk    | 16 kB
  public.mediaformats_uk    | 16 kB
  public.media_media_id_seq | 8192 bytes
  public.vmedia             | 0 bytes
  public.vmedia2            | 0 bytes
(15 rows)

There is a difference of about 163 GB (which is from the toast of
public.media):

             relation            |  size
---------------------------------+---------
 pg_toast.pg_toast_6366088       | 162 GB
 pg_toast.pg_toast_6366088_index | 1832 MB
 public.media                    | 727 MB

If you have only small or no toast tables, the query from the wiki will
work for you.

regards, jan

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jan Otto (#13)
Re: Space for pg_dump

Jan Otto wrote:

If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it
does not take the toasted values into account.

Now a fixed query which gets the sizes of the related pg_toast_<oid>
and pg_toast_<oid>_index too:

Note that there is also the pg_total_relation_size function which will
report the total table size, including toast table and indexes.
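Alvaro's suggestion folds Jan's three-term sum into a single function call. A sketch of the same per-relation report using it (note one difference: pg_total_relation_size also counts a table's regular indexes toward its total, whereas Jan's query listed indexes as separate rows):

```sql
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND nspname !~ '^pg_toast'
  -- skip indexes: their size is already included in their table's total
  AND C.relkind <> 'i'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```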

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#15Jan Otto
asche@me.com
In reply to: Alvaro Herrera (#14)
Re: Space for pg_dump

On Apr 20, 2009, at 7:35 PM, Alvaro Herrera wrote:

Jan Otto wrote:

If you have big toast tables you get wrong results with the query
suggested at http://wiki.postgresql.org/wiki/Disk_Usage because it
does not take the toasted values into account.

Now a fixed query which gets the sizes of the related pg_toast_<oid>
and pg_toast_<oid>_index too:

Note that there is also the pg_total_relation_size function which will
report the total table size, including toast table and indexes.

Ahhh, it was a long day... sometimes I think too complicated ;-)

regards, jan

#16Greg Smith
gsmith@gregsmith.com
In reply to: Jan Otto (#13)
Re: Space for pg_dump

On Mon, 20 Apr 2009, Jan Otto wrote:

If you have big toast tables you get wrong results with the query suggested
at http://wiki.postgresql.org/wiki/Disk_Usage because it does not take the
toasted values into account.

I can't recall why I wrote that to filter out things in the pg_toast
namespace in the first place. I just took the easy way out here--the
query has been updated to no longer exclude relations in that namespace,
and I link to the TOAST docs for more details.

I consider combining all the values together, as you did in your example
code and as pg_total_relation_size does, a different type of report.
Accordingly, I just updated with examples of both types, as well as
something to work against pre-8.1 databases.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD