PG backup performance

Started by Isabella Ghiureaalmost 16 years ago5 messagesgeneral
Jump to latest
#1Isabella Ghiurea
isabella.ghiurea@nrc-cnrc.gc.ca

Hello PG list,
I 'm looking for some tip, advice toimprove PG backups performance,
presently running
pg_dumpall compressed option on raid array 0 getting aprox14GB writes
in 45 min, I'm backing up aprox 200GB database cluster daily .
How can I improve this performance with the present hardware and PG
version 8.3.6 , can I run parallel backups in PG ?
Thank you
Isabella

--
-----------------------------------------------------------
Isabella A. Ghiurea

Isabella.Ghiurea@nrc-cnrc.gc.ca
Canadian Astronomy Data Centre |http://www.nrc-cnrc.gc.ca/eng/services/hia/data-centre.html
National Research Council of Canada, Herzberg Institute of Astrophysics
5071 West Saanich Road, Victoria BC V9E 2E7, Canada
Phone: 250 363-3446 fax: 250 363-0045

#2Andy Colson
andy@squeakycode.net
In reply to: Isabella Ghiurea (#1)
Re: PG backup performance

On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:

Hello PG list,
I 'm looking for some tip, advice toimprove PG backups performance,
presently running
pg_dumpall compressed option on raid array 0 getting aprox14GB writes in
45 min, I'm backing up aprox 200GB database cluster daily .
How can I improve this performance with the present hardware and PG
version 8.3.6 , can I run parallel backups in PG ?
Thank you
Isabella

Short answer, yes, you can.
Long answer, we need more info.

We need to know what the slow part is.

Are you CPU bound or IO bound?
Are you backing up over a network?
How many tables? (well, big tables... how many really big tables).
How many cores/cpu's do you have?

I bet all 200GB hardly changes, are you sure you need to back it all up over and over again? Have you thought of replication?

-Andy

#3Isabella Ghiurea
isabella.ghiurea@nrc-cnrc.gc.ca
In reply to: Andy Colson (#2)
Re: PG backup performance

Hi Andy,
Thank you , please, see bellow my answers:
Andy Colson wrote:

On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:

Hello PG list,
I 'm looking for some tip, advice toimprove PG backups performance,
presently running
pg_dumpall compressed option on raid array 0 getting aprox14GB writes in
45 min, I'm backing up aprox 200GB database cluster daily .
How can I improve this performance with the present hardware and PG
version 8.3.6 , can I run parallel backups in PG ?
Thank you
Isabella

Short answer, yes, you can.
Long answer, we need more info.

We need to know what the slow part is.

Are you CPU bound

CPU

or IO bound?
Are you backing up over a network?

No , on locally disks

How many tables? (well, big tables... how many really big tables).

Around 20 big tables all of them in a one separate schema on separate
tables space from rest of other schemas. I already start backing up
individuals schema.
My big concern is IF I have to recover from backups will take me at
least twice as much time around 1-2 days I expect.

How many cores/cpu's do you have?

4 quad core CPU server. How can I make PG to use multi threading?

I bet all 200GB hardly changes, are you sure you need to back it all
up over and over again? Have you thought of replication?

Yes, but waiting for a more robust build in replication PG version
aka PG 9.1.
Isabella

--
-----------------------------------------------------------
Isabella A. Ghiurea

Isabella.Ghiurea@nrc-cnrc.gc.ca
Canadian Astronomy Data Centre |http://www.nrc-cnrc.gc.ca/eng/services/hia/data-centre.html
National Research Council of Canada, Herzberg Institute of Astrophysics
5071 West Saanich Road, Victoria BC V9E 2E7, Canada
Phone: 250 363-3446 fax: 250 363-0045

#4Andy Colson
andy@squeakycode.net
In reply to: Isabella Ghiurea (#3)
Re: PG backup performance

On 05/31/2010 02:45 PM, Isabella Ghiurea wrote:

Hi Andy,
Thank you , please, see bellow my answers:
Andy Colson wrote:

On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:

Hello PG list,
I 'm looking for some tip, advice toimprove PG backups performance,
presently running
pg_dumpall compressed option on raid array 0 getting aprox14GB

writes in

45 min, I'm backing up aprox 200GB database cluster daily .
How can I improve this performance with the present hardware and PG
version 8.3.6 , can I run parallel backups in PG ?
Thank you
Isabella

Short answer, yes, you can.
Long answer, we need more info.

We need to know what the slow part is.

Are you CPU bound

CPU

or IO bound?
Are you backing up over a network?

No , on locally disks

How many tables? (well, big tables... how many really big tables).

Around 20 big tables all of them in a one separate schema on separate
tables space from rest of other schemas. I already start backing up
individuals schema.
My big concern is IF I have to recover from backups will take me at
least twice as much time around 1-2 days I expect.

How many cores/cpu's do you have?

4 quad core CPU server. How can I make PG to use multi threading?

I bet all 200GB hardly changes, are you sure you need to back it all
up over and over again? Have you thought of replication?

Yes, but waiting for a more robust build in replication PG version aka
PG 9.1.
Isabella

cool.. my second astronomer :-)

You are cpu bound because pg_dump will use one cpu, and if you have pg_dump compress, it uses the same cpu. But if you use a pipe (pg_dump | gzip) then each gets a cpu. And instead of using pg_dumpall, which dumps one db at a time, use pg_dump and run several in parallel.

try something like:

pg_dump big1 | gzip > big1.sql & pg_dump big2 | gzip > big2.sql

This will dump two at a time using 4 cpu's.

Have a backup file for each big table will also make restore faster, because you can restore them all at the same time, where as pg_dumpall will restore one at a time.

The compression is probably going to take the most time... if your data is not very compressible then you'll get a big speed boost by dropping the compress.

-Andy

#5Andy Colson
andy@squeakycode.net
In reply to: Isabella Ghiurea (#3)
Re: PG backup performance

On 05/31/2010 02:45 PM, Isabella Ghiurea wrote:

I 'm looking for some tip, advice to improve PG backups performance,

Yep, I thought I recalled a conversation like this before, this might have some interesting info:

http://archives.postgresql.org/pgsql-performance/2010-03/msg00132.php

-Andy