pg_dump in a production environment

Started by Thomas F. O'Connell · 13 messages · pgsql-general
#1 Thomas F. O'Connell
tfo@sitening.com

I have a web application backed by a PostgreSQL 7.4.6 database. It's
an application with a fairly standard login process verified against
the database.

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently
logging in and otherwise using the application, the application
becomes almost unusable (to the point where logins take on the order
of minutes).

According to the documentation, pg_dump shouldn't block other
operations on the database other than operations that operate with
exclusive locks. Ordinarily, I run pg_autovacuum on the box, so I
tried again after killing that, thinking that perhaps any substantial
vacuum activity might affect pg_dump. I tried again to no avail.

Excepting the rest of the application, the login process should be
completely read-only and shouldn't require any exclusive locks.

Connections don't really pile up excessively, and load on the machine
does not get in the red zone. Is there anything else I should be
noticing?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

#2 Matthew T. O'Connor
matthew@zeut.net
In reply to: Thomas F. O'Connell (#1)
Re: pg_dump in a production environment

Thomas F. O'Connell wrote:

I have a web application backed by a PostgreSQL 7.4.6 database. It's
an application with a fairly standard login process verified against
the database.

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently logging
in and otherwise using the application, the application becomes almost
unusable (to the point where logins take on the order of minutes).

Could this be an I/O saturation issue like the one the vacuum delay
settings are supposed to help with? Perhaps we could either extend the
vacuum delay settings to apply to pg_dump, or add a new option to
pg_dump that would have it slow down the dump.

BTW, have you tried running pg_dump from a separate machine? Or even
just making sure that the dump file is being written to a different disk
drive than PostgreSQL is running on. All that disk write activity is
bound to slow the system down.

Matthew

#3 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Thomas F. O'Connell (#1)
Re: pg_dump in a production environment

On Mon, 2005-05-23 at 14:54, Thomas F. O'Connell wrote:

I have a web application backed by a PostgreSQL 7.4.6 database. It's
an application with a fairly standard login process verified against
the database.

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently logging
in and otherwise using the application, the application becomes almost
unusable (to the point where logins take on the order of minutes).

According to the documentation, pg_dump shouldn't block other
operations on the database other than operations that operate with
exclusive locks. Ordinarily, I run pg_autovacuum on the box, so I
tried again after killing that, thinking that perhaps any substantial
vacuum activity might affect pg_dump. I tried again to no avail.

Excepting the rest of the application, the login process should be
completely read-only and shouldn't require any exclusive locks.

Connections don't really pile up excessively, and load on the machine
does not get in the red zone. Is there anything else I should be
noticing?

Basically, it sounds like postgresql is doing a lot of very long
sequential scans to do this backup. Have you done a vacuum full
lately? It could be that you've got a lot of table bloat that's making
the seq scans take so long.

You could be I/O saturated already, and the backup is just pushing you
over the edge of the performance knee.

I'd do a 'vacuum analyze verbose' and see if you need a larger FSM setup for
your regular vacuums to keep up.
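For anyone following along: the FSM check Scott describes means reading the summary lines that `VACUUM VERBOSE` prints at the very end of its output. A small sketch of parsing that summary — the exact 7.4-era wording below is approximated from memory, so treat the sample text and regexes as illustrative and adjust them to what your server actually prints:

```python
import re

# Approximation of the FSM summary a 7.4-era VACUUM VERBOSE prints at the end.
SAMPLE = """
INFO:  free space map: 113 relations, 450 pages stored; 2080 total pages needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
"""

def fsm_headroom(vacuum_output: str):
    """Return (pages_needed, pages_allocated) parsed from VACUUM VERBOSE output."""
    needed = re.search(r"(\d+) total pages needed", vacuum_output)
    alloc = re.search(r"\+ (\d+) pages", vacuum_output)
    if not (needed and alloc):
        raise ValueError("FSM summary not found; wording may differ on your version")
    return int(needed.group(1)), int(alloc.group(1))

needed, allocated = fsm_headroom(SAMPLE)
print(needed, allocated, "OK" if needed <= allocated else "raise max_fsm_pages")
```

If "total pages needed" exceeds the allocated pages, routine vacuums can't keep up and max_fsm_pages needs raising.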

#4 Martijn van Oosterhout
kleptog@svana.org
In reply to: Thomas F. O'Connell (#1)
Re: pg_dump in a production environment

What's your pg_dump command? Some options may take a lot of memory.

If you list the processes while this is going on, do you see one
chewing all your memory? I.e., what's really causing the problem...
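One way to answer Martijn's question is to sort the process table by resident memory while the dump runs. A sketch (assumes a procps-style `ps` as found on Linux; the sorting is done in Python to stay portable):

```python
import subprocess

def top_memory_processes(n=5):
    """Return the n processes with the largest resident set, as (rss_kb, pid, command)."""
    out = subprocess.run(
        ["ps", "-eo", "rss=,pid=,comm="],  # trailing '=' suppresses header lines
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        rss, pid, comm = line.split(None, 2)
        rows.append((int(rss), int(pid), comm))
    rows.sort(reverse=True)  # biggest resident set first
    return rows[:n]

for rss, pid, comm in top_memory_processes():
    print(f"{rss:>10} kB  {pid:>7}  {comm}")
```

Run it before and during the dump; if one backend's RSS balloons, that's your culprit.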

Hope this helps,

On Mon, May 23, 2005 at 02:54:46PM -0500, Thomas F. O'Connell wrote:

I have a web application backed by a PostgreSQL 7.4.6 database. It's
an application with a fairly standard login process verified against
the database.

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently
logging in and otherwise using the application, the application
becomes almost unusable (to the point where logins take on the order
of minutes).

According to the documentation, pg_dump shouldn't block other
operations on the database other than operations that operate with
exclusive locks. Ordinarily, I run pg_autovacuum on the box, so I
tried again after killing that, thinking that perhaps any substantial
vacuum activity might affect pg_dump. I tried again to no avail.

Excepting the rest of the application, the login process should be
completely read-only and shouldn't require any exclusive locks.

Connections don't really pile up excessively, and load on the machine
does not get in the red zone. Is there anything else I should be
noticing?

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#5 Thomas F. O'Connell
tfo@sitening.com
In reply to: Martijn van Oosterhout (#4)
Re: pg_dump in a production environment

Okay, I collated the three replies I got below for ease in replying.

I vacuum full analyze and reindexdb approximately once a month, but I
use pg_autovacuum as a matter of ongoing maintenance, and it seems to
hit equilibrium pretty well and seems to prevent bloat. The last time
I checked a vacuum analyze verbose, I had plenty of FSM to spare. The
data grows, but it doesn't seem to grow so quickly that I'd already
be out of FSM space.

I actually run pg_dump from a remote machine, so I/O contention on
the partition with $PGDATA shouldn't be an issue.

And here is the actual command:

pg_dump -h <host> -F c <database> > <dumpfile>

Pretty basic, although it is compressing.

As far as I can tell, the postmaster handling the dump request takes
up quite a bit of CPU, but not by itself to the point where the database
should be unusable under ordinary circumstances. E.g., when a
query/backend eats up that much CPU, it doesn't prevent further access.

I'm suspicious more of something involving locks than of CPU.

Oh, and one other small(ish) detail: the dumping client is using a
7.4.8 installation, whereas the server itself is 7.4.6.

-tfo


From: Scott Marlowe <smarlowe@g2switchworks.com>
Date: May 23, 2005 3:18:33 PM CDT
To: "Thomas F. O'Connell" <tfo@sitening.com>
Cc: PgSQL General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] pg_dump in a production environment

Basically, it sounds like postgresql is doing a lot of very long
sequential scans to do this backup. Have you done a vacuum full
lately? It could be that you've got a lot of table bloat that's making
the seq scans take so long.

You could be I/O saturated already, and the backup is just pushing you
over the edge of the performance knee.

I'd do a 'vacuum analyze verbose' and see if you need a larger FSM setup for
your regular vacuums to keep up.

From: "Matthew T. O'Connor" <matthew@zeut.net>
Date: May 23, 2005 3:18:18 PM CDT
To: "Thomas F. O'Connell" <tfo@sitening.com>
Cc: PgSQL General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] pg_dump in a production environment

Could this be an I/O saturation issue like the one the vacuum delay
settings are supposed to help with? Perhaps we could either extend
the vacuum delay settings to apply to pg_dump, or add a new option to
pg_dump that would have it slow down the dump.

BTW, have you tried running pg_dump from a separate machine? Or
even just making sure that the dump file is being written to a
different disk drive than PostgreSQL is running on. All that disk
write activity is bound to slow the system down.

Matthew

From: Martijn van Oosterhout <kleptog@svana.org>
Date: May 23, 2005 3:25:23 PM CDT
To: "Thomas F. O'Connell" <tfo@sitening.com>
Cc: PgSQL General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] pg_dump in a production environment
Reply-To: Martijn van Oosterhout <kleptog@svana.org>

What's your pg_dump command? Some options may take a lot of memory.

If you list the processes while this is going on, do you see one
chewing all your memory? I.e., what's really causing the problem...

Hope this helps,

#6 Thomas F. O'Connell
tfo@sitening.com
In reply to: Scott Marlowe (#3)
Re: pg_dump in a production environment

A note about database design, though: there are thousands of tables
in this database, most of them inherited. I haven't looked at the
internals of pg_dump, but generally, how do the sequential scans
work? Why would these prevent the tables from being accessed by
queries that don't require exclusive locks?

-tfo


On May 23, 2005, at 3:18 PM, Scott Marlowe wrote:

Basically, it sounds like postgresql is doing a lot of very long
sequential scans to do this backup. Have you done a vacuum full
lately? It could be that you've got a lot of table bloat that's making
the seq scans take so long.

#7 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Thomas F. O'Connell (#6)
Re: pg_dump in a production environment

The real problem is that with 7.4's buffering algorithm, the sequential
scans blow the other data out of the internal buffers of postgresql.
And, since a backup needs all the data in the tables, it's gonna seq
scan them anyway. The tables can still be accessed; the access is just
going to be slow because your other processes are fighting the backup
AND nothing in the buffer is likely to be useful to them, except the one
table currently being backed up.
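Scott's point can be illustrated with a toy LRU simulation (purely illustrative; PostgreSQL's real replacement policy is more involved): a handful of hot pages — say, the login tables — enjoy a perfect hit rate until a sequential scan of a large table flushes them all out.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache standing in for a buffer pool."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        """Touch a page; return True on a cache hit."""
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)
        else:
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
        return hit

cache = LRUCache(capacity=100)
hot = [("login_table", i) for i in range(10)]

def hit_rate(rounds):
    hits = sum(cache.access(p) for _ in range(rounds) for p in hot)
    return hits / (rounds * len(hot))

for p in hot:                       # warm the cache with the hot pages
    cache.access(p)
before = hit_rate(100)              # steady state: every access hits
for i in range(10_000):             # "pg_dump" seq-scans a big table...
    cache.access(("big_table", i))
after = cache.access(hot[0])        # ...and the first hot access now misses
print(before, after)
```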

On Mon, 2005-05-23 at 15:58, Thomas F. O'Connell wrote:

A note about database design, though: there are thousands of tables
in this database, most of them inherited. I haven't looked at the
internals of pg_dump, but generally, how do the sequential scans
work? Why would these prevent the tables from being accessed by
queries that don't require exclusive locks?

-tfo


On May 23, 2005, at 3:18 PM, Scott Marlowe wrote:

Basically, it sounds like postgresql is doing a lot of very long
sequential scans to do this backup. Have you done a vacuum full
lately? It could be that you've got a lot of table bloat that's making
the seq scans take so long.

#8 Chris Kratz
chris.kratz@vistashare.com
In reply to: Thomas F. O'Connell (#5)
Re: pg_dump in a production environment

Hello Thomas,

We've had (and still have) the exact same issue and have been unable to find a
satisfactory solution to the problem. Currently we "just live with it". We
do periodic backups with pg_dump on an hourly basis. During the dump, other
accesses to the db are incredibly slow, making our web app feel somewhat
sluggish for 5 to 10 minutes while the db is dumped.

After a lot of research, it appears to be an I/O and memory contention issue.
Basically, the dump procedure has to pull all the data and tables into memory
to dump them, which means any other requests have to pull the data they need
back off of disk (because it got paged out to make room for the dump data),
making them very slow. This is compounded by the fact that pg_dump usually
saturates your I/O throughput. Since postgres doesn't manage the file system
buffers (the OS does), there appears to be no easy way to tell it to only use
x amount of memory for the dump, leaving all the other memory available for
the running database. I have a hunch that the same thing happens with the
shared buffers, though I haven't proven that. This wasn't a problem for us
while the db fit into RAM, but we've grown far past that point now.
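One OS-level mitigation for the page-cache half of this — not something 7.4-era pg_dump did, just a sketch of the idea — is to tell the kernel to drop the dump file's pages from the cache as they are written, so at least the dump *output* doesn't compete with the database's working set. On Linux, `posix_fadvise(POSIX_FADV_DONTNEED)` does exactly that:

```python
import os

def write_without_caching(path, chunks):
    """Write chunks to path, advising the kernel to drop them from the page cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        for chunk in chunks:
            os.write(fd, chunk)
            written += len(chunk)
            os.fsync(fd)  # pages must be flushed before DONTNEED can drop them
            os.posix_fadvise(fd, 0, written, os.POSIX_FADV_DONTNEED)
        return written
    finally:
        os.close(fd)

# Three 7-byte chunks standing in for dump output.
total = write_without_caching("/tmp/dump_sketch.out", [b"chunk-" + bytes([65 + i]) for i in range(3)])
print(total)
```

This only helps with the dump file itself; the reads of the table data still churn the cache.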

The only solution we have ever found is simply to use something like a slony
slave and do dumps from the slave. The slave takes the performance hit, and
your primary db keeps on running at full speed. Once the dump is done, the
slave can "catch up" if it needs to. Unfortunately, I believe there are
currently issues with restoring from a dump taken off a slave.

-Chris

On Monday 23 May 2005 04:56 pm, Thomas F. O'Connell wrote:

Okay, I collated the three replies I got below for ease in replying.

I vacuum full analyze and reindexdb approximately once a month, but I
use pg_autovacuum as a matter of ongoing maintenance, and it seems to
hit equilibrium pretty well and seems to prevent bloat. The last time
I checked a vacuum analyze verbose, I had plenty of FSM to spare. The
data grows, but it doesn't seem to grow so quickly that I'd already
be out of FSM space.

I actually run pg_dump from a remote machine, so I/O contention on
the partition with $PGDATA shouldn't be an issue.

And here is the actual command:

pg_dump -h <host> -F c <database> > <dumpfile>

Pretty basic, although it is compressing.

As far as I can tell, the postmaster handling the dump request takes
up quite a bit of CPU, but not by itself to the point where the database
should be unusable under ordinary circumstances. E.g., when a
query/backend eats up that much CPU, it doesn't prevent further access.

I'm suspicious more of something involving locks than of CPU.

Oh, and one other small(ish) detail: the dumping client is using a
7.4.8 installation, whereas the server itself is 7.4.6.

-tfo


--
Chris Kratz

#9 Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Chris Kratz (#8)
Re: pg_dump in a production environment

On Mon, 2005-05-23 at 16:54, Chris Kratz wrote:

Hello Thomas,

We've had and have the exact same issue and have been unable to find a
satisfactory solution to the problem. Currently we "just live with it". We
do periodic backups with pg_dump on an hourly basis. During the dump, other
accesses to the db are incredibly slow making our web app feel somewhat
sluggish for 5 to 10 minutes while the db is dumped.

After a lot of research, it appears to be an i/o and memory contention issue.
Basically, the dump procedure has to pull in all data and tables into memory
to dump them which means any other requests have to pull the data they need
back off of disk (because they got paged out to make room for the dump data)
making them very slow. This is compounded by the fact that pg_dump usually
saturates your I/O throughput. Since postgres doesn't manage the file system
buffers (the os does), there appears to be no easy way to tell it to only use
x amount of memory for the dump leaving all the other memory available for
the running database. I have a hunch that the same thing happens with the
shared buffers, though I haven't proven that. This wasn't a problem for us
while the db fit into ram, but we've grown far past that point now.

Are you folks running 8.0 with its improved caching algorithms? Just
wondering if that helps or not.

The only solution we have ever found is simply to use something like a slony
slave and do dumps from the slave. The slave takes the performance hit and
your primary db keeps on running at full speed. Once the dump is done, then
the slave can "catch up" if it needs to. Unfortunately, I believe there are
issues currently with restoring off of a dump from a slave.

Actually, there's a special dump program somewhere in the slony source
tree, one of the perl scripts. That should fix the issues with the
backups. I ran into it a while back and need to start using it
myself.

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas F. O'Connell (#1)
Re: pg_dump in a production environment

"Thomas F. O'Connell" <tfo@sitening.com> writes:

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently
logging in and otherwise using the application, the application
becomes almost unusable (to the point where logins take on the order
of minutes).

The pg_dump sources contain some comments about throttling the rate
at which data is pulled from the server, with a statement that this
idea was discussed during July 2000 and eventually dropped. Perhaps
you can think of a better implementation.

regards, tom lane

#11 Thomas F. O'Connell
tfo@sitening.com
In reply to: Tom Lane (#10)
Re: pg_dump in a production environment

Actually, I would find this to be an interesting project, but we're
on the verge of moving to 8.0 via Slony and will have a replicated
cluster, reducing the need for live dumps on the primary read/write
database.

It's too bad round tuits are so expensive!

I was trying to think of a way today in which pg_dump might be able
to use statistics in almost the opposite way of pg_autovacuum, such
that it steered clear of objects in heavy use, but I'm not familiar
enough with the source to know how this might work.
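For what it's worth, a crude version of that idea can be sketched against the statistics views: rank tables by scan and write activity and dump the cold ones first. The rows below are hypothetical stand-ins for what you might read out of `pg_stat_user_tables`; in practice you'd snapshot the view twice and diff the counters to get *recent* activity rather than totals since startup.

```python
# Each row mimics (relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del)
# as you might read it from pg_stat_user_tables (numbers invented for the example).
STATS = [
    ("sessions",  90_000, 500_000, 10_000, 80_000, 9_000),
    ("audit_log",      12,      40, 50_000,      0,     0),
    ("countries",       3,     900,      0,      0,     0),
]

def coldest_first(stats):
    """Order tables so the least-active ones are dumped first."""
    def activity(row):
        _, seq, idx, ins, upd, dele = row
        return seq + idx + ins + upd + dele
    return [row[0] for row in sorted(stats, key=activity)]

print(coldest_first(STATS))
```

Dumping cold tables first doesn't shorten the dump, but it defers the fight over the hot tables' buffers to a single window at the end.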

-tfo


On May 23, 2005, at 11:12 PM, Tom Lane wrote:

"Thomas F. O'Connell" <tfo@sitening.com> writes:

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently
logging in and otherwise using the application, the application
becomes almost unusable (to the point where logins take on the order
of minutes).

The pg_dump sources contain some comments about throttling the rate
at which data is pulled from the server, with a statement that this
idea was discussed during July 2000 and eventually dropped. Perhaps
you can think of a better implementation.

regards, tom lane

#12 Russell Smith
mr-russ@pws.com.au
In reply to: Tom Lane (#10)
Re: (Ideas) pg_dump in a production environment

On Tue, 24 May 2005 02:12 pm, Tom Lane wrote:

"Thomas F. O'Connell" <tfo@sitening.com> writes:

I'd like to use pg_dump to grab a live backup and, based on the
documentation, this would seem to be a realistic possibility. When I
try, though, during business hours, when people are frequently
logging in and otherwise using the application, the application
becomes almost unusable (to the point where logins take on the order
of minutes).

The pg_dump sources contain some comments about throttling the rate
at which data is pulled from the server, with a statement that this
idea was discussed during July 2000 and eventually dropped. Perhaps
you can think of a better implementation.

A brief look at the code suggests a couple of possibilities for fixing the problems.
There seem to be at least two different issues here from the user's point of view.

Issue 1: Large Tables cause server slowdown
---
There are two dump cases in my short reading of the pg_dump code. Case 1 is
the COPY dump, which is done as one command; the server does most of the work.
Case 2 is the INSERT-type dump, where the pg_dump client does most of the work
creating the INSERT statements. Case 2 is done with a cursor, and it would be easy to
insert a fixed-delay sleep after a certain number of records has been dumped. I'm sure we
could work out the average size of a tuple in this case (2), and even pause after a certain
amount of data has been transferred.

I am unsure about how to attack Case 1; as mentioned, it is handled mostly in the backend code,
which we don't really control. If it could be declared as a CURSOR, you could use the same
principle as in Case 2. The current throttling suggestions are all based on time. I think that
a data/counter-based solution would be less intense on the system: when the counter is reached,
just do a usleep for the throttle time.
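The Case 2 idea — a cursor plus a counter-based sleep — might look roughly like this. `cur` stands in for any DB-API-style cursor, and the per-batch byte estimate is the back-of-the-envelope kind described above; the `FakeCursor` is only there so the sketch can be exercised without a server:

```python
import time

def throttled_rows(cur, batch_size=1000, max_bytes_per_sec=5_000_000):
    """Yield rows from a cursor, sleeping so the transfer stays under a byte budget."""
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        # Rough per-batch size estimate from the rows' string form.
        batch_bytes = sum(len(repr(r)) for r in rows)
        yield from rows
        time.sleep(batch_bytes / max_bytes_per_sec)

class FakeCursor:
    """Minimal stand-in for a DB-API cursor, for trying the generator out."""
    def __init__(self, rows):
        self.rows = rows
    def fetchmany(self, n):
        out, self.rows = self.rows[:n], self.rows[n:]
        return out

cur = FakeCursor([(i, "payload") for i in range(2500)])
dumped = list(throttled_rows(cur, batch_size=1000))
print(len(dumped))
```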

Issue 2: Full Backups of large amount of data saturate disk I/O (Many tables make it slow)
---
If the backup dump is large, and given that all files will be sequentially scanned during the backup,
the server I/O is going to be pushed to the limit. A pause between dumping tables seems a
simple possibility to reduce the ongoing I/O load on the server and allow periods where other
requests can be served. This would result in a bursty type of performance improvement. In environments
with large numbers of tables of a reasonable size, this could give a benefit.
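In sketch form, the Issue 2 idea reduces to a loop like the following, with the dump and sleep steps injected so it can be exercised without a server — in real use `dump_table` would wrap something like a per-table `pg_dump -t` invocation:

```python
import time

def dump_with_pauses(tables, dump_table, pause_seconds=2.0, sleep=time.sleep):
    """Dump each table, pausing in between to let other I/O through."""
    for i, table in enumerate(tables):
        dump_table(table)
        if i < len(tables) - 1:      # no pause needed after the last table
            sleep(pause_seconds)

log = []
dump_with_pauses(
    ["users", "orders", "items"],
    dump_table=lambda t: log.append(("dump", t)),
    sleep=lambda s: log.append(("pause", s)),
)
print(log)
```

Note that per-table dumps lose the single-snapshot consistency of one pg_dump run, which is part of why this belongs inside pg_dump rather than in a wrapper script.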

---
In releases prior to 8.0, any sort of wait after a certain amount of data would possibly not evict high-use
data: during the wait, the frequently used data would have been accessed again, meaning you would
instead evict the seqscan data you had requested for the previous part of the dump.
In 8.0, or 8.1 with the clock-sweep algorithm, it's possibly the same situation with regard to the delays, but
you could possibly process larger amounts of data before each sleep, as you would keep recycling the same
buffers. You would use the sleep to reduce disk I/O more than to reduce cache eviction.

The problem with timed waits during any backup is that the database cannot be vacuumed while the
dump's transaction stays open. In some limited circumstances (like mine), if you have a long-running
transaction that blocks vacuuming of certain small, high-update tables, you lose performance as the
tables bloat, and you can only fix it with a vacuum full.

Both of these suggestions may be totally bogus, so I suppose I'm asking for feedback on them to see if
they would be worthwhile implementing.

Regards

Russell Smith

#13 Chris Kratz
chris.kratz@vistashare.com
In reply to: Scott Marlowe (#9)
Re: pg_dump in a production environment

On Monday 23 May 2005 06:09 pm, Scott Marlowe wrote:

Are you folks running 8.0 with its improved caching algorithms? Just
wondering if that helps or not.

I should have noted that we are still using 7.4 on our production servers. We
are planning an upgrade to 8.x later this year, so it is very plausible that
the new caching algorithms would help immensely. We are planning on moving
to backups off of a slony slave at that point as well, so the issue will
probably be moot for us, though it would still be nice to be able to do a
backup off of a running production machine.

Actually, there's a special dump program somewhere in the slony source
tree, one of the perl scripts. That should fix the issues with the
backups. I ran into it a while back and have to start using the same
file.

Interesting, I didn't know that, thanks for the pointer. We will look into
it.

-Chris
--
Chris Kratz