WAL file size vs. data file size

Started by Benover 14 years ago3 messagesgeneral

bench@silentmedia.com

over 14 years ago

Today I tried to restore a 70GB database with the standard "pg_dump -h old_server <…> | psql -h new_server <…>" method. I had 100GB set aside for WAL files, which I figured surely would be enough, because all of the data, including indices, is only 70GB. So I was a bit surprised when the restore hung mis-way because my pg_xlogs directory ran out of space.

Is it expected that WAL files are less dense than data files? I understand that they'll include multiple versions of the data, while the data files only include the last one (if recently vacuumed), but it's not like a restore does much besides COPY commands. I also understand that when you specify an archive_timeout value, you might be bloating the WAL files, but again, in the case of a restore, it seems like each WAL file will be full of useful data. So how is it that I can have more WAL bytes than data bytes?

Now, the new server is dedicated to running just this database, so it's not like there was anything else that could be adding info the the WAL files. But it is a master for a hot standby cluster…. maybe that makes a difference?

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Ben (#1)

Re: WAL file size vs. data file size

Ben Chobot <bench@silentmedia.com> writes:

Today I tried to restore a 70GB database with the standard "pg_dump -h old_server <�> | psql -h new_server <�>" method. I had 100GB set aside for WAL files, which I figured surely would be enough, because all of the data, including indices, is only 70GB. So I was a bit surprised when the restore hung mis-way because my pg_xlogs directory ran out of space.

Is it expected that WAL files are less dense than data files?

Yes, that's not particularly surprising ... but how come they weren't
getting recycled? Perhaps you had configured WAL archiving but it was
broken?

regards, tom lane

Ben

bench@silentmedia.com

over 14 years ago

In reply to: Tom Lane (#2)

Re: WAL file size vs. data file size

On Oct 27, 2011, at 8:44 AM, Tom Lane wrote:

Ben Chobot <bench@silentmedia.com> writes:

Today I tried to restore a 70GB database with the standard "pg_dump -h old_server <∑> | psql -h new_server <∑>" method. I had 100GB set aside for WAL files, which I figured surely would be enough, because all of the data, including indices, is only 70GB. So I was a bit surprised when the restore hung mis-way because my pg_xlogs directory ran out of space.

Is it expected that WAL files are less dense than data files?

Yes, that's not particularly surprising ... but how come they weren't
getting recycled? Perhaps you had configured WAL archiving but it was
broken?

It's because I'm archiving wal files into Amazon's S3, which is slooooooooooow. PG is recycling as fast as it can, but when a few MB of COPY rows seem to ballon up to a few hundred MB of WAL files, it has a lot to archive before it can recycle. It'll be fine for steady state but it looks like it's just going to be a waste for this initial load.

What's the expected density ratio? I was always under the impression it would be about 1:1 when doing things like COPY, and have never seen anything to the contrary.