pg_restore out of memory
I am trying to restore a file that was dumped with pg_dump -Fc:
pg_dump on a PostgreSQL 8.1.4 machine,
pg_restore on a PostgreSQL 8.2.4 machine.
The restore machine has the following settings.
/etc/sysctl.conf (FreeBSD machine)
kern.ipc.shmall=262144
kern.ipc.shmmax=534773760 #510MB
kern.ipc.semmap=256
/boot/loader.conf
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256
kern.maxdsiz="1600MB" #1.6GB
kern.dfldsiz="1600MB" #1.6GB
kern.maxssiz="128M" # 128MB
shared_buffers = 450MB
temp_buffers = 8MB
work_mem = 8MB # min 64kB
maintenance_work_mem = 64M
max_fsm_pages = 5000000 # Had error with 100,000; increased to 5 million while trying the pg_restore
OS can see 3.5GB of RAM.
Swap is 20GB.
The error that I got was:
pg_restore: [archiver (db)] Error from TOC entry 2146; 0 988154 TABLE DATA
message_attachments pgsql
pg_restore: [archiver (db)] COPY failed: ERROR: out of memory
DETAIL: Failed on request of size 134217728 (128MB)
Syslog was:
Jun 14 10:17:56 bk20 postgres[7294]: [130-1] ERROR: out of memory
Jun 14 10:17:56 bk20 postgres[7294]: [130-2] DETAIL: Failed on request of size 134217728.
I see a similar thread from back in December 2006:
http://tinyurl.com/3aa29g
However, I don't see a final resolution.
Which parameter do I need to increase?
Francisco Reyes <lists@stringsutils.com> writes:
Syslog was:
Jun 14 10:17:56 bk20 postgres[7294]: [130-1] ERROR: out of memory
Jun 14 10:17:56 bk20 postgres[7294]: [130-2] DETAIL: Failed on request of size 134217728.
Can we see the context-sizes dump that should've come out right before
that in the log?
regards, tom lane
Tom Lane writes:
Francisco Reyes <lists@stringsutils.com> writes:
Syslog was:
Jun 14 10:17:56 bk20 postgres[7294]: [130-1] ERROR: out of memory
Jun 14 10:17:56 bk20 postgres[7294]: [130-2] DETAIL: Failed on request of size 134217728.
Can we see the context-sizes dump that should've come out right before
that in the log?
Is this the one you need?
It was right after the error:
Jun 14 10:17:56 bk20 postgres[7294]: [130-3] CONTEXT: COPY message_attachments,
line 60490: "2720290 7225017 research/crew holds.sit sit 88885753 t 1
Jun 14 10:17:56 bk20 postgres[7294]: [130-4] U3R1ZmZJdCAoYykxOTk3LTIwMDIgQWxhZGRpbiBTeX..."
Jun 14 10:17:56 bk20 postgres[7294]: [130-5] STATEMENT: COPY
message_attachments (attachment_id, message_id, filename, extension, attachment_size, name_real,
Jun 14 10:17:56 bk20 postgres[7294]: [130-6] parser_version,
attachment_search_text, attachment_body, delete_status, delete_status_date)
FROM stdin;
"Francisco Reyes" <lists@stringsutils.com> writes:
kern.maxdsiz="1600MB" #1.6GB
kern.dfldsiz="1600MB" #1.6GB
kern.maxssiz="128M" # 128MB
It ought to be maxdsiz, which seems large enough.
The error that I got was:
pg_restore: [archiver (db)] Error from TOC entry 2146; 0 988154 TABLE DATA
message_attachments pgsql
pg_restore: [archiver (db)] COPY failed: ERROR: out of memory
DETAIL: Failed on request of size 134217728 (128MB)
What does the output of "ulimit -a" show? Can you arrange to run ulimit -a in
the same environment as the server? Either by starting the server in shell
manually or by putting ulimit -a in the startup script which starts the server
if you have one?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark writes:
What does the output of "ulimit -a" show?
In FreeBSD, as far as I know, what controls the size of a program is
/boot/loader.conf and /etc/login.conf.
The default /etc/login.conf has unlimited size.
/boot/loader.conf is set to a max program size of 1.6GB.
Can you arrange to run ulimit -a in
the same environment as the server?
There is no "ulimit -a" in csh, which is what I use.
I guessed this may be a bash builtin, so I tried that.
The output of ulimit -a is:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) 524288
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 11095
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 65536
cpu time (seconds, -t) unlimited
max user processes (-u) 5547
virtual memory (kbytes, -v) unlimited
I don't see any limit at 128MB, the size at which the program crashed.
Francisco Reyes <lists@stringsutils.com> writes:
Tom Lane writes:
Can we see the context-sizes dump that should've come out right before
that in the log?
Is this the one you need?
No. [squints...] Hm, you're looking at a syslog log, aren't you.
The memory dump only comes out on stderr (I think because of paranoia
about running out of memory while trying to report we're out of memory).
Can you get the postmaster's stderr output?
regards, tom lane
"Francisco Reyes" <lists@stringsutils.com> writes:
There is no "ulimit -a" in csh, which is what I use.
I guessed this may be a bash builtin, so I tried that.
The output of ulimit -a is:
The csh equivalent is just "limit".
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) 524288
Don't see any limit at 128MB, the size at which the program crashed.
You might try starting Postgres with
ulimit -d unlimited
or
limit datasize unlimited
You're right that your limit is above 128M, but the error just means it tried
to allocate 128M and failed; it may already have allocated 400M and been
trying to go over the 524M mark.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Tom Lane writes:
The memory dump only comes out on stderr (I think because of paranoia
about running out of memory while trying to report we're out of memory).
Can you get the postmaster's stderr output?
From stderr
pg_restore: restoring data for table "message_attachments"
pg_restore: [archiver (db)] Error from TOC entry 2146; 0 988154 TABLE DATA
message_attachments pgsql
pg_restore: [archiver (db)] COPY failed: ERROR: out of memory
DETAIL: Failed on request of size 134217728.
CONTEXT: COPY message_attachments, line 60490: "2720290 7225017
research/crew holds.sit sit 88885753 t
1U3R1ZmZJdCAoYykxOTk3LTIwMDIgQWxhZGRpbiBTeX..."
Gregory Stark writes:
You're right that your limit is above 128M but the error just means it tried
to allocated 128M and failed, it may already have allocated 400M and been
trying to go over the 524M mark.
My limit should be 1.6GB as per /boot/loader.conf
I ran a ps every 20 seconds to try to catch the program as its size grew.
The ps right before the crash was:
USER  PID   %CPU %MEM VSZ    RSS    TT  STAT STARTED TIME      COMMAND
pgsql 9364  6.2  22.8 904472 803436 ??  Rs   8:34PM  139:37.46 postgres: pgsql backaway_copy [local] COPY (postgres)
fran  9363  8.2  8.6  338928 305332 p3  S+   8:34PM  244:49.41 pg_restore -U pgsql -v -d backaway_copy backaway-fc-2007-06-03.pg_d
pgsql 15373 2.3  10.6 613892 374000 ??  Rs   3:06AM  11:45.28  postgres: pgsql full_table_restores [local] VACUUM (postgres)
904472 KB ≈ 883MB.
So if it was trying to allocate 128MB more, it crashed trying to allocate
somewhere around 1011MB. Although since I only sampled every 20 seconds, it
could have gone beyond that.
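The sampling described above can be scripted rather than run by hand. A sketch (the grep pattern is an assumption based on the ps output shown; the bracketed [C] keeps grep from matching its own command line):

```shell
#!/bin/sh
# Sketch: snapshot the COPY backend's VSZ/RSS every 20 seconds so the
# last sample before the out-of-memory error survives in a log file.
while :; do
    date
    ps -axo user,pid,vsz,rss,command | grep '[C]OPY'
    sleep 20
done >> /tmp/copy-mem.log
```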
The question is, what type of memory is that trying to allocate?
Shared memory?
Shared memory is capped by kern.ipc.shmmax=534773760, well below the roughly
1011MB at which I estimate it crashed.
The FreeBSD default is to cap programs at 512MB, so the fact that the program
is crashing at 1011MB means that the /boot/loader.conf setting of a 1.6GB
memory cap is active.
If it is of any help.. information about the data.
It is base64-encoded data. The original data getting encoded can be up to
100MB. Not sure how much bigger the base64 encoding could be making it.
Gregory Stark writes:
I'm skeptical that you can trust ps's VSZ reporting at this level of detail.
On some platforms VSZ includes a proportionate share of its shared memory or
might not include memory allocated but not actually written to yet (due to
copy-on-write).
Understood. But at least it shows that the program was already above the
operating system's default 512MB limit.
"Francisco Reyes" <lists@stringsutils.com> writes:
Gregory Stark writes:
You're right that your limit is above 128M but the error just means it tried
to allocate 128M and failed, it may already have allocated 400M and been
trying to go over the 524M mark.
My limit should be 1.6GB as per /boot/loader.conf.
Well, according to "limit" it was 524M. I don't know how that relates to the
kernel limits you see in /boot/loader.conf. It may be the same limit being
lowered by something in the startup scripts, or it may be unrelated.
The question is, what type of memory is that trying to allocate?
Shared memory?
If it couldn't allocate the shared memory it wanted it wouldn't start up.
The FreeBSD default is to cap programs at 512MB so the fact the program is
crashing at 1011MB means that the /boot/loader.conf setting of 1.6GB memory cap
is active.
I'm skeptical that you can trust ps's VSZ reporting at this level of detail.
On some platforms VSZ includes a proportionate share of its shared memory or
might not include memory allocated but not actually written to yet (due to
copy-on-write).
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Francisco Reyes wrote:
Tom Lane writes:
The memory dump only comes out on stderr (I think because of paranoia
about running out of memory while trying to report we're out of memory).
Can you get the postmaster's stderr output?
From stderr:
pg_restore: restoring data for table "message_attachments"
pg_restore: [archiver (db)] Error from TOC entry 2146; 0 988154 TABLE DATA
message_attachments pgsql
This is pg_restore's stderr. What Tom wants to see is postmaster's. It
is probably redirected (hopefully to a file, but regretfully it is
common to see it go to /dev/null) on the init script that starts the
service.
If it's going to /dev/null, change it to somewhere more reasonable and
try again. Or you could start the postmaster by hand on a terminal
under your control.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
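Alvaro's second option, running the server by hand with stderr captured, can be sketched like this. The binary path, data directory, and pgsql account are assumptions; the point is that stderr, where the per-context memory dump goes, lands in a file instead of /dev/null:

```shell
#!/bin/sh
# Sketch: start the server in the foreground as the pgsql user with
# stderr redirected to a file, bypassing the rc script entirely.
su -m pgsql -c '/usr/local/bin/postgres -D /usr/local/pgsql/data' \
    2> /tmp/postmaster-stderr.log
```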
Francisco Reyes <lists@stringsutils.com> writes:
If it is of any help.. information about the data.
It is base64-encoded data. The original data getting encoded can be up to
100MB. Not sure how much bigger the base64 encoding could be making it.
Hmm. I'm wondering about a memory leak in the input converter. What
datatype exactly are the wide column(s)? Also, do you have any ON
INSERT triggers on this table?
regards, tom lane
Tom Lane writes:
Hmm. I'm wondering about a memory leak in the input converter. What
datatype exactly are the wide column(s)?
Text.
Also, do you have any ON INSERT triggers on this table?
No.
Alvaro Herrera writes:
This is pg_restore's stderr. What Tom wants to see is postmaster's. It
is probably redirected (hopefully to a file, but regretfully it is
common to see it go to /dev/null) on the init script that starts the
service.
How would I run it manually?
When I do pg_ctl start, no output ever goes to the screen.
I also looked at the freebsd startup script. It also uses pg_ctl.
Do I just need to send "-l <filename>" to pg_ctl?
According to the man page the default is to send output to /dev/null.
It does seem like -l will redirect to file so going to try that.
man pg_ctl
...
In start mode, a new server is launched. The server is started in the
background, and standard input is attached to /dev/null. The standard
output and standard error are either appended to a log file (if the -l
option is used), or redirected to pg_ctl's standard output (not
standard error).
...
Alvaro Herrera writes:
This is pg_restore's stderr. What Tom wants to see is postmaster's. It
is probably redirected (hopefully to a file, but regretfully it is
common to see it go to /dev/null) on the init script that starts the service.
pg_ctl -l <file> didn't work.
Trying now with changes to postgreql.conf:
redirect_stderr = on
log_directory = '/data/logs'
log_filename = 'postgresql-%Y-%m-%d.log'
log_error_verbosity = verbose
And increased all error levels to debug1
That looks promising already; I see errors going to the file.
Trying again. It will be a few hours before it gets to the line where it runs
out of memory.
Tom Lane writes:
Can we see the context-sizes dump that should've come out right before
that in the log?
Hope this is what you are looking for.
Included a few lines before the error in case that is of any help.
These lines are from the postgresql log. Redirected stderr to a file.
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216 free (0 chunks); 808
used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
MdSmgr: 8192 total in 1 blocks; 6616 free (0 chunks); 1576 used
LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR: 53200: out of memory
DETAIL: Failed on request of size 134217728.
CONTEXT: COPY message_attachments, line 60490: "2720290 7225017
research/crew holds.sit sit 88885753 t 1
U3R1ZmZJdCAoYykxOTk3LTIwMDIgQWxhZGRpbiBTeX..."
LOCATION: AllocSetRealloc, aset.c:907
STATEMENT: COPY message_attachments (attachment_id, message_id, filename,
extension, attachment_size, name_real, parser_version,
attachment_search_text, attachment_body, delete_status, delete_status_date)
FROM stdin;
The table has no insert triggers, but does have a foreign-key constraint:
"message_attachments_message_id_fkey" FOREIGN KEY (message_id) REFERENCES
messages(message_id)
It also has 3 indexes and 2 check constraints.
Francisco Reyes <lists@stringsutils.com> writes:
Tom Lane writes:
Can we see the context-sizes dump that should've come out right before
that in the log?
Hope this is what you are looking for.
Included a few lines before the error in case that is of any help.
Well, that's the last few lines of what I wanted, but not the part that
was interesting :-(. Please show all the lines that are like the "n
total in m blocks" format. Or at least look for the one(s) that contain
large numbers...
The table has no insert triggers, but does have a Foreign-key constraint.
"message_attachments_message_id_fkey" FOREIGN KEY (message_id) REFERENCES
messages(message_id)
Hmm, you may well be blowing out the deferred-foreign-key-check list.
But pg_dump normally orders its operations so that the data is loaded
before trying to set up FK constraints. Are you perhaps trying to do a
data-only restore?
regards, tom lane
Tom Lane writes:
Well, that's the last few lines of what I wanted, but not the part that
was interesting :-(. Please show all the lines that are like the "n
total in m blocks" format. Or at least look for the one(s) that contain
large numbers...
Since you know best what you are looking for I just put the log for you at
http://public.natserv.net/postgresql-2007-06-15.log
Hmm, you may well be blowing out the deferred-foreign-key-check list.
But pg_dump normally orders its operations so that the data is loaded
before trying to set up FK constraints. Are you perhaps trying to do a
data-only restore?
I don't believe so.
The lines I am using for both backup/restore are:
pg_dump -h <host> -Fc <database> > /data2/backups/`date "+%F"`.pg_dump
pg_restore -U pgsql -v -L Load-list.txt -d copydb 2007-06-03.pg_dump
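Since a load list (-L) is in play, Tom's question can be answered directly by listing the archive's TOC (or the edited Load-list.txt) and checking the ordering of the relevant entries. A sketch, with the dump file name taken from the commands above:

```shell
# Sketch: print the TOC entries that matter for ordering. In a normal
# full restore, every "TABLE DATA" line appears before the
# "FK CONSTRAINT" lines; if the list was trimmed to data only, the
# constraints already exist on the target and the deferred
# foreign-key-check list can grow without bound during COPY.
pg_restore -l 2007-06-03.pg_dump | grep -nE 'TABLE DATA|FK CONSTRAINT'
```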
Francisco Reyes <lists@stringsutils.com> writes:
Tom Lane writes:
Well, that's the last few lines of what I wanted, but not the part that
was interesting :-(. Please show all the lines that are like the "n
total in m blocks" format. Or at least look for the one(s) that contain
large numbers...
Since you know best what you are looking for I just put the log for you at
http://public.natserv.net/postgresql-2007-06-15.log
The only important space consumption seems to be here:
PortalHeapMemory: 335682660 total in 13 blocks; 68880 free (52 chunks); 335613780 used
So it's not a deferred-trigger problem (trigger events would be in their
own context). Also, the fact that it's trying to allocate exactly 128MB
lets out a lot of possibilities --- that has to be a single huge
request, not just the end result of a boatload of little ones. So it
probably isn't a memory leak per se.
One time that we've seen something like this before is when someone
had a COPY datafile that had had its newlines munged, such that COPY
thought it was all one long line and tried to read it all in at once.
What's the history of your datafile --- has it maybe gotten passed
through a Windows newline conversion?
regards, tom lane
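Tom's munged-newline theory can be checked without another multi-hour restore attempt: emit just this table's data from the custom-format archive as a text COPY stream and measure the longest line COPY would see. A sketch, with the dump file name as earlier in the thread:

```shell
# Sketch: extract only the suspect table's data as text and report the
# longest line. A file with munged newlines shows up as one enormous
# "line" on the order of the failed allocation size.
pg_restore -a -t message_attachments 2007-06-03.pg_dump |
    awk '{ if (length($0) > max) max = length($0) }
         END { print "longest line:", max+0, "bytes" }'
```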