ERROR: Out of memory - when connecting to database
Hi,
we have several instances of the following error in the server log:
2010-11-08 18:44:18 CET 5177 1 @ ERROR: out of memory
2010-11-08 18:44:18 CET 5177 2 @ DETAIL: Failed on request of size 16384.
It's always the first log message from the backend. We're trying to
track it down. Whether it's always a connection attempt to the same
database or not, I don't know at the moment.
Sometimes the error message is preceded by memory stats which are below
in the email.
Other relevant data:
Linux, PostgreSQL 8.2.10
RAM 28GB
max_connections = 2048
shared_buffers = 2048MB
temp_buffers = 32MB
max_prepared_transactions = 0
max_fsm_pages = 10000000
max_fsm_relations = 100000
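The configured limits alone can be pushed past physical RAM. A minimal back-of-the-envelope sketch (my arithmetic, not from the thread; it assumes the worst case where every connection slot is used and every backend fills its temp_buffers, which idle backends never do):

```python
# Worst-case memory arithmetic for the settings above. Assumption (mine,
# not from the mail): every connection slot is in use and every backend
# fills its temp_buffers -- in reality idle backends use far less.
MB = 1024 * 1024

shared_buffers = 2048 * MB
temp_buffers = 32 * MB
max_connections = 2048
ram = 28 * 1024 * MB

worst_temp = max_connections * temp_buffers   # temp buffers alone
print(worst_temp / (1024 * MB), "GB")         # 64.0 GB
print(ram / (1024 * MB), "GB of RAM")         # 28.0 GB
```

Even this single per-backend setting can, in the worst case, ask for more than twice the machine's RAM before work_mem is counted at all.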
There are circa 1,200 concurrent database connections (active backends).
I know that's too many; we're trying to reduce the number, but it's not
easy because of the large number of databases and heavy use of
LISTEN/NOTIFY, so a connection pooler doesn't help...
What can cause this error? What parameter should be raised?
Thanks,
Kuba
Messages preceding the ERROR: out of memory message:
TopMemoryContext: 581920 total in 32 blocks; 13760 free (15 chunks);
568160 used
TopTransactionContext: 85077936 total in 20 blocks; 4523352 free (9
chunks); 80554584 used
Analyze Index: 1358288 total in 3 blocks; 7408 free (0 chunks); 1350880 used
ExecutorState: 8192 total in 1 blocks; 4928 free (0 chunks); 3264 used
ExprContext: 13664019952 total in 564677 blocks; 968292944 free (117296
chunks); 12695727008 used
Analyze Column: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks);
12688 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
used
Autovacuum context: 57344 total in 3 blocks; 35624 free (13 chunks);
21720 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
MbProcContext: 17408 total in 2 blocks; 17184 free (8 chunks); 224 used
smgr relation table: 24576 total in 2 blocks; 5648 free (4 chunks);
18928 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks);
32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
CacheMemoryContext: 800552 total in 19 blocks; 162536 free (2 chunks);
638016 used
pg_toast_1152341368_index: 2048 total in 1 blocks; 680 free (0 chunks);
1368 used
index_name1: 2048 total in 1 blocks; 512 free (0 chunks); 1536 used
index_name2: 2048 total in 1 blocks; 528 free (0 chunks); 1520 used
...
Per-database table: 253952 total in 5 blocks; 120064 free (21 chunks);
133888 used
Per-database table: 253952 total in 5 blocks; 138544 free (22 chunks);
115408 used
Per-database table: 253952 total in 5 blocks; 138544 free (22 chunks);
115408 used
Per-database table: 57344 total in 3 blocks; 34352 free (11 chunks);
22992 used
...
Databases hash: 122880 total in 4 blocks; 66912 free (13 chunks); 55968 used
MdSmgr: 8192 total in 1 blocks; 1664 free (0 chunks); 6528 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks);
6512 used
PLpgSQL function cache: 8192 total in 1 blocks; 3744 free (0 chunks);
4448 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
Postmaster: 253952 total in 5 blocks; 65400 free (8 chunks); 188552 used
ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
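The dump above is easiest to read by parsing each line and comparing totals; ExprContext dwarfs everything else. A small sketch (the regex is my assumption about the dump's line format, derived from the lines above):

```python
import re

# Parse one line of the memory-context dump above to see which context
# dominates. The line is copied verbatim from the report.
line = ("ExprContext: 13664019952 total in 564677 blocks; "
        "968292944 free (117296 chunks); 12695727008 used")

pattern = (r"(?P<name>.+?): (?P<total>\d+) total in (?P<blocks>\d+) blocks; "
           r"(?P<free>\d+) free \((?P<chunks>\d+) chunks\); (?P<used>\d+) used")
m = re.match(pattern, line)
total_gib = int(m.group("total")) / 2**30
print(m.group("name"), round(total_gib, 1), "GiB")  # ExprContext 12.7 GiB
```

Roughly 12.7 GiB in a single expression-evaluation context, in over half a million blocks, is the smoking gun: some per-row evaluation is accumulating memory instead of releasing it.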
Replying to my own mail. Maybe we've found the root cause:
In one database there was a table with 200k records where each record
contained a 15 kB bytea field. Auto-ANALYZE was running on that table
continuously (with statistics target 500). When we avoided the
auto-ANALYZE via UPDATE table SET bytea_column = NULL; CLUSTER table;
the problem with ERROR: out of memory went away.
Could it be that the failed connections were issued by autovacuum?
Thanks,
Kuba
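The numbers make the auto-ANALYZE suspicion plausible. PostgreSQL's ANALYZE samples 300 × statistics_target rows, and each sampled bytea value has to be detoasted. A rough sketch (my arithmetic; it assumes full detoasting and ignores per-row overhead, so the real footprint is higher):

```python
# Why auto-ANALYZE on this table is heavy: PostgreSQL samples
# 300 * statistics_target rows, and each sampled bytea is detoasted.
# Assumptions: full detoasting, no per-row bookkeeping overhead.
stats_target = 500
table_rows = 200_000
bytea_bytes = 15 * 1024            # ~15 kB per record, per the mail

sample_rows = min(300 * stats_target, table_rows)
print(sample_rows)                                         # 150000 of 200000 rows
print(round(sample_rows * bytea_bytes / 2**30, 2), "GiB")  # ~2.15 GiB
```

So each ANALYZE run pulls most of the table's wide values into memory at once. A gentler alternative to nulling the column would be lowering the per-column target (e.g. ALTER TABLE ... ALTER COLUMN bytea_column SET STATISTICS 0), though whether that fits the workload here is an open question.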
On 8 Nov 2010 19:19, Jakub Ouhrabka wrote:
On Mon, Nov 08, 2010 at 07:19:43PM +0100, Jakub Ouhrabka wrote:
is it 32bit or 64bit machine?
what's the work_mem?
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
is it 32bit or 64bit machine?
64bit
what's the work_mem?
64MB
Kuba
On 8 Nov 2010 19:52, hubert depesz lubaczewski wrote:
On Mon, Nov 08, 2010 at 08:04:32PM +0100, Jakub Ouhrabka wrote:
is it 32bit or 64bit machine?
64bit
what's the work_mem?
64MB
that's *way* too much with 24GB of ram and > 1k connections. please
lower it to 32MB or even less.
Best regards,
depesz
what's the work_mem?
64MB
that's *way* too much with 24GB of ram and > 1k connections. please
lower it to 32MB or even less.
Thanks for your reply. You are generally right, but in our case most of
the backends are only waiting for NOTIFY, so they're not consuming any
work_mem. The server is not swapping. On the other hand, some queries
benefit from the larger setting, so we think it's fine as is.
Regards,
Kuba
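Both positions can be made concrete with arithmetic (mine, not from the thread). The worst case depesz warns about is real but only materializes if many backends allocate work_mem at the same moment:

```python
# Worst case: every active backend making one work_mem-sized allocation
# at once. A single complex query can in fact use several such
# allocations (one per sort/hash node), so this is a floor, not a
# ceiling -- but backends idling in LISTEN allocate none of it.
MB = 1024 * 1024
work_mem = 64 * MB
active_backends = 1200
ram_gb = 28

worst_gb = active_backends * work_mem / (1024 * MB)
print(worst_gb, "GB vs", ram_gb, "GB RAM")   # 75.0 GB vs 28 GB RAM
```

With mostly-idle LISTEN backends the average stays far below this, which is Kuba's point; the risk is a burst of concurrent sorts or hashes.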
Date: Mon, 8 Nov 2010 20:05:23 +0100
From: kuba@comgate.cz
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ERROR: Out of memory - when connecting to database
Run pgtune against your configuration and it will tell you what is recommended.
Do you really have 2048 connections to the box?
If yes, maybe you need to run pgbouncer with 2048 connections into the pgbouncer
concentrator and 100 connections to postgres? It will reduce your resource
usage significantly.
Chris
2010/11/8 Jakub Ouhrabka <kuba@comgate.cz>:
Could it be that the failed connections were issued by autovacuum?
I think not. It probably used a different plan with different memory
requirements. This is a fairly typical situation when statistics are
stale, combined with HASH JOIN or HASH AGG; these two operations can
consume unbounded memory. Send the plans of your queries for both cases.
Regards
Pavel Stehule
Jakub Ouhrabka <kuba@comgate.cz> writes:
Could it be that the failed connections were issued by autovacuum?
They clearly were: notice the reference to "Autovacuum context" in the
memory map. I think you are right to suspect that auto-analyze was
getting blown out by the wide bytea columns. Did you have any
expression indexes involving those columns?
regards, tom lane
They clearly were: notice the reference to "Autovacuum context" in the
memory map. I think you are right to suspect that auto-analyze was
getting blown out by the wide bytea columns. Did you have any
expression indexes involving those columns?
Yes, there are two unique btree indexes:
(col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text))) where
col4 is not null
(col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text))) where
col4 is null
What is the cause of the out of memory then?
Thank you,
Kuba
Jakub Ouhrabka <kuba@comgate.cz> writes:
What is the cause of the out of memory then?
Hmm. I suspect that evaluation of that md5() call is resulting in
memory leakage, but I'm not sure why or where exactly.
regards, tom lane