Running out of memory at vacuum
Hi all,
I have a production database that sometimes runs out of memory at nightly vacuum.
The application typically runs with around 40 postgres connections, but there are times when the connections increase because of some queries going on. The reason is that the operations are slow, the terminals time out, and they try to reconnect using new connections. Some time ago I started to have problems with too many open connections, so I lowered the limit to 300 connections. It was all good until recently, when I started running out of memory on vacuum even with 180 connections... So the connections climb to 180 and the system still runs properly for another 2 days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres... If I only close the connections the problem is still there, so I need to restart postgres.
If I don't restart postgres, then at some point the system will run out of memory on queries as well...
Another important thing is that during the vacuum at 1 a.m. nothing else is running on that server, so all the connections are idle.
2013-05-14 06:53:51.449 CST,"postgres","abrazo",8741,"[local]",519233dc.2225,3,"VACUUM",2013-05-14 06:53:48 CST,174/67143,0,ERROR,53200,"out of memory","Failed on request of size 668141820.",,,,,"vacuum;",,,"psql"
OS:
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
SUSE-compiled PostgreSQL version:
version
----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.0.3 on x86_64-suse-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit
Postgres configuration parameters:
max_connections = 300
shared_buffers = 2GB
maintenance_work_mem = 2GB
effective_cache_size = 4GB
work_mem = 2MB
wal_buffers = 1MB
checkpoint_segments = 16
autovacuum = on
wal_level = hot_standby
archive_mode = on
archive_command = 'test -f /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=30 -atz %p spsnidb1:/data01/wal_files/%f'
archive_timeout = 60
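(Back-of-the-envelope, from the settings above: shared_buffers at 2GB plus a single vacuum's maintenance_work_mem of up to 2GB already commits about 4GB, before counting work_mem and per-process overhead for up to 180 backends, on a machine with roughly 7.5GB of RAM.)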
free
total used free shared buffers cached
Mem: 7880512 7825136 55376 0 72376 4537644
-/+ buffers/cache: 3215116 4665396
Swap: 2097148 0 2097148
/etc/sysctl.conf
kernel.shmmax=2684354560
kernel.shmall=2684354560
vm.swappiness=0
vm.overcommit_memory=2
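Note: with vm.overcommit_memory=2 the kernel refuses allocations once committed memory reaches CommitLimit (swap plus overcommit_ratio percent of RAM), so one large request like vacuum's can fail even while free shows plenty of cache. The relevant counters can be checked with:

    grep -E 'CommitLimit|Committed_AS' /proc/meminfo
    cat /proc/sys/vm/overcommit_ratio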
postgres log:
TopMemoryContext: 179088 total in 18 blocks; 8608 free (13 chunks); 170480 used
TopTransactionContext: 24576 total in 2 blocks; 21888 free (43 chunks); 2688 used
Attopt cache: 57344 total in 3 blocks; 25824 free (5 chunks); 31520 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 8192 total in 1 blocks; 4952 free (1 chunks); 3240 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 57344 total in 3 blocks; 21984 free (10 chunks); 35360 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 2048 total in 1 blocks; 2016 free (3 chunks); 32 used
Vacuum: 24576 total in 2 blocks; 15392 free (0 chunks); 9184 used
Relcache by OID: 24576 total in 2 blocks; 6592 free (3 chunks); 17984 used
CacheMemoryContext: 2390704 total in 22 blocks; 641048 free (4 chunks); 1749656 used
pk_cbntransaction: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
rel_transactionretailerid_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
rel_transaction_promotion_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reference_64_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reference_49_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reference_46_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reference_28_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_transaction_date: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
ix_referenceid_transac: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_onlinetick_cbntrans: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_drawid_transaction: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_cbntransaction_terminalid_transactiondate: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
ix_cbntransaction_purchaseid: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_cardactivity: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_business_type: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_207856_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_bank: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_audit_1: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
pg_toast_207843_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_itshipment: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
pk_itpack: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
pg_toast_25392890_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_drawpatterns: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
pg_toast_50580_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
rel_draw_gameid_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
reference_29_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_draw: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_62768_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_39412_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
ix_a_gamewinner_2006_7: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2006_6: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2006_5: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2006_4: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2006_2: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2006_1: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_gamewinner_2006: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_39404_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
ix_a_gamewinner_2005_7: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2005_6: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2005_5: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2005_4: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2005_2: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_a_gamewinner_2005_1: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_gamewinner_2005: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_39428_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_cbntransaction_2008: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_cashdrawerbalance_1: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_cashdrawerbalance: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_62658_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_cashdrawer: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_account: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
ix_cashdrawertransaction_cashdrawerid_transdate: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_cashdrawertransaction: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_62672_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
ix_cashdrawerbalanceitem_1: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_62666_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_22502278_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_26623904_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_26623898_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_dbversion: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
pg_toast_39501_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_matchgameplay_2009: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_39436_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_matchgameplay_2008: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_39396_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_matchgameplay_2007: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_39388_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_matchgameplay_2006: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_39380_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_matchgameplay_2005: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_accountforms: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_validzreporttypes: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_63590_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
ix_validtransactiontype_userdesc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_validtransactiontype: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_validtransactionstatus: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_address: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_62598_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_62590_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_activitycontext: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_25232699_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_25232693_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_26623892_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_26623886_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_cast_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_507675_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_cbntransaction_2013: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_21922437_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_dbcompare_tables: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_507667_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pk_a_cbntransaction_2012: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_499794_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_inherits_relid_seqno_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_inherits_parent_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_11474_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_11469_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_11464_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_2604_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_attrdef_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_toast_11459_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_11454_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_largeobject_loid_pn_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_default_acl_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_2396_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_shdescription_o_c_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_shdepend_reference_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_shdepend_depender_index: 2048 total in 1 blocks; 440 free (0 chunks); 1608 used
pg_pltemplate_name_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_tablespace_spcname_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_2964_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_depend_reference_index: 2048 total in 1 blocks; 456 free (0 chunks); 1592 used
pg_depend_depender_index: 2048 total in 1 blocks; 456 free (0 chunks); 1592 used
pg_toast_2609_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_description_o_c_o_index: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
pg_toast_2620_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_trigger_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_trigger_tgconstraint_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_2618_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_rewrite_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_largeobject_metadata_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_amproc_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_amop_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_am_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_am_name_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_toast_1262_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_2606_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_constraint_contypid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_constraint_conrelid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_constraint_conname_nsp_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_11484_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_11479_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_1255_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_toast_2619_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_db_role_setting_databaseid_rol_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1448 free (4 chunks); 1624 used
pg_foreign_data_wrapper_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_class_relname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_foreign_server_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1448 free (4 chunks); 1624 used
pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_foreign_data_wrapper_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_user_mapping_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_namespace_nspname_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_foreign_server_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_user_mapping_user_server_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_tablespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_database_datname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
MdSmgr: 24576 total in 2 blocks; 9504 free (0 chunks); 15072 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used
Any suggestions are greatly appreciated
Thanks,
ioana
On Tuesday, May 14, 2013, Ioana Danes wrote:
Hi all,
I have a production database that sometimes runs out of memory at nightly
vacuum. The application typically runs with around 40 postgres connections, but
there are times when the connections increase because of some queries going
on.
You should consider not allowing that to happen. If 40 connections is the
sweet spot for your hardware, then you can't change reality simply by
changing the connection pool setting.
The reason is that the operations are slow, the terminals time out and try
to reconnect using new connections.
So, "The beatings will continue until morale improves". You should
consider not allowing that to happen, either. If the original operation is
slow, why would trying it again on a new connection (while leaving the
original one behind to clog things up) be any better?
Some time ago I started to have problems with too many open connections,
so I lowered the limit to 300 connections. It was all good until recently,
when I started running out of memory on vacuum even with 180 connections...
So the connections climb to 180 and the system still runs properly for
another 2 days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres... If I only close the connections the
problem is still there, so I need to restart postgres.
How are you closing the connections?
If I don't restart postgres then the system will run out of memory on
queries at a point...
Another important thing is that during the vacuum at 1 a.m. nothing else is
running on that server, so all the connections are idle.
Truly idle, or idle in transaction, or not even that? If the "abandoned"
connections have completed whatever long running thing caused them to be
abandoned, then they should no longer exist. What are the largest
processes according to top?
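For example, something along these lines (GNU ps, sorted by resident set size) would show the biggest ones:

    ps -eo pid,rss,vsz,cmd --sort=-rss | head -n 15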
You seem to have a series of unfortunate events here, but I think you are
tackling them from the wrong end. What are these slow queries that take up
a lot of memory, and why can't they be improved? Given that you have slow
queries that take a lot of memory, why does your app respond to this by
launching a Denial of Service attack against its own database, and why do
you let it do that?
Cheers,
Jeff
Hi Jeff,
On Tuesday, May 14, 2013, Ioana Danes wrote:
Hi all,
I have a production database that sometimes runs out of memory at nightly vacuum.
The application typically runs with around 40 postgres connections, but there are times when the connections increase because of some queries going on.
You should consider not allowing that to happen. If 40 connections is the sweet spot for your hardware, then you can't change reality simply by changing the connection pool setting.
Yes, I agree.
The reason is that the operations are slow, the terminals time out and try to reconnect using new connections.
So, "The beatings will continue until morale improves". You should consider not allowing that to happen, either. If the original operation is slow, why would trying it again on a new connection (while leaving the original one behind to clog things up) be any better?
Yes, I agree; it is just hard to distinguish between real and false requests for new connections (high sales vs. bad queries).
Some time ago I started to have problems with too many open connections, so I lowered the limit to 300 connections. It was all good until recently, when I started running out of memory on vacuum even with 180 connections... So the connections climb to 180 and the system still runs properly for another 2 days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres... If I only close the connections the problem is still there, so I need to restart postgres.
How are you closing the connections?
I restart the application server. The problem is that max_idle was set to 1000 on the JDBC connection pool, so once a spike happened the app would keep running with 300 connections, 250 or so of them IDLE most of the time. I am fixing that.
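For example, with a DBCP-style JDBC pool the cap would look something like this (hypothetical values, not our actual settings):

    maxActive=60
    maxIdle=20
    minIdle=5
    timeBetweenEvictionRunsMillis=60000
    minEvictableIdleTimeMillis=300000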
If I don't restart postgres then the system will run out of memory on queries at a point...
Another important thing is that during vacuum at 1am nothing else is going on that server so all the connections are idle.
Truly idle, or idle in transaction, or not even that? If the "abandoned" connections have completed whatever long running thing caused them to be abandoned, then they should no longer exist. What are the largest processes according to top?
They were IDLE (not IDLE IN TRANSACTION).
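That is easy to confirm on 9.0, where idle sessions report <IDLE> in current_query; something like this shows them:

    SELECT procpid, usename, backend_start, current_query
    FROM pg_stat_activity
    WHERE current_query IN ('<IDLE>', '<IDLE> in transaction');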
You seem to have a series of unfortunate events here, but I think you are tackling them from the wrong end. What are these slow queries that take up a lot of memory, and why can't they be improved? Given that you have slow queries that take a lot of memory, why does your app respond to this by launching a Denial of Service attack against its own database, and why do you let it do that?
I never said I am ignoring the bad queries or the bad processes. Sometimes it just takes time to catch and fix them; that is just the reality...
The problem is not always obvious and easy to reproduce.
Thank you very much for your response,
ioana
On Thu, May 16, 2013 at 6:35 AM, Ioana Danes <ioanasoftware@yahoo.ca> wrote:
Hi Jeff,
On Tuesday, May 14, 2013, Ioana Danes wrote:
The fix is to restart postgres... If I only close the connections the
problem is still there, so I need to restart postgres.
How are you closing the connections?
I restart the application server. The problem is that max_idle was set to
1000 on the JDBC connection pool, so once a spike happened the app would
keep running with 300 connections, 250 or so of them IDLE most of the time.
I am fixing that.
Hi Ioana, thanks for the responses. Does restarting the app server
successfully cause all of those connections to terminate? If so, and yet
you still have memory problems, then there is still the mystery of where
the memory is going.
Cheers,
Jeff
Hi Jeff,
Yes, stop/start of the application server does close all the connections to the database.
Lately I have restarted postgres too every time that happened. There was a time in the past, sometime last year, when I tried just closing the app and it was not enough. I might be mixing up different scenarios, though, because I also had another issue where max_connections was set to 1000 by mistake and we ran out of memory for good reason. So it might have happened in that case, not now.
I will keep you updated.
Thank you,
Ioana
On 05/16/2013 07:13 AM, Ioana Danes wrote:
Hi Jeff,
Yes, stop/start of the application server does close all the connections
to the database. [...]
From http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
work_mem, maintenance_work_mem
<http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-WORK-MEM>
If you do a lot of complex sorts, and have a lot of memory, then
increasing the work_mem parameter allows PostgreSQL to do larger
in-memory sorts which, unsurprisingly, will be faster than disk-based
equivalents.
This size is applied to each and every sort done by each user, and
complex queries can use multiple working memory sort buffers. Set it to
50MB, and have 30 users submitting queries, and you are soon using 1.5GB
of real memory. Furthermore, if a query involves doing merge sorts of 8
tables, that requires 8 times work_mem. You need to consider what you
set max_connections to in order to size this parameter correctly. This
is a setting where data warehouse systems, where users are submitting
very large queries, can readily make use of many gigabytes of memory.
maintenance_work_mem is used for operations like vacuum. Using extremely
large values here doesn't help very much, and *because you essentially
need to reserve that memory* for when vacuum kicks in, it takes memory away
from more useful purposes. Something in the 256MB range has anecdotally
been a reasonably large setting here.
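If lowering the global setting is undesirable, a manual vacuum job can override it for its own session only, e.g.:

    -- hypothetical nightly maintenance script, run via psql
    SET maintenance_work_mem = '256MB';
    VACUUM ANALYZE;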