Out of memory on SELECT in 8.3.5
We've been having persistent out-of-memory errors occur in our production
8.3 deployment, which is now running 8.3.5. I'm not sure the query here is
the cause of the problem, but it is our most recent example, which
triggered an out-of-memory error for us.
Perhaps our configuration needs tweaking somewhere...
shared_buffers = 4000MB
#temp_buffers = 8MB
work_mem = 128MB
maintenance_work_mem = 256MB
#max_stack_depth = 2MB
I was wondering if somebody could spot anything odd-looking in the query
plan or in the memory context dump written to the log that might be
causing this?
explain SELECT
s.sale_date,
s.vin,
s.dealer_region as sale_region,
s.dealer_area as sale_area,
s.dealer_code as sale_dealer,
d.name as sale_dealer_name,
s.model as sale_model,
s.sale_type,
ml.lead_id,
l.created,
lrd.reporting_date as lead_date,
ml.lead_pos as lead_order,
ml.lost_sale,
ml.range,
lrd.dealer_region as lead_region,
lrd.dealer_area as lead_area,
lrd.dealer_code as lead_dealer,
lrd.model as lead_model,
l.source as lead_source,
lrd.oem_type as lead_type,
lrd.oem_subtype as lead_subtype,
lrd.unique_all as lead_unique_all,
lrd.unique_region as lead_unique_region,
lrd.unique_area as lead_unique_area,
lrd.unique_dealer as lead_unique_dealer,
lrd.customer_fname,
lrd.customer_lname,
lrdm.item_value as lead_tracking_code,
address_street.street as owner_street,
address.city as owner_city,
address.state as owner_state,
address.postal_code as owner_postal_code,
address.country as owner_country,
email.list_value as owner_email,
phone.list_value as owner_phone
FROM mb_sale s
LEFT OUTER JOIN dealer d on d.dealer_code = s.dealer_code
LEFT OUTER JOIN mb_lead ml on ml.sale_id = s.id
LEFT OUTER JOIN lead_reporting_data lrd ON lrd.lead_id = ml.lead_id
LEFT OUTER JOIN lead_reporting_meta lrdm
ON lrdm.lead_id = ml.lead_id
AND lrdm.item_key = '[ALS:prospectid]TrackingCode'
AND lrdm.pos=1
LEFT OUTER JOIN lead l ON l.id = ml.lead_id
LEFT OUTER JOIN lead_reporting_address address ON address.lead_id = ml.lead_id
LEFT OUTER JOIN lead_reporting_street address_street
ON address_street.address_id = address.id AND address_street.pos = 0
LEFT OUTER JOIN lead_reporting_list_data phone
ON phone.lead_id = ml.lead_id AND phone.list_type = 'p' AND phone.pos = 0
LEFT OUTER JOIN lead_reporting_list_data email
ON email.lead_id = ml.lead_id AND email.list_type = 'e' AND email.pos = 0
WHERE
((s.sale_type = 'd' and ml.lost_sale = FALSE)
OR (s.sale_type = 'c' and ml.lead_pos = 0)
OR (s.sale_type = '0'))
AND s.sale_date BETWEEN date('2009-01-01') AND date('2009-01-31')
ORDER BY s.sale_date, s.vin
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=2297525.76..2297626.60 rows=40336 width=234)
  Sort Key: s.sale_date, s.vin
  ->  Hash Left Join  (cost=1450193.63..2294440.10 rows=40336 width=234)
        Hash Cond: (ml.lead_id = phone.lead_id)
        ->  Hash Left Join  (cost=1341176.18..2140101.73 rows=40336 width=219)
              Hash Cond: (ml.lead_id = email.lead_id)
              ->  Nested Loop Left Join  (cost=1237098.64..2000317.19 rows=40336 width=204)
                    ->  Nested Loop Left Join  (cost=1237098.64..1806747.26 rows=40336 width=141)
                          ->  Nested Loop Left Join  (cost=1237098.64..1633400.88 rows=40336 width=133)
                                ->  Nested Loop Left Join  (cost=1237098.64..1413899.40 rows=40336 width=118)
                                      ->  Merge Left Join  (cost=1237098.64..1238143.69 rows=40336 width=89)
                                            Merge Cond: (ml.lead_id = lrdm.lead_id)
                                            ->  Sort  (cost=46384.08..46484.92 rows=40336 width=78)
                                                  Sort Key: ml.lead_id
                                                  ->  Hash Left Join  (cost=9430.33..43298.42 rows=40336 width=78)
                                                        Hash Cond: ((s.dealer_code)::text = (d.dealer_code)::text)
                                                        ->  Hash Left Join  (cost=9094.04..42407.50 rows=40336 width=60)
                                                              Hash Cond: (s.id = ml.sale_id)
                                                              Filter: (((s.sale_type = 'd'::bpchar) AND (NOT ml.lost_sale)) OR ((s.sale_type = 'c'::bpchar) AND (ml.lead_pos = 0)) OR (s.sale_type = '0'::bpchar))
                                                              ->  Index Scan using mb_sale_sale_date_idx on mb_sale s  (cost=0.00..14027.94 rows=43804 width=50)
                                                                    Index Cond: ((sale_date >= '2009-01-01'::date) AND (sale_date <= '2009-01-31'::date))
                                                              ->  Hash  (cost=5577.35..5577.35 rows=281335 width=26)
                                                                    ->  Seq Scan on mb_lead ml  (cost=0.00..5577.35 rows=281335 width=26)
                                                        ->  Hash  (cost=321.13..321.13 rows=1213 width=23)
                                                              ->  Seq Scan on dealer d  (cost=0.00..321.13 rows=1213 width=23)
                                            ->  Sort  (cost=1190714.57..1191036.34 rows=128709 width=19)
                                                  Sort Key: lrdm.lead_id
                                                  ->  Bitmap Heap Scan on lead_reporting_meta lrdm  (cost=118675.76..1179791.19 rows=128709 width=19)
                                                        Recheck Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
                                                        Filter: (pos = 1)
                                                        ->  Bitmap Index Scan on lead_reporting_meta_item_key_idx  (cost=0.00..118643.58 rows=1476580 width=0)
                                                              Index Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
                                      ->  Index Scan using lead_reporting_address_lead_id_idx on lead_reporting_address address  (cost=0.00..4.34 rows=1 width=37)
                                            Index Cond: (address.lead_id = ml.lead_id)
                                ->  Index Scan using lead_pkey on lead l  (cost=0.00..5.43 rows=1 width=23)
                                      Index Cond: (l.id = ml.lead_id)
                          ->  Index Scan using lead_reporting_street_address_id_idx on lead_reporting_street address_street  (cost=0.00..4.29 rows=1 width=24)
                                Index Cond: (address_street.address_id = address.id)
                                Filter: (address_street.pos = 0)
                    ->  Index Scan using lead_reporting_data_pkey on lead_reporting_data lrd  (cost=0.00..4.79 rows=1 width=71)
                          Index Cond: (lrd.lead_id = ml.lead_id)
              ->  Hash  (cost=85837.99..85837.99 rows=1459164 width=23)
                    ->  Seq Scan on lead_reporting_list_data email  (cost=0.00..85837.99 rows=1459164 width=23)
                          Filter: ((list_type = 'e'::bpchar) AND (pos = 0))
        ->  Hash  (cost=85837.99..85837.99 rows=1854357 width=23)
              ->  Seq Scan on lead_reporting_list_data phone  (cost=0.00..85837.99 rows=1854357 width=23)
                    Filter: ((list_type = 'p'::bpchar) AND (pos = 0))
(47 rows)
TopMemoryContext: 154528 total in 18 blocks; 5240 free (20 chunks); 149288 used
  TopTransactionContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
  PL/PgSQL function context: 8192 total in 1 blocks; 5512 free (4 chunks); 2680 used
  PL/PgSQL function context: 57344 total in 3 blocks; 42720 free (11 chunks); 14624 used
  PL/PgSQL function context: 8192 total in 1 blocks; 4648 free (7 chunks); 3544 used
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0 chunks); 20776 used
  Record information cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Prepared Queries: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
  smgr relation table: 24576 total in 2 blocks; 3584 free (4 chunks); 20992 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1021952 total in 126 blocks; 11008 free (181 chunks); 1010944 used
      ExecutorState: 4972936 total in 6 blocks; 38456 free (11 chunks); 4934480 used
        HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
          HashBatchContext: 148889648 total in 28 blocks; 2532544 free (5 chunks); 146357104 used
        HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
          HashBatchContext: 144695344 total in 27 blocks; 828368 free (12 chunks); 143866976 used
        TIDBitmap: 50356272 total in 16 blocks; 7517392 free (41 chunks); 42838880 used
        TupleSort: 142649352 total in 25 blocks; 92237584 free (1339345 chunks); 50411768 used
        HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
          HashBatchContext: 131072 total in 4 blocks; 20064 free (3 chunks); 111008 used
        HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
          HashBatchContext: 25419824 total in 13 blocks; 2650448 free (5 chunks); 22769376 used
        TupleSort: 9953328 total in 11 blocks; 1277840 free (9 chunks); 8675488 used
        TupleSort: 1138736 total in 8 blocks; 768 free (8 chunks); 1137968 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
        ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
      JoinRelHashTable: 8192 total in 1 blocks; 624 free (0 chunks); 7568 used
  Relcache by OID: 24576 total in 2 blocks; 10752 free (3 chunks); 13824 used
  CacheMemoryContext: 1341680 total in 21 blocks; 174632 free (83 chunks); 1167048 used
    unnamed prepared statement: 516096 total in 6 blocks; 231360 free (2 chunks); 284736 used
    CachedPlan: 3072 total in 2 blocks; 1160 free (0 chunks); 1912 used
    pg_statistic_relid_att_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    locks_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_list_data_value_lower_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_list_data_value_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_list_data_subtype_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_list_data_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_list_data_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_list_data_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_street_street_lower_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_street_address_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting_street_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_address_postal_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_address_state_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_address_city_lower_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_address_lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting_address_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_lead_sale_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_lead_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_lead_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_dealer_region_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_dealer_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_sale_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    mb_sale_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_model_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_dealer_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_dealer_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_vin_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    mb_sale_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    CachedPlan: 3072 total in 2 blocks; 648 free (0 chunks); 2424 used
    pg_toast_17933_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    config_xml_name_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    config_xml_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
    SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
    CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlanSource: 1024 total in 1 blocks; 24 free (0 chunks); 1000 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 7168 total in 3 blocks; 3888 free (1 chunks); 3280 used
    CachedPlan: 7168 total in 3 blocks; 3920 free (0 chunks); 3248 used
    CachedPlanSource: 3072 total in 2 blocks; 888 free (0 chunks); 2184 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 7168 total in 3 blocks; 3704 free (1 chunks); 3464 used
    CachedPlanSource: 3072 total in 2 blocks; 896 free (0 chunks); 2176 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 3072 total in 2 blocks; 1456 free (1 chunks); 1616 used
    CachedPlanSource: 3072 total in 2 blocks; 1664 free (0 chunks); 1408 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 3072 total in 2 blocks; 312 free (0 chunks); 2760 used
    CachedPlanSource: 3072 total in 2 blocks; 1568 free (0 chunks); 1504 used
    SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
    cl_event_first_response_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 3072 total in 2 blocks; 1728 free (0 chunks); 1344 used
    CachedPlanSource: 3072 total in 2 blocks; 1536 free (0 chunks); 1536 used
    SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
    CachedPlan: 15360 total in 4 blocks; 7104 free (1 chunks); 8256 used
    CachedPlanSource: 7168 total in 3 blocks; 2976 free (2 chunks); 4192 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 3072 total in 2 blocks; 1568 free (0 chunks); 1504 used
    CachedPlanSource: 3072 total in 2 blocks; 1352 free (0 chunks); 1720 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 3072 total in 2 blocks; 368 free (0 chunks); 2704 used
    CachedPlanSource: 3072 total in 2 blocks; 40 free (0 chunks); 3032 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlan: 7168 total in 3 blocks; 3704 free (1 chunks); 3464 used
    CachedPlanSource: 3072 total in 2 blocks; 896 free (0 chunks); 2176 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    cl_event_crm_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    cl_event_lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    cl_event_event_name_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    cl_event_crm_lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    cl_event_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    locality_data_postal_code_norm_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    locality_data_postal_code_key: 2048 total in 1 blocks; 392 free (0 chunks); 1656 used
    locality_data_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_meta_item_value_key_idx: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    lead_reporting_meta_item_key_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting_meta_item_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_meta_lead_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_meta_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
    SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
    CachedPlan: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    CachedPlanSource: 1024 total in 1 blocks; 56 free (0 chunks); 968 used
    SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
    CachedPlanSource: 3072 total in 2 blocks; 1248 free (1 chunks); 1824 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    pg_toast_18404_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    CachedPlan: 3072 total in 2 blocks; 936 free (0 chunks); 2136 used
    CachedPlanSource: 3072 total in 2 blocks; 1424 free (0 chunks); 1648 used
    SPI Plan: 1024 total in 1 blocks; 832 free (0 chunks); 192 used
    CachedPlanSource: 3072 total in 2 blocks; 360 free (0 chunks); 2712 used
    SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
    CachedPlan: 7168 total in 3 blocks; 3888 free (1 chunks); 3280 used
    CachedPlanSource: 3072 total in 2 blocks; 360 free (0 chunks); 2712 used
    SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
    CachedPlan: 7168 total in 3 blocks; 3704 free (1 chunks); 3464 used
    CachedPlanSource: 3072 total in 2 blocks; 896 free (0 chunks); 2176 used
    SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    region_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_18005_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    request_state_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_id_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_billing_contention_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting_data_delivery_meth_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_typeflags_idx: 2048 total in 1 blocks; 600 free (0 chunks); 1448 used
    lead_reporting_data_unique_dealer_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_unique_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_unique_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_unique_all_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_oem_subtype_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_oem_type_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_dealer_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_customer_lname_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_customer_fname_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_customer_postal_code_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_dealer_region_area_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_dealer_region_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_dealer_code_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting_data_reporting_date_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting_data_model_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_reporting_data_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    lead_reporting: 23552 total in 5 blocks; 7240 free (0 chunks); 16312 used
    lead_processing_state_step_idx: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    reporting_modified_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reporting_last_processed_date_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reporting_date_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_created_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_processing_step_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_destination_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_modified_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_source_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    lead_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    metadata_key_value_idx: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    metadata_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    dealer_metadata_pkey: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    xml_schema_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ehcache_expiration_idx: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    ehcache_pkey: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    external_system_user_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    external_system_name_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    external_system_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    dealer_external_system_id_idx: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    dealer_dealer_code_key: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    dealer_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
    CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used
    xslt_style_sheet_pkey: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    CachedPlan: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
    CachedPlanSource: 1024 total in 1 blocks; 664 free (1 chunks); 360 used
    pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used
    pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_language_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1576 free (3 chunks); 1496 used
    pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_constraint_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_namespace_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_type_typname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_proc_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
  MdSmgr: 24576 total in 2 blocks; 16192 free (0 chunks); 8384 used
  LOCALLOCK hash: 24576 total in 2 blocks; 11872 free (5 chunks); 12704 used
  Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
2009-02-08 20:11:27 CST lms_nna ERROR: out of memory
2009-02-08 20:11:27 CST lms_nna DETAIL: Failed on request of size 243.
* Matt Magoffin (postgresql.org@msqr.us) wrote:
We've been having persistent out-of-memory errors occur in our production
8.3 deployment, which is now running 8.3.5. I'm not sure the query here is
the cause of the problem, but it is our most recent example, which
triggered an out-of-memory error for us.
erm.. How much memory do you have in the system?
Perhaps our configuration needs tweaking somewhere...
shared_buffers = 4000MB
I hope you've got a fair bit more than 4G of memory if you're going to
use 4G for shared buffers... Once that memory is dedicated to shared
buffers it's not going to be available for other usage.
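If you want to double-check how much is actually carved out, the SysV
shared memory segment is visible from the shell; a quick check along these
lines:

    # the shared_buffers area shows up as one large SysV shared memory
    # segment owned by the postgres user
    ipcs -m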
#temp_buffers = 8MB
work_mem = 128MB
maintenance_work_mem = 256MB
#max_stack_depth = 2MB
Those are bumped up from the defaults, but don't look too bad to me. Do
realize that work_mem is the amount PG is allowed to use for each
individual sort or hash operation, so a complex query can end up using
many times that amount (but I doubt that's the real issue here..).
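If you suspect a particular query, one low-risk experiment is to lower the
limit for just that session rather than globally; a rough, untested sketch
(report.sql is a stand-in name for the big query):

    # run the report under a smaller per-operation memory cap, leaving
    # the server-wide work_mem setting alone
    psql -d lms_nna <<'SQL'
    SET work_mem = '16MB';  -- applies to this session only
    \i report.sql           -- stand-in for the reporting query
    SQL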
Thanks,
Stephen
erm.. How much memory do you have in the system?
This system has 16GB of RAM, and Postgres is basically the only service
running on the box.
shared_buffers = 4000MB
I hope you've got a fair bit more than 4G of memory if you're going to
use 4G for shared buffers... Once that memory is dedicated to shared
buffers it's not going to be available for other usage.
Yep, we've got 16GB to work with here. I should have also mentioned the
architecture in my original post, sorry. SELECT version() returns this:
PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070626 (Red Hat 4.1.2-14)
Regards,
Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote:
Yep, we've got 16GB to work with here. I should have also mentioned the
architecture in my original post, sorry. SELECT version() returns this:
PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070626 (Red Hat 4.1.2-14)
Does the result from 'free' look reasonable on this box? How many PG
processes are running, generally? Do you see the PG process running
this query growing to fill the available memory? That query really
didn't look like it was chewing up all that much memory to me.. Have
you tried reducing your work_mem setting to see if that makes the errors
go away? It might cause a different plan which might be less efficient
for some queries, not sure if you have anything which would be
dramatically affected by that.. Do the row counts in the plan match up
with what you'd expect? Can you provide output from 'explain analyze'
on a successful run?
Thanks,
Stephen
PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070626 (Red Hat 4.1.2-14)

Does the result from 'free' look reasonable on this box?
I think so:
total used free shared buffers cached
Mem: 16432296 16273964 158332 0 173536 14321340
-/+ buffers/cache: 1779088 14653208
Swap: 2096440 560 2095880
How many PG processes are running, generally?
I see about 90 currently, of which 24 are Slony connections and the rest
are client connections.
Do you see the PG process running this query growing to fill the
available memory? That query really
didn't look like it was chewing up all that much memory to me..
Just running top, it does appear to chew through a fair amount of memory.
Here's a snapshot from top of the postgres process running this query,
from just before it ran out of memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4486 postgres 18 0 4576m 3.6g 3.3g R 90 23.1 0:34.23 postgres: postgres lms_nna [local] EXPLAIN
These values did start out low, for example the RES memory started in the
130MB range, then climbed to the 3.6GB you see here.
Have you tried reducing your work_mem setting to see if that makes the
errors go away? It might cause a different plan which might be less
efficient for some queries, not sure if you have anything which would be
dramatically affected by that.. Do the row counts in the plan match up
with what you'd expect? Can you provide output from 'explain analyze'
on a successful run?
I set work_mem to 2MB, and the query does actually complete (explain
analyze output below). Does this mean the query simply needs more memory
for sorting/joining than it could get, and that we'd have to either
allocate enough work_mem for it to finish in memory, or use a smaller
work_mem, as shown here, to make it fall back to slower disk-based
sorting? The row counts match what we'd expect from this query.
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=2345301.47..2345402.31 rows=40336 width=234) (actual time=203429.526..203599.840 rows=49317 loops=1)
  Sort Key: s.sale_date, s.vin
  Sort Method: external merge Disk: 4776kB
  ->  Hash Left Join  (cost=1469244.28..2337665.81 rows=40336 width=234) (actual time=162362.791..202883.728 rows=49317 loops=1)
        Hash Cond: (ml.lead_id = phone.lead_id)
        ->  Hash Left Join  (cost=1349360.82..2159151.44 rows=40336 width=219) (actual time=155499.666..194941.423 rows=49317 loops=1)
              Hash Cond: (ml.lead_id = email.lead_id)
              ->  Nested Loop Left Join  (cost=1236733.28..1999980.90 rows=40336 width=204) (actual time=141890.479..186344.717 rows=49317 loops=1)
                    ->  Nested Loop Left Join  (cost=1236733.28..1806406.02 rows=40336 width=141) (actual time=141890.419..166782.070 rows=49317 loops=1)
                          ->  Nested Loop Left Join  (cost=1236733.28..1586899.03 rows=40336 width=126) (actual time=141890.368..166124.253 rows=49317 loops=1)
                                ->  Nested Loop Left Join  (cost=1236733.28..1413543.83 rows=40336 width=118) (actual time=141890.281..156284.612 rows=49317 loops=1)
                                      ->  Merge Left Join  (cost=1236733.28..1237778.33 rows=40336 width=89) (actual time=141890.184..143717.900 rows=49317 loops=1)
                                            Merge Cond: (ml.lead_id = lrdm.lead_id)
                                            ->  Sort  (cost=43356.21..43457.05 rows=40336 width=78) (actual time=1722.385..1794.668 rows=49317 loops=1)
                                                  Sort Key: ml.lead_id
                                                  Sort Method: external merge Disk: 4152kB
                                                  ->  Hash Left Join  (cost=11354.33..38475.05 rows=40336 width=78) (actual time=919.319..1578.556 rows=49317 loops=1)
                                                        Hash Cond: ((s.dealer_code)::text = (d.dealer_code)::text)
                                                        ->  Hash Left Join  (cost=11018.04..37584.13 rows=40336 width=60) (actual time=914.936..1445.926 rows=49317 loops=1)
                                                              Hash Cond: (s.id = ml.sale_id)
                                                              Filter: (((s.sale_type = 'd'::bpchar) AND (NOT ml.lost_sale)) OR ((s.sale_type = 'c'::bpchar) AND (ml.lead_pos = 0)) OR (s.sale_type = '0'::bpchar))
                                                              ->  Index Scan using mb_sale_sale_date_idx on mb_sale s  (cost=0.00..14027.94 rows=43804 width=50) (actual time=0.149..126.835 rows=49152 loops=1)
                                                                    Index Cond: ((sale_date >= '2009-01-01'::date) AND (sale_date <= '2009-01-31'::date))
                                                              ->  Hash  (cost=5577.35..5577.35 rows=281335 width=26) (actual time=914.089..914.089 rows=281335 loops=1)
                                                                    ->  Seq Scan on mb_lead ml  (cost=0.00..5577.35 rows=281335 width=26) (actual time=0.026..394.384 rows=281335 loops=1)
                                                        ->  Hash  (cost=321.13..321.13 rows=1213 width=23) (actual time=4.343..4.343 rows=1213 loops=1)
                                                              ->  Seq Scan on dealer d  (cost=0.00..321.13 rows=1213 width=23) (actual time=0.013..2.451 rows=1213 loops=1)
                                            ->  Materialize  (cost=1193377.08..1194985.94 rows=128709 width=19) (actual time=135323.691..140326.709 rows=1442527 loops=1)
                                                  ->  Sort  (cost=1193377.08..1193698.85 rows=128709 width=19) (actual time=135323.681..137742.900 rows=1442488 loops=1)
                                                        Sort Key: lrdm.lead_id
                                                        Sort Method: external merge Disk: 50312kB
                                                        ->  Bitmap Heap Scan on lead_reporting_meta lrdm  (cost=118695.77..1179811.20 rows=128709 width=19) (actual time=90238.108..126737.875 rows=1460904 loops=1)
                                                              Recheck Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
                                                              Filter: (pos = 1)
                                                              ->  Bitmap Index Scan on lead_reporting_meta_item_key_idx  (cost=0.00..118663.60 rows=1476580 width=0) (actual time=90223.734..90223.734 rows=1476483 loops=1)
                                                                    Index Cond: (item_key = '[ALS:prospectid]TrackingCode'::text)
                                      ->  Index Scan using lead_reporting_address_lead_id_idx on lead_reporting_address address  (cost=0.00..4.35 rows=1 width=37) (actual time=0.251..0.252 rows=0 loops=49317)
                                            Index Cond: (address.lead_id = ml.lead_id)
                                ->  Index Scan using lead_reporting_street_address_id_idx on lead_reporting_street address_street  (cost=0.00..4.29 rows=1 width=24) (actual time=0.196..0.197 rows=0 loops=49317)
                                      Index Cond: (address_street.address_id = address.id)
                                      Filter: (address_street.pos = 0)
                          ->  Index Scan using lead_pkey on lead l  (cost=0.00..5.43 rows=1 width=23) (actual time=0.010..0.010 rows=0 loops=49317)
                                Index Cond: (l.id = ml.lead_id)
                    ->  Index Scan using lead_reporting_data_pkey on lead_reporting_data lrd  (cost=0.00..4.79 rows=1 width=71) (actual time=0.393..0.393 rows=0 loops=49317)
                          Index Cond: (lrd.lead_id = ml.lead_id)
              ->  Hash  (cost=85837.99..85837.99 rows=1459164 width=23) (actual time=7607.947..7607.947 rows=1521294 loops=1)
                    ->  Seq Scan on lead_reporting_list_data email  (cost=0.00..85837.99 rows=1459164 width=23) (actual time=0.030..5063.069 rows=1521294 loops=1)
                          Filter: ((list_type = 'e'::bpchar) AND (pos = 0))
        ->  Hash  (cost=85837.99..85837.99 rows=1854357 width=23) (actual time=6863.050..6863.050 rows=1803678 loops=1)
              ->  Seq Scan on lead_reporting_list_data phone  (cost=0.00..85837.99 rows=1854357 width=23) (actual time=0.020..3920.096 rows=1803678 loops=1)
                    Filter: ((list_type = 'p'::bpchar) AND (pos = 0))
Total runtime: 203657.960 ms
(52 rows)
* Matt Magoffin (postgresql.org@msqr.us) wrote:
Does the result from 'free' look reasonable on this box?
I think so:
total used free shared buffers cached
Mem: 16432296 16273964 158332 0 173536 14321340
-/+ buffers/cache: 1779088 14653208
Swap: 2096440 560 2095880
That certainly looks fine.. And you've got 14G or so which should be
available for this query. Was this near the time the query was running?
Could you give us what 'free' returns when the query is close to the
out-of-memory error? I'd expect the 2nd row under 'free' to be getting
low for the allocation to fail.
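If it's hard to catch by eye, a trivial loop in another shell would
capture the numbers right up to the failure, e.g.:

    # sample memory usage every few seconds while the query runs
    while true; do date; free; sleep 5; done >> /tmp/free.log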
Just running top, it does appear to chew through a fair amount of memory.
Here's a snapshot from top of the postgres process running this query
from just before it ran out of memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4486 postgres 18 0 4576m 3.6g 3.3g R 90 23.1 0:34.23 postgres: postgres lms_nna [local] EXPLAIN
These values did start out low, for example the RES memory started in the
130MB range, then climbed to the 3.6GB you see here.
Uhh.. I saw that your system was 64-bit, but is your PG process
compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
which isn't exactly helped by your shared_buffers being set up so high
to begin with? Run 'file' on your postgres binary, like so:
sfrost@snowman:/home/sfrost> file /usr/lib/postgresql/8.3/bin/postgres
/usr/lib/postgresql/8.3/bin/postgres: ELF 64-bit LSB executable, x86-64,
version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux
2.6.8, stripped
(that's on Debian, your postgres binary will probably be somewhere
else).
I set the work_mem to 2MB, and the query does actually complete (explain
analyze output below), so does this mean that the query simply uses too
much memory for sorting/joining, and we'd have to either allocate enough
work_mem to allow the query to complete, or a smaller work_mem as shown
here to make the query use slower disk-based sorting? The row counts are
matching what we'd expect from this query.
Allocating more work_mem won't help in this situation. PG's
out-of-memory error is only ever caused by an actual allocation failure.
The work_mem, to PG, is more of a guideline than any kind of hard limit.
Based on the explain analyze, I continue to feel that your query isn't
actually all that big in terms of memory usage (the disk-based sorting
taking place was in the 50M range or so, from what I saw, and you should
have plenty of memory for such a query..).
Check your binary.. I suspect that's where your issue is. Now, I don't
recall offhand if you can just recompile-in-place or if you need to do a
dump/restore, but I'd definitely do a backup just in case and keep it
handy (or maybe use it just to be safe) and see if PG complains when
it's started up.
Thanks,
Stephen
* Matt Magoffin (postgresql.org@msqr.us) wrote:
Just running top, it does appear to chew through a fair amount of memory.
Here's a snapshot from top of the postgres process running this query
from just before it ran out of memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4486 postgres 18 0 4576m 3.6g 3.3g R 90 23.1 0:34.23 postgres: postgres lms_nna [local] EXPLAIN
These values did start out low, for example the RES memory started in the
130MB range, then climbed to the 3.6GB you see here.
That is almost certainly meaningless; it just reflects the process
touching a larger and larger fraction of shared buffers over its
existence. The number to pay attention to is the non-shared memory size
(VIRT - SHR is probably the right number here).
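With the column layout top was showing above (VIRT in column 5, SHR in
column 7), a one-shot snapshot of just that backend could be pulled like
this; a sketch, using the PID from the earlier output:

    # print VIRT and SHR for the backend; the non-shared footprint is
    # roughly VIRT minus SHR
    top -b -n 1 -p 4486 | awk '$1 == 4486 {print "VIRT=" $5, "SHR=" $7}'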
Stephen Frost <sfrost@snowman.net> writes:
Uhh.. I saw that your system was 64-bit, but is your PG process
compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
which isn't exactly helped by your shared_buffers being set up so high
to begin with? Run 'file' on your postgres binary, like so:
I think it must be compiled 64-bit, or he'd not be able to get
shared_buffers that high to start with. However, it's possible that the
postmaster's been started under a ulimit setting that constrains each
backend to just a few hundred meg of per-process memory.
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
Uhh.. I saw that your system was 64-bit, but is your PG process
compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
which isn't exactly helped by your shared_buffers being set up so high
to begin with? Run 'file' on your postgres binary, like so:

I think it must be compiled 64-bit, or he'd not be able to get
shared_buffers that high to start with. However, it's possible that the
postmaster's been started under a ulimit setting that constrains each
backend to just a few hundred meg of per-process memory.
I'm not so sure.. He has it as '4000MB', which would leave 96M free,
which doesn't seem that far from where his query is ending up at based
on the memory info he provided and the size of the sorts being done.
The ulimit is certainly something else to check tho.
Thanks,
Stephen
total used free shared buffers cached
Mem: 16432296 16273964 158332 0 173536 14321340
-/+ buffers/cache: 1779088 14653208
Swap: 2096440 560 2095880

That certainly looks fine.. And you've got 14G or so which should be
available for this query. Was this near the time the query was running?
Could you give us what 'free' returns when the query is close to the
out-of-memory error? I'd expect the 2nd row under 'free' to be getting
low for the allocation to fail.
It was near the time... here's free just before executing the query:
total used free shared buffers cached
Mem: 16432296 16342056 90240 0 116868 14561200
-/+ buffers/cache: 1663988 14768308
Swap: 2096440 448 2095992
And then here is free close to the time the query runs out of memory (but
while the query is still running):
total used free shared buffers cached
Mem: 16432296 16348260 84036 0 41344 14167384
-/+ buffers/cache: 2139532 14292764
Swap: 2096440 448 2095992
Uhh.. I saw that your system was 64-bit, but is your PG process
compiled as 64bit? Maybe you're hitting an artificial 32-bit limit,
which isn't exactly helped by your shared_buffers being set up so high
to begin with?
Yes, it's compiled as 64-bit:
/opt/lms/postgres-8.3_64/bin/postgres: ELF 64-bit LSB executable, AMD
x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses
shared libs), for GNU/Linux 2.6.9, not stripped
We've had nagging memory-related issues with 8.3 that manifest themselves
like memory leaks... some posts I've made in the past have led to some
leaks getting fixed... but I've not been able to track down more specific
causes. It's just that over time Postgres seems to accumulate memory and
then we start to see out-of-memory errors like this one. Again, this query
could have nothing to do with the root cause, but this is just the example
I have to work with at the moment.
Regards,
Matt
I think it must be compiled 64-bit, or he'd not be able to get
shared_buffers that high to start with. However, it's possible that the
postmaster's been started under a ulimit setting that constrains each
backend to just a few hundred meg of per-process memory.
Here's the output of ulimit -a for the "postgres" user the database is
running under:
[postgres@170226-db7 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 139264
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
max rt priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 139264
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I think this means it does not have an artificial memory limit imposed,
but is there a specific setting beyond these I should check, do you think?
Regards,
Matt
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
I think it must be compiled 64-bit, or he'd not be able to get
shared_buffers that high to start with.
I'm not so sure.. He has it as '4000MB', which would leave 96M free,
which doesn't seem that far from where his query is ending up at based
on the memory info he provided and the size of the sorts being done.
The ulimit is certainly something else to check tho.
Hmm ... a gig here, a gig there, pretty soon you're talking about real
memory? He's got several sorts and hashes that are each taking over
100MB according to the memory context dump, so it seems impossible that
it all fits into a strict 32-bit address space. There's surely no harm
in double-checking the executable's file type though.
regards, tom lane
Hmm ... a gig here, a gig there, pretty soon you're talking about real
memory? He's got several sorts and hashes that are each taking over
100MB according to the memory context dump, so it seems impossible that
it all fits into a strict 32-bit address space. There's surely no harm
in double-checking the executable's file type though.
I did reply to this in a different email in this thread, but just to be
safe, yes, I did verify Postgres is a 64-bit binary:
[postgres@170226-db7 ~]$ file /opt/lms/postgres-8.3_64/bin/postgres
/opt/lms/postgres-8.3_64/bin/postgres: ELF 64-bit LSB executable, AMD
x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses
shared libs), for GNU/Linux 2.6.9, not stripped
Regards,
Matt
"Matt Magoffin" <postgresql.org@msqr.us> writes:
Here's the output of ulimit -a by the "postgres" user the database is
running under:
...
I think this means it does not have an artificial memory limit imposed,
Agreed, that ulimit isn't reflecting any such limit, but is that really
the same environment the postmaster gets started in? I wouldn't trust
a system startup script to be launched in the same environment that a
login shell gets. You might try adding
ulimit -a >/tmp/something
to the startup script to find out for sure.
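Something along these lines, i.e. inside whatever su/launch command the
script already uses (the log path is arbitrary):

    # capture the limits in the same environment the postmaster inherits
    su -l postgres -c 'ulimit -a > /tmp/postmaster-ulimit.out'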
regards, tom lane
* Matt Magoffin (postgresql.org@msqr.us) wrote:
I think it must be compiled 64-bit, or he'd not be able to get
shared_buffers that high to start with. However, it's possible that the
postmaster's been started under a ulimit setting that constrains each
backend to just a few hundred meg of per-process memory.

Here's the output of ulimit -a for the "postgres" user the database is
running under:
[...]
I think this means it does not have an artificial memory limit imposed,
but is there a specific setting beyond these I should check, do you think?
How about cat /proc/<pid>/limits for the postmaster?
And maybe:
status
stat
maps
Though I'm kinda grasping at straws here, to be honest. I've had PG up
and running through >16G of memory at a time before.
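For example (the pidfile path is a guess, adjust for your PGDATA; the
limits file only exists on newer kernels):

    PGPID=$(head -n 1 /var/lib/pgsql/data/postmaster.pid)
    cat /proc/$PGPID/limits   # per-process limits, where available
    cat /proc/$PGPID/status   # VmPeak, VmSize, VmRSS, ...
    cat /proc/$PGPID/maps     # address-space layout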
Thanks,
Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Hmm ... a gig here, a gig there, pretty soon you're talking about real
memory? He's got several sorts and hashes that are each taking over
100MB according to the memory context dump, so it seems impossible that
it all fits into a strict 32-bit address space. There's surely no harm
in double-checking the executable's file type though.
Argh, it's late and I'm not thinking. Yeah, I had seen the 100M hashes
and just wasn't thinking clearly. Sorry for the noise.
Stephen
"Matt Magoffin" <postgresql.org@msqr.us> writes:
We've had nagging memory-related issues with 8.3 that manifest themselves
like memory leaks... some posts I've made in the past have led to some
leaks getting fixed... but I've not been able to track down more specific
causes. It's just that over time Postgres seems to accumulate memory and
then we start to see out-of-memory errors like this one. Again, this query
could have nothing to do with the root cause, but this is just the example
I have to work with at the moment.
Yeah, I remember we found a few xml-related leaks based on your reports.
However, there's not anything here to suggest that this query is
straining the capabilities of a 64-bit system with lots o RAM. It seems
certain you're hitting some artificial process-size limit, and the only
one I know about is ulimit.
I wasn't aware of /proc/<pid>/limits before, but now that I've heard
of it, checking that for the postmaster and/or a backend seems like
a great idea.
regards, tom lane
How about cat /proc/<pid>/limits for the postmaster?
And maybe:
status
stat
maps

Though I'm kinda grasping at straws here, to be honest. I've had PG up
and running through >16G of memory at a time before.

There is no /proc/<pid>/limits file, but here are the others.
status:
Name: postmaster
State: S (sleeping)
SleepAVG: 95%
Tgid: 4698
Pid: 4698
PPid: 1
TracerPid: 0
Uid: 26 26 26 26
Gid: 26 26 26 26
FDSize: 1024
Groups: 26
VmPeak: 4301216 kB
VmSize: 4301204 kB
VmLck: 0 kB
VmHWM: 85980 kB
VmRSS: 85980 kB
VmData: 940 kB
VmStk: 84 kB
VmExe: 4480 kB
VmLib: 3512 kB
VmPTE: 284 kB
StaBrk: 00ade000 kB
Brk: 0c3db000 kB
StaStk: 7fff21fecf40 kB
Threads: 1
SigQ: 0/139264
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001303000
SigCgt: 0000000000014a07
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed: 7fffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
Mems_allowed: 00000000,00000001
stat:
4698 (postmaster) S 1 4686 4686 0 -1 4202496 27092033 7777059343 31 2809 6684 47834 13037706 1828097 15 0 1 0 9296 4404432896 21495 18446744073709551615 4194304 8778268 140733763735360 18446744073709551615 232627404115 0 0 19935232 84487 0 0 0 17 2 0 0 69
maps:
00400000-00860000 r-xp 00000000 08:03 1120168
/opt/lms/64/postgres/8.3/bin/postgres
00a5f000-00a6b000 rw-p 0045f000 08:03 1120168
/opt/lms/64/postgres/8.3/bin/postgres
00a6b000-00ade000 rw-p 00a6b000 00:00 0
0c39c000-0c3db000 rw-p 0c39c000 00:00 0
3012200000-3012234000 r-xp 00000000 08:07 5237058
/usr/lib64/libxslt.so.1.1.17
3012234000-3012433000 ---p 00034000 08:07 5237058
/usr/lib64/libxslt.so.1.1.17
3012433000-3012435000 rw-p 00033000 08:07 5237058
/usr/lib64/libxslt.so.1.1.17
3012a00000-3012a05000 r-xp 00000000 08:07 457029
/lib64/libcrypt-2.5.so
3012a05000-3012c04000 ---p 00005000 08:07 457029
/lib64/libcrypt-2.5.so
3012c04000-3012c05000 r--p 00004000 08:07 457029
/lib64/libcrypt-2.5.so
3012c05000-3012c06000 rw-p 00005000 08:07 457029
/lib64/libcrypt-2.5.so
3012c06000-3012c34000 rw-p 3012c06000 00:00 0
3629600000-362961a000 r-xp 00000000 08:07 457227
/lib64/ld-2.5.so
3629819000-362981a000 r--p 00019000 08:07 457227
/lib64/ld-2.5.so
362981a000-362981b000 rw-p 0001a000 08:07 457227
/lib64/ld-2.5.so
3629a00000-3629b46000 r-xp 00000000 08:07 457228
/lib64/libc-2.5.so
3629b46000-3629d46000 ---p 00146000 08:07 457228
/lib64/libc-2.5.so
3629d46000-3629d4a000 r--p 00146000 08:07 457228
/lib64/libc-2.5.so
3629d4a000-3629d4b000 rw-p 0014a000 08:07 457228
/lib64/libc-2.5.so
3629d4b000-3629d50000 rw-p 3629d4b000 00:00 0
3629e00000-3629e02000 r-xp 00000000 08:07 457229
/lib64/libdl-2.5.so
3629e02000-362a002000 ---p 00002000 08:07 457229
/lib64/libdl-2.5.so
362a002000-362a003000 r--p 00002000 08:07 457229
/lib64/libdl-2.5.so
362a003000-362a004000 rw-p 00003000 08:07 457229
/lib64/libdl-2.5.so
362aa00000-362aa82000 r-xp 00000000 08:07 457235
/lib64/libm-2.5.so
362aa82000-362ac81000 ---p 00082000 08:07 457235
/lib64/libm-2.5.so
362ac81000-362ac82000 r--p 00081000 08:07 457235
/lib64/libm-2.5.so
362ac82000-362ac83000 rw-p 00082000 08:07 457235
/lib64/libm-2.5.so
362b600000-362b614000 r-xp 00000000 08:07 5231431
/usr/lib64/libz.so.1.2.3
362b614000-362b813000 ---p 00014000 08:07 5231431
/usr/lib64/libz.so.1.2.3
362b813000-362b814000 rw-p 00013000 08:07 5231431
/usr/lib64/libz.so.1.2.3
362ba00000-362bb33000 r-xp 00000000 08:07 5235374
/usr/lib64/libxml2.so.2.6.26
362bb33000-362bd33000 ---p 00133000 08:07 5235374
/usr/lib64/libxml2.so.2.6.26
362bd33000-362bd3c000 rw-p 00133000 08:07 5235374
/usr/lib64/libxml2.so.2.6.26
362bd3c000-362bd3d000 rw-p 362bd3c000 00:00 0
2aaaaaaab000-2aaaaaaac000 rw-p 2aaaaaaab000 00:00 0
2aaaaaabd000-2aaaaaac1000 rw-p 2aaaaaabd000 00:00 0
2aaaaaac1000-2aaaaaacb000 r-xp 00000000 08:07 456988
/lib64/libnss_files-2.5.so
2aaaaaacb000-2aaaaacca000 ---p 0000a000 08:07 456988
/lib64/libnss_files-2.5.so
2aaaaacca000-2aaaaaccb000 r--p 00009000 08:07 456988
/lib64/libnss_files-2.5.so
2aaaaaccb000-2aaaaaccc000 rw-p 0000a000 08:07 456988
/lib64/libnss_files-2.5.so
2aaaaaccc000-2aaaae299000 r--p 00000000 08:07 5229018
/usr/lib/locale/locale-archive
2aaaae299000-2aabafc41000 rw-s 00000000 00:08 0
/SYSV0063da81 (deleted)
7fff21fda000-7fff21fef000 rw-p 7fff21fda000 00:00 0
[stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0
[vdso]
Hope this helps,
Matt
Agreed, that ulimit isn't reflecting any such limit, but is that really
the same environment the postmaster gets started in? I wouldn't trust
a system startup script to be launched in the same environment that a
login shell gets. You might try adding
ulimit -a >/tmp/something
to the startup script to find out for sure.
Our startup script is calling
su -l postgres ...
So I thought the limits would be the same (from the -l flag)? I then tried
to mimic this with the following:
[root@170226-db7 ~]# su -l postgres -c "ulimit -a"
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 139264
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
max rt priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 139264
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
which I think should accurately reflect the environment the postmaster
actually sees.
Regards,
Matt
* Matt Magoffin (postgresql.org@msqr.us) wrote:
There is no /proc/<pid>/limits file, but here are the others.
erp, that stinks. Must be on an older kernel? I've got it under (a
Debian-built) 2.6.26. I can't recall if there's another way to get
limit info for an active process.. Could use Tom's suggestion of
echo'ing ulimit -a out to a file somewhere during database start-up.
Thanks,
Stephen
Yeah, I remember we found a few xml-related leaks based on your reports.
However, there's not anything here to suggest that this query is
straining the capabilities of a 64-bit system with lots o RAM. It seems
certain you're hitting some artificial process-size limit, and the only
one I know about is ulimit.

I wasn't aware of /proc/<pid>/limits before, but now that I've heard
of it, checking that for the postmaster and/or a backend seems like
a great idea.
This doesn't seem to exist for any process on this box:
[root@170226-db7 ~]# ls /proc/*/limit*
ls: /proc/*/limit*: No such file or directory
If this were a system-defined process-size limit, then should the query
still run out of memory after restarting Postgres? Most likely we'll have
to restart Postgres soon, and I'll retry this query after doing so. Based
on past experience, I'd expect the query to complete at that time.
In our experience, Postgres seems to slowly accumulate memory, in the
fashion of a small leak, and things start to fail with out-of-memory
errors once the server has been running for some time (roughly 4-6
weeks). Restarting Postgres clears out the problem (after a restart we
can immediately run queries that were failing before the restart)... but
then the cycle starts again.
I just bring this up wondering if there is something accumulating within
Postgres that isn't getting freed, and might cause an out-of-memory error
like this.
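One thing we could do to narrow it down is sample process sizes over the
weeks between restarts, to see whether it's the backends or the postmaster
that grows; an untested sketch:

    # append a per-process memory snapshot every 10 minutes; over weeks
    # this should show whether RSS/VSZ creep upward between restarts
    while true; do
        date >> /tmp/pg-mem.log
        ps -o pid,vsz,rss,args -C postgres >> /tmp/pg-mem.log
        sleep 600
    done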
Regards,
Matt