is there a way to firmly cap postgres worker memory consumption?

Started by Steve Kehlet · about 12 years ago · 9 messages · general
#1 Steve Kehlet
steve.kehlet@gmail.com

I'm running into some scenarios where errant postgres queries are consuming
all memory and bringing down production boxes. I'm running Postgres 9.3.4
on CentOS 5.x VMs under XenServer. My custom config is here:
https://gist.github.com/skehlet/9984666

The incidents may be related to reports of an array_agg memory leak, and
another recent incident seems like it was due to the xpath() function
provided by --with-xml. There's some more information on the former in my
post at:
http://dba.stackexchange.com/questions/62587/why-does-this-query-cause-the-postgres-worker-process-to-consume-so-much-memory,
if it's interesting. I'd like to dig into that further, maybe on another
thread.

But for now, the devs are asking me for some way to put a cap on a postgres
query's total memory consumption. I'm familiar with the available settings
on the "Resource Consumption" docs (and you can see my settings in my gist
above, including work_mem turned way down to 1MB for testing), but it seems
like there are things like Materialize that remain uncappable, since
they're not constrained by the shared_buffers and work_mem limits. For
example in my post to dba.stackexchange above, I found by doing a "set
enable_material=false;" I could stop a particularly nasty query from
consuming all memory, narrowing the problem somewhat, but that query is one
we have to live with in the near term.

With all that laid out, can anyone help with the following questions:

- Is there any way to set a total memory cap on a worker process's memory
consumption?
- Are there other (even undocumented) settings to cap memory usage?
- Other ideas?

In the meantime, to stop the bleeding I'm looking into using the postgres
feature to coerce the OOM killer into killing rogue child processes first
(-DLINUX_OOM_ADJ). Unfortunately, another setback: even the latest kernel on
CentOS 5.x doesn't allow non-root processes to write to /proc/self/oom_adj,
so it isn't working. So I'll either need to patch the CentOS kernel and
rebuild, move to a newer kernel, or maybe do something hacky like run a
background process to reassign the oom_adj value for all postgres workers
found.
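[Editor's note: the "background process" hack mentioned above could be sketched roughly as below. This is hypothetical and untested against a CentOS 5 kernel; PROC_ROOT and OOM_ADJ_VALUE are parameters assumed here for illustration. It must run as root, which is the whole point of the workaround, since non-root writes to /proc/<pid>/oom_adj are rejected.]

```shell
#!/bin/sh
# Sketch (not from the thread): walk /proc, find processes whose command
# name is "postgres", and raise their oom_adj so the OOM killer targets
# them before other processes. PROC_ROOT is overridable for testing.
PROC_ROOT=${PROC_ROOT:-/proc}
OOM_ADJ_VALUE=${OOM_ADJ_VALUE:-10}

adjust_postgres_workers() {
    for status in "$PROC_ROOT"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # The first line of /proc/<pid>/status is "Name:<tab><command name>".
        name=$(head -n 1 "$status" | cut -f2)
        if [ "$name" = "postgres" ]; then
            echo "$OOM_ADJ_VALUE" > "$(dirname "$status")/oom_adj"
        fi
    done
}

# To catch newly forked workers, run it in a loop (as root), e.g.:
#   while :; do adjust_postgres_workers; sleep 10; done
```

A cron job would work equally well; the only requirement is that it re-scans often enough to tag workers before they balloon.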

Thanks for any help/ideas!

#2 Amador Alvarez
apradopg@gmail.com
In reply to: Steve Kehlet (#1)
Re: is there a way to firmly cap postgres worker memory consumption?

Hi Steve,

Did you check whether your OS allows you to limit memory usage on a
per-user or per-process basis?

Did you profile or debug it to see what is going on?

When memory consumption grows, does it ever swap out?

Thanks,
A.A.
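[Editor's note: on Linux the usual knob for per-user limits is /etc/security/limits.conf. A fragment like the following (the postgres user name and values are illustrative assumptions) would cap address space per process, with the caveat that pam_limits applies to login sessions and does not necessarily reach daemons started from an init script.]

```
# /etc/security/limits.conf -- per-process caps for the postgres user
# (values illustrative; "as" = address space, in kB)
postgres  soft  as  1572864
postgres  hard  as  2097152
```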

On Tue, Apr 8, 2014 at 10:48 AM, Steve Kehlet <steve.kehlet@gmail.com> wrote:


#3 Steve Kehlet
steve.kehlet@gmail.com
In reply to: Amador Alvarez (#2)
Re: is there a way to firmly cap postgres worker memory consumption?

Did you check whether your OS allows you to limit memory usage on a
per-user or per-process basis?

I'll definitely dig more into this. I found some stuff saying it isn't as
easy as one would expect, i.e. not just a ulimit and done. Or maybe it is,
in this case. I will play around.

Did you profile or debug it to see what is going on?

I would love to learn more about how to do this, to get to the bottom of
the memory usage. I can google around, or can you suggest any reads?

When memory consumption grows, does it ever swap out?

Yes, it consumes all swap.

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve Kehlet (#1)
Re: is there a way to firmly cap postgres worker memory consumption?

Steve Kehlet <steve.kehlet@gmail.com> writes:

But for now, the devs are asking me for some way to put a cap on a postgres
query's total memory consumption. I'm familiar with the available settings
on the "Resource Consumption" docs (and you can see my settings in my gist
above, including work_mem turned way down to 1MB for testing), but it seems
like there are things like Materialize that remain uncappable, since
they're not constrained by the shared_buffers and work_mem limits.

Materialize nodes should honor the work_mem limit. That's per node of
course, but you only have half a dozen of them in this query, so I doubt
that's where the problem is. You mentioned array_agg ... how are you
using that exactly, and how large a result are you expecting it to create?
If you had an array_agg being used in a HashAggregate plan node, I'd
figure for sure that was the source of your problem, but the explain
output shows it's a GroupAggregate so there should only be one array_agg
active at a time.

Anyway, what I would try doing is starting the postmaster with a "ulimit
-d" value corresponding to the max per-process data segment size you want.
Something in the range of 50-100MB would probably be reasonable if your
queries aren't too complex.

regards, tom lane
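[Editor's note: Tom's suggestion amounts to setting the limit in the shell that launches the postmaster, since every forked backend inherits it. A minimal sketch follows; the 100MB figure and the pg_ctl path are assumptions, not from the thread. Note that Steve reports in #6 that "ulimit -d" did not trip on his system but "ulimit -v" did; glibc malloc can satisfy large allocations with mmap, which the data-segment limit does not constrain.]

```shell
#!/bin/sh
# Sketch: cap backend memory via ulimits in the shell that starts the
# postmaster. Forked workers inherit the limits, so a runaway allocation
# fails with ENOMEM instead of dragging the whole box into OOM.

ulimit -d $((100 * 1024))     # data segment cap, in 1kB units (100MB)
ulimit -v $((1536 * 1024))    # address-space cap (1.5GiB); catches mmap
                              # allocations that -d does not

echo "data seg: $(ulimit -d) kB, address space: $(ulimit -v) kB"

# Then start the cluster under these limits, e.g. (path assumed):
# exec /usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data start
```

Because the limit is per process, the cap applies to each backend individually, not to the cluster as a whole.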

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve Kehlet (#3)
Re: is there a way to firmly cap postgres worker memory consumption?

Steve Kehlet <steve.kehlet@gmail.com> writes:

Did you profile or debug it to see what is going on?

I would love to learn more about how to do this, to get to the bottom of
the memory usage. I can google around, or can you suggest any reads?

Once you've got a ulimit in place so that malloc eventually fails with
ENOMEM, the backend process should print a memory context dump on stderr
when it hits that. Make sure your logging setup captures the process
stderr someplace (logging_collector works for this, syslog does not).
Post the dump here when you've got it.

regards, tom lane
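[Editor's note: to capture backend stderr as Tom describes, the relevant postgresql.conf settings look like the following; the directory and filename pattern shown are illustrative defaults.]

```
# postgresql.conf: route backend stderr into the server log so the
# memory context dump is captured when malloc fails
logging_collector = on
log_destination = 'stderr'
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d.log'
```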


#6 Steve Kehlet
steve.kehlet@gmail.com
In reply to: Tom Lane (#5)
Re: is there a way to firmly cap postgres worker memory consumption?

On Tue, Apr 8, 2014 at 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Once you've got a ulimit in place so that malloc eventually fails with
ENOMEM, the backend process should print a memory context dump on stderr
when it hits that. Make sure your logging setup captures the process
stderr someplace (logging_collector works for this, syslog does not).
Post the dump here when you've got it.

Thank you. For some reason I couldn't get it to trip with "ulimit -d
51200", but "ulimit -v 1572864" (1.5GiB) got me this in serverlog. I hope
this is readable, if not it's also here:
https://gist.github.com/skehlet/10206741

TopMemoryContext: 123760 total in 16 blocks; 2720 free (7 chunks); 121040
used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks);
20496 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks);
12688 used
TopTransactionContext: 8192 total in 1 blocks; 7368 free (0 chunks); 824
used
MessageContext: 4186112 total in 9 blocks; 633960 free (19 chunks);
3552152 used
JoinRelHashTable: 8192 total in 1 blocks; 624 free (0 chunks); 7568 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
used
smgr relation table: 24576 total in 2 blocks; 9808 free (4 chunks); 14768
used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks);
32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 816 free (0 chunks); 208 used
ExecutorState: 566208 total in 12 blocks; 195256 free (2 chunks);
370952 used
TupleSort: 17817440 total in 36 blocks; 9326832 free (8801 chunks);
8490608 used
TupleSort: 17500568 total in 86 blocks; 12016040 free (2896
chunks); 5484528 used
TupleSort: 17616584 total in 86 blocks; 9975520 free (3649 chunks);
7641064 used
TupleSort: 10825056 total in 14 blocks; 3896400 free (45082
chunks); 6928656 used
TupleSort: 17459720 total in 67 blocks; 8495600 free (2226 chunks);
8964120 used
TupleSort: 10899360 total in 23 blocks; 2112448 free (23706
chunks); 8786912 used
TupleSort: 11908776 total in 15 blocks; 8344064 free (145642
chunks); 3564712 used
printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 812638208 total in 108 blocks; 183520 free (171
chunks); 812454688 used
AggContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704
used
CacheMemoryContext: 1342128 total in 21 blocks; 424872 free (1 chunks);
917256 used
pg_toast_16646_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008
used
d_mc2_fki: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
d_mc2_pkey: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
d_mcm2_pkey: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
d_mm2_fki: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
d_mm2_index2: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
d_mm2_index1: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
d_mm2_pkey: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
d_m2_index1: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
d_m2_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks);
872 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0
chunks); 1008 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64
free (0 chunks); 960 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks);
1008 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks);
960 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824
used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks);
960 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks);
960 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks);
1288 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks);
872 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2
chunks); 1336 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks);
1008 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2
chunks); 1336 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824
used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2
chunks); 1288 used
pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2
chunks); 1336 used
pg_namespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks);
872 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824
used
pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks);
960 used
pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2
chunks); 1288 used
pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2
chunks); 1288 used
pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2
chunks); 1288 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2
chunks); 1336 used
pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0
chunks); 1008 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks);
1336 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks);
1336 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks);
872 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
pg_authid_rolname_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
pg_database_datname_index: 1024 total in 1 blocks; 200 free (0 chunks);
824 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 16 free (0
chunks); 1008 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks);
1336 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks);
872 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0
chunks); 1008 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
MdSmgr: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 3072 total in 2 blocks; 80 free (1 chunks); 2992 used
LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
2014-04-08 23:00:14 GMT [31229]: [1-1] ERROR: out of memory
2014-04-08 23:00:14 GMT [31229]: [2-1] DETAIL: Failed on request of size
597.
2014-04-08 23:00:14 GMT [31229]: [3-1] STATEMENT: SELECT MESSAGE.ID,
MESSAGE.SERVER_ID,
MESSAGE.RECEIVED_DATE,
MESSAGE.PROCESSED,
MESSAGE.ORIGINAL_ID,
MESSAGE.IMPORT_ID,
MESSAGE.IMPORT_CHANNEL_ID,
METADATA_IDS
FROM
( SELECT M.ID,
ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',') AS
METADATA_IDS
FROM D_M2 M
JOIN D_MM2 MM ON MM.MESSAGE_ID = M.ID
LEFT JOIN D_MCM2 MCM ON MCM.MESSAGE_ID = MM.MESSAGE_ID
AND MCM.METADATA_ID = MM.ID
LEFT JOIN D_MC2 MC_1 ON MC_1.MESSAGE_ID = M.ID
AND MC_1.IS_ENCRYPTED = FALSE
AND MC_1.METADATA_ID = 0
AND ((MM.ID = 0
AND MC_1.CONTENT_TYPE = 1)
OR (MM.ID > 0
AND MC_1.CONTENT_TYPE = 4))
LEFT JOIN D_MC2 MC_2 ON MC_2.MESSAGE_ID = M.ID
AND MC_2.IS_ENCRYPTED = FALSE
AND MC_2.METADATA_ID = MM.ID
AND MC_2.CONTENT_TYPE = '2'
LEFT JOIN D_MC2 MC_3 ON MC_3.MESSAGE_ID = M.ID
AND MC_3.IS_ENCRYPTED = FALSE
AND MC_3.METADATA_ID = MM.ID
AND MC_3.CONTENT_TYPE = '3'
LEFT JOIN D_MC2 MC_4 ON MC_4.MESSAGE_ID = M.ID
AND MC_4.IS_ENCRYPTED = FALSE
AND MC_4.METADATA_ID = MM.ID
AND MC_4.CONTENT_TYPE = '4'
LEFT JOIN D_MC2 MC_5 ON MC_5.MESSAGE_ID = M.ID
AND MC_5.IS_ENCRYPTED = FALSE
AND MC_5.METADATA_ID = MM.ID
AND MC_5.CONTENT_TYPE = '5'
LEFT JOIN D_MC2 MC_6 ON MC_6.MESSAGE_ID = M.ID
AND MC_6.IS_ENCRYPTED = FALSE
AND MC_6.METADATA_ID = MM.ID
AND MC_6.CONTENT_TYPE = '6'
LEFT JOIN D_MC2 MC_7 ON MC_7.MESSAGE_ID = M.ID
AND MC_7.IS_ENCRYPTED = FALSE
AND MC_7.METADATA_ID = MM.ID
AND MC_7.CONTENT_TYPE = '7'
LEFT JOIN D_MC2 MC_8 ON MC_8.MESSAGE_ID = M.ID
AND MC_8.IS_ENCRYPTED = FALSE
AND MC_8.METADATA_ID = MM.ID
AND MC_8.CONTENT_TYPE ='8'
WHERE M.ID <= '230716'
AND ( M.ID::VARCHAR ILIKE '%' || 'rwar mea' || '%'
OR MM.CONNECTOR_NAME ILIKE '%' || 'rwar mea' || '%'
OR MC_1.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_2.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_3.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_4.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_5.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_6.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_7.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_8.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MCM."SOURCE" ILIKE '%' || 'rwar mea' || '%'
OR MCM."TYPE" ILIKE '%' || 'rwar mea' || '%' )
GROUP BY M.ID
ORDER BY M.ID DESC
OFFSET 0 LIMIT 21 ) AS SUBQUERY
INNER JOIN D_M2 MESSAGE ON SUBQUERY.ID = MESSAGE.ID
ORDER BY MESSAGE.ID DESC
;
2014-04-08 23:00:14 GMT [31229]: [4-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.7", size 124600320
2014-04-08 23:00:14 GMT [31229]: [5-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.6", size 36167680
2014-04-08 23:00:14 GMT [31229]: [6-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.5", size 1073741824
2014-04-08 23:00:14 GMT [31229]: [7-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.4", size 823541760
2014-04-08 23:00:14 GMT [31229]: [8-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.3", size 5324800
2014-04-08 23:00:14 GMT [31229]: [9-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.2", size 402513920
2014-04-08 23:00:14 GMT [31229]: [10-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.1", size 38617088
2014-04-08 23:00:14 GMT [31229]: [11-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.0", size 5079040

#7 Steve Kehlet
steve.kehlet@gmail.com
In reply to: Steve Kehlet (#6)
Re: is there a way to firmly cap postgres worker memory consumption?

I should add I had to move testing to a different VM, with less RAM and
different tunings; they are here: https://gist.github.com/skehlet/10207086

On Tue, Apr 8, 2014 at 4:08 PM, Steve Kehlet <steve.kehlet@gmail.com> wrote:

pg_database_datname_index: 1024 total in 1 blocks; 200 free (0
chunks); 824 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 16 free (0
chunks); 1008 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0
chunks); 960 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2
chunks); 1336 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0
chunks); 1008 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872
used
MdSmgr: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 3072 total in 2 blocks; 80 free (1 chunks); 2992 used
LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656
used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
2014-04-08 23:00:14 GMT [31229]: [1-1] ERROR: out of memory
2014-04-08 23:00:14 GMT [31229]: [2-1] DETAIL: Failed on request of size
597.
2014-04-08 23:00:14 GMT [31229]: [3-1] STATEMENT: SELECT MESSAGE.ID,
MESSAGE.SERVER_ID,
MESSAGE.RECEIVED_DATE,
MESSAGE.PROCESSED,
MESSAGE.ORIGINAL_ID,
MESSAGE.IMPORT_ID,
MESSAGE.IMPORT_CHANNEL_ID,
METADATA_IDS
FROM
( SELECT M.ID,
ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',') AS
METADATA_IDS
FROM D_M2 M
JOIN D_MM2 MM ON MM.MESSAGE_ID = M.ID
LEFT JOIN D_MCM2 MCM ON MCM.MESSAGE_ID = MM.MESSAGE_ID
AND MCM.METADATA_ID = MM.ID
LEFT JOIN D_MC2 MC_1 ON MC_1.MESSAGE_ID = M.ID
AND MC_1.IS_ENCRYPTED = FALSE
AND MC_1.METADATA_ID = 0
AND ((MM.ID = 0
AND MC_1.CONTENT_TYPE = 1)
OR (MM.ID > 0
AND MC_1.CONTENT_TYPE = 4))
LEFT JOIN D_MC2 MC_2 ON MC_2.MESSAGE_ID = M.ID
AND MC_2.IS_ENCRYPTED = FALSE
AND MC_2.METADATA_ID = MM.ID
AND MC_2.CONTENT_TYPE = '2'
LEFT JOIN D_MC2 MC_3 ON MC_3.MESSAGE_ID = M.ID
AND MC_3.IS_ENCRYPTED = FALSE
AND MC_3.METADATA_ID = MM.ID
AND MC_3.CONTENT_TYPE = '3'
LEFT JOIN D_MC2 MC_4 ON MC_4.MESSAGE_ID = M.ID
AND MC_4.IS_ENCRYPTED = FALSE
AND MC_4.METADATA_ID = MM.ID
AND MC_4.CONTENT_TYPE = '4'
LEFT JOIN D_MC2 MC_5 ON MC_5.MESSAGE_ID = M.ID
AND MC_5.IS_ENCRYPTED = FALSE
AND MC_5.METADATA_ID = MM.ID
AND MC_5.CONTENT_TYPE = '5'
LEFT JOIN D_MC2 MC_6 ON MC_6.MESSAGE_ID = M.ID
AND MC_6.IS_ENCRYPTED = FALSE
AND MC_6.METADATA_ID = MM.ID
AND MC_6.CONTENT_TYPE = '6'
LEFT JOIN D_MC2 MC_7 ON MC_7.MESSAGE_ID = M.ID
AND MC_7.IS_ENCRYPTED = FALSE
AND MC_7.METADATA_ID = MM.ID
AND MC_7.CONTENT_TYPE = '7'
LEFT JOIN D_MC2 MC_8 ON MC_8.MESSAGE_ID = M.ID
AND MC_8.IS_ENCRYPTED = FALSE
AND MC_8.METADATA_ID = MM.ID
AND MC_8.CONTENT_TYPE ='8'
WHERE M.ID <= '230716'
AND ( M.ID::VARCHAR ILIKE '%' || 'rwar mea' || '%'
OR MM.CONNECTOR_NAME ILIKE '%' || 'rwar mea' || '%'
OR MC_1.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_2.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_3.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_4.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_5.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_6.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_7.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MC_8.CONTENT ILIKE '%' || 'rwar mea' || '%'
OR MCM."SOURCE" ILIKE '%' || 'rwar mea' || '%'
OR MCM."TYPE" ILIKE '%' || 'rwar mea' || '%' )
GROUP BY M.ID
ORDER BY M.ID DESC
OFFSET 0 LIMIT 21 ) AS SUBQUERY
INNER JOIN D_M2 MESSAGE ON SUBQUERY.ID = MESSAGE.ID
ORDER BY MESSAGE.ID DESC
;
2014-04-08 23:00:14 GMT [31229]: [4-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.7", size 124600320
2014-04-08 23:00:14 GMT [31229]: [5-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.6", size 36167680
2014-04-08 23:00:14 GMT [31229]: [6-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.5", size 1073741824
2014-04-08 23:00:14 GMT [31229]: [7-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.4", size 823541760
2014-04-08 23:00:14 GMT [31229]: [8-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.3", size 5324800
2014-04-08 23:00:14 GMT [31229]: [9-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.2", size 402513920
2014-04-08 23:00:14 GMT [31229]: [10-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.1", size 38617088
2014-04-08 23:00:14 GMT [31229]: [11-1] LOG: temporary file: path
"base/pgsql_tmp/pgsql_tmp31229.0", size 5079040

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve Kehlet (#6)
Re: is there a way to firmly cap postgres worker memory consumption?

Steve Kehlet <steve.kehlet@gmail.com> writes:

Thank you. For some reason I couldn't get it to trip with "ulimit -d
51200", but "ulimit -v 1572864" (1.5GiB) got me this in serverlog. I hope
this is readable, if not it's also here:

Well, here's the problem:

ExprContext: 812638208 total in 108 blocks; 183520 free (171
chunks); 812454688 used

So something involved in expression evaluation is eating memory.
Looking at the query itself, I'd have to bet on this:

ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',')

My guess is that this aggregation is being done across a lot more rows
than you were expecting, and the resultant array/string therefore eats
lots of memory. You might try replacing that with COUNT(*), or even
better SUM(LENGTH(MM.ID::CHARACTER VARYING)), just to get some definitive
evidence about what the query is asking to compute.
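Concretely, that substitution applied to the subquery's select list would look roughly like the sketch below. This is only a diagnostic sketch using the table aliases from the statement in the log above; the remaining LEFT JOINs and the WHERE clause of the original query must be carried over unchanged for the row counts to be representative.

```sql
-- Diagnostic variant: swap the string aggregation for cheap aggregates
-- to see how many rows (and how many characters of MM.ID) each group
-- would have fed into ARRAY_AGG/ARRAY_TO_STRING.
SELECT M.ID,
       COUNT(*)                              AS ROWS_PER_GROUP,
       SUM(LENGTH(MM.ID::CHARACTER VARYING)) AS TOTAL_ID_CHARS
FROM D_M2 M
JOIN D_MM2 MM ON MM.MESSAGE_ID = M.ID
-- ... same LEFT JOINs and WHERE clause as the original statement ...
GROUP BY M.ID
ORDER BY ROWS_PER_GROUP DESC
LIMIT 21;
```

A few groups with row counts in the millions here would confirm that the aggregated array, not any one node's work_mem, is what blew past the limit.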

Meanwhile, it seems like ulimit -v would provide the safety valve
you asked for originally. I too am confused about why -d didn't
do it, but as long as you found a variant that works ...
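The mechanism is simple enough to try interactively before wiring it into a startup script: a ulimit set in a shell is inherited by every child process, so a cap set just before invoking the postmaster applies to each forked backend. A minimal sketch, using the 1.5 GiB figure from this thread (ulimit values are in KiB, and the subshell keeps the cap from leaking into your login shell):

```shell
# Cap virtual address space for everything started in this subshell.
# 1572864 KiB = 1.5 GiB, the value that worked for Steve above.
(
  ulimit -v 1572864   # set the cap; child processes inherit it
  ulimit -v           # print the effective limit to confirm it took
  # in an init script, the postgres/pg_ctl invocation would go here
)
```

With the cap in place, an allocation beyond the limit fails inside the backend with an "out of memory" ERROR and a context dump like the one above, rather than dragging the whole box into the kernel OOM killer.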

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Steve Kehlet
steve.kehlet@gmail.com
In reply to: Tom Lane (#8)
Re: is there a way to firmly cap postgres worker memory consumption?

On Tue, Apr 8, 2014 at 6:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, here's the problem:

ExprContext: 812638208 total in 108 blocks; 183520 free (171
chunks); 812454688 used

So something involved in expression evaluation is eating memory.
Looking at the query itself, I'd have to bet on this:

ARRAY_TO_STRING(ARRAY_AGG(MM.ID::CHARACTER VARYING), ',')

My guess is that this aggregation is being done across a lot more rows
than you were expecting, and the resultant array/string therefore eats
lots of memory. You might try replacing that with COUNT(*), or even
better SUM(LENGTH(MM.ID::CHARACTER VARYING)), just to get some definitive
evidence about what the query is asking to compute.

The devs have moved on and want to stick with their new query, so I'll just
chalk this up to a bad query and let it go. But I'm glad to have learned a
few new tricks, thanks.

Meanwhile, it seems like ulimit -v would provide the safety valve
you asked for originally.

Thank you Amador and Tom for the ulimit solution, that's exactly what I
needed.