PostgreSQL 8.0.6 crash

Started by Mark Woodward, almost 20 years ago. 44 messages.
#1Mark Woodward
pgsql@mohawksoft.com
1 attachment(s)

PostgreSQL promptly uses all available memory for the query and
subsequently crashes.

I'm sure it can be corrected with a setting, but should it crash?

freedb=# create table ucode as select distinct ucode from cdtitles group
by ucode having count(ucode)>1 ;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

(Table layout)
freedb=# \d cdtitles
Table "public.cdtitles"
Column | Type | Modifiers
-----------+-------------------+-----------------------------------
cdid | integer | default nextval('seq_cdid'::text)
cdcode | character varying |
ucode | character varying |
title | character varying |
artist | character varying |
genre | character varying |
dgenre | character varying |
year | integer |
processed | character varying |
submit | character varying |
revision | integer |
disclen | integer |
tracks | integer |
extkey | integer |
offsets | integer[] |
Indexes:
"cdtitles_ucode" btree (ucode) CLUSTER

(Size of table)
freedb=# select count(*) from cdtitles ;
count
---------
1927912
(1 row)

(Sample row)
freedb=# select * from cdtitles where cdid = 100001 limit 1;
cdid | cdcode | ucode | title |
artist | genre | dgenre | year | processed
| submit | revision | disclen | tracks | extkey |
offsets
--------+----------+-----------------------+--------------------------+------------+-------+--------+------+-------------------------------------------------+-----------------+----------+---------+--------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------
100001 | 320e9e14 | 0ea0-14-3399-0006ad6a | Var Slutar Vägarna
(CD1) | Streaplers | rock | | 0 | cddbd v1.5PL3 Copyright (c)
Steve Scherf et al. | dBpowerAMP V4.0 | 0 | | 19 |
|
{21,150,13920,31180,43664,59907,73163,86629,98447,113684,130205,141682,154920,166783,179028,192689,205161,222164,234649,249692,264340,3744,0}
(1 row)

Attachments:

postgresql.conf (application/octet-stream)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Woodward (#1)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

PostgreSQL promptly uses all available memory for the query and
subsequently crashes.

I'll bet a nickel this is on a Linux machine with OOM kill enabled.
What does the postmaster log show --- or look in the kernel log to
see if it mentions anything about an out-of-memory kill.

freedb=# create table ucode as select distinct ucode from cdtitles group
by ucode having count(ucode)>1 ;
server closed the connection unexpectedly

What does EXPLAIN show as the plan for that? If it's a hash aggregate,
try with "enable_hashagg" turned off. How many distinct ucode values
are there in the table?

regards, tom lane
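
For reference, Tom's suggested experiment would look something like this (a sketch only, reusing the thread's table and query; not output from the original session):

freedb=# EXPLAIN SELECT ucode FROM cdtitles GROUP BY ucode HAVING count(ucode) > 1;
freedb=# SET enable_hashagg = off;  -- session-local: forces a sort-based GroupAggregate
freedb=# EXPLAIN SELECT ucode FROM cdtitles GROUP BY ucode HAVING count(ucode) > 1;
freedb=# RESET enable_hashagg;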

#3Mark Woodward
pgsql@mohawksoft.com
In reply to: Mark Woodward (#1)
Re: PostgreSQL 8.0.6 crash

More info: the machine has 512M RAM and 512M swap.
work_mem is set to: work_mem = 1024

This shouldn't have crashed, should it?


#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Woodward (#1)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

-> HashAggregate (cost=106527.68..106528.68 rows=200 width=32)
Filter: (count(ucode) > 1)
-> Seq Scan on cdtitles (cost=0.00..96888.12 rows=1927912 width=32)

Well, shouldn't hash aggregate respect work memory limits?

If the planner thought there were 1.7M distinct values, it wouldn't have
used hash aggregate ... but it only thinks there are 200, which IIRC is
the default assumption. Have you ANALYZEd this table lately?

Meanwhile, I'd strongly recommend turning off OOM kill. That's got to
be the single worst design decision in the entire Linux kernel.

regards, tom lane
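
For reference, the stats fix Tom is pointing at is a one-liner; with fresh statistics the planner's distinct-value estimate should move from the default 200 toward the true ~1.7M, steering it away from hash aggregation (a sketch, not from the original session):

freedb=# ANALYZE cdtitles;
ANALYZE
freedb=# EXPLAIN SELECT ucode FROM cdtitles GROUP BY ucode HAVING count(ucode) > 1;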

#5Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#2)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

PostgreSQL promptly uses all available memory for the query and
subsequently crashes.

I'll bet a nickel this is on a Linux machine with OOM kill enabled.
What does the postmaster log show --- or look in the kernel log to
see if it mentions anything about an out-of-memory kill.

That's a no-brainer. Maybe I should have rephrased the condition; "crash"
may be the wrong word, it was definitely killed due to running out of memory. Sorry.

freedb=# create table ucode as select distinct ucode from cdtitles group
by ucode having count(ucode)>1 ;
server closed the connection unexpectedly

What does EXPLAIN show as the plan for that? If it's a hash aggregate,
try with "enable_hashagg" turned off. How many distinct ucode values
are there in the table?

There are over 1.7M distinct rows, about 200K non-distinct that I want to
somehow remove.

It does have hash aggregate:
freedb=# explain select distinct ucode from cdtitles group by ucode having
count(ucode)>1 ;
QUERY PLAN
-------------------------------------------------------------------------------------
Unique (cost=106536.32..106537.32 rows=200 width=32)
-> Sort (cost=106536.32..106536.82 rows=200 width=32)
Sort Key: ucode
-> HashAggregate (cost=106527.68..106528.68 rows=200 width=32)
Filter: (count(ucode) > 1)
-> Seq Scan on cdtitles (cost=0.00..96888.12 rows=1927912 width=32)
(6 rows)

Well, shouldn't hash aggregate respect work memory limits?
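
Separately, Mark says he wants to remove the ~200K duplicate rows; a common idiom for that (a sketch, not from the thread — it keeps one arbitrary survivor per ucode, using the system column ctid, and benefits from the existing cdtitles_ucode index):

freedb=# DELETE FROM cdtitles a
freedb-# WHERE EXISTS (SELECT 1 FROM cdtitles b
freedb(#               WHERE b.ucode = a.ucode AND b.ctid < a.ctid);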

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Woodward (#1)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Still, I would say that it is extremely bad behavior for not having
stats, wouldn't you think?

Think of it as a kernel bug.

Meanwhile, I'd strongly recommend turning off OOM kill. That's got to
be the single worst design decision in the entire Linux kernel.

How is this any different from FreeBSD having a default 512M process
size limit? On FreeBSD, the process would have been killed earlier.

No, the process would have been politely told it was out of memory, and
would have told you the same. If the kernel's way of notifying a
process that it's out of memory is SIGKILL, there is not a damn thing
that we can do to operate robustly.

regards, tom lane

#7Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#4)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

-> HashAggregate (cost=106527.68..106528.68 rows=200
width=32)
Filter: (count(ucode) > 1)
-> Seq Scan on cdtitles (cost=0.00..96888.12
rows=1927912
width=32)

Well, shouldn't hash aggregate respect work memory limits?

If the planner thought there were 1.7M distinct values, it wouldn't have
used hash aggregate ... but it only thinks there are 200, which IIRC is
the default assumption. Have you ANALYZEd this table lately?

I thought that I had, but I did CLUSTER at some point. Or maybe I didn't
I'm, not sure. I have been working on a file reader/parser/importer
program. I created and dropped the DB so many times it is hard to keep
track. Still, I would say that is is extremly bad behavior for not having
stats, wouldn't you think?

Meanwhile, I'd strongly recommend turning off OOM kill. That's got to
be the single worst design decision in the entire Linux kernel.

How is this any different than the FreeBSD having a default 512M process
size limit? On FreeBSD, the process would have been killed earlier.

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: PostgreSQL 8.0.6 crash

Tom Lane wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Still, I would say that it is extremely bad behavior for not having
stats, wouldn't you think?

Think of it as a kernel bug.

Meanwhile, I'd strongly recommend turning off OOM kill. That's got to
be the single worst design decision in the entire Linux kernel.

How is this any different from FreeBSD having a default 512M process
size limit? On FreeBSD, the process would have been killed earlier.

No, the process would have been politely told it was out of memory, and
would have told you the same. If the kernel's way of notifying a
process that it's out of memory is SIGKILL, there is not a damn thing
that we can do to operate robustly.

And we have docco on it:
http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN18105
which I assume is still current. Back in October I mentioned the OOM
killer to Andrew Morton - his reaction was a very pained look and a curt
admonition: "turn it off".

cheers

andrew

#9Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#6)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Still, I would say that it is extremely bad behavior for not having
stats, wouldn't you think?

Think of it as a kernel bug.

While I respect your viewpoint that the Linux kernel should not kill an
offending process if the system runs out of memory, I sort of disagree,
in that OOM kill is a disaster preventer. It should be viewed as a
last-ditch "him or me" choice the kernel needs to make, and it should not
get into that position in the first place.

Regardless, it is troubling that failing to have current stats can cause
the system, with a large data set, to exceed working memory limits.

I think it is still a bug. While it may manifest itself as a pg crash on
Linux because of a feature with which you have issue, the fact remains
that PG is exceeding its working memory limit.

Should failing to run ANALYZE cause this behavior?
If so, how does this get clearly documented?
If not, can it be prevented?

Meanwhile, I'd strongly recommend turning off OOM kill. That's got to
be the single worst design decision in the entire Linux kernel.

How is this any different from FreeBSD having a default 512M process
size limit? On FreeBSD, the process would have been killed earlier.

No, the process would have been politely told it was out of memory, and
would have told you the same. If the kernel's way of notifying a
process that it's out of memory is SIGKILL, there is not a damn thing
that we can do to operate robustly.

Let's not waste time on a Linux discussion. Linux and FreeBSD have their
strengths, and a debate on the dubious merits of either would be long and
contentious. Both systems are fine, just with some subtle differences in
design goals.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Woodward (#9)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

I think it is still a bug. While it may manifest itself as a pg crash on
Linux because of a feature with which you have issue, the fact remains
that PG is exceeding its working memory limit.

The problem is that *we have no way to know what that limit is* ---
short of exceeding it and being summarily killed. (BTW, the kernel
doesn't know what the limit is either.) There is simply not any way
to operate robustly under the OOM-kill regime.

While I'll certainly acknowledge that it'd be nice if hashagg had
spill-to-disk capability, that wouldn't alter the fundamental fact that
if you want reliable behavior you MUST turn off OOM kill. There is not
anything we can do at the database level to work around that kernel-level
misdesign.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Woodward (#1)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

regards, tom lane

#12Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#11)
Re: PostgreSQL 8.0.6 crash

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

Unless I've missed something here, disabling the OOM killer doesn't
really solve the problem. What solves the problem is running ANALYZE,
but it's still certainly the case that it would make some sense for the
Postmaster, upon realizing that it's gone well beyond its work_mem
boundary, to ideally switch plans to one which isn't going to exceed its
work_mem limit. Less ideally, it could give up and issue an error to the
user instead of running the box out of memory.

I appreciate that this is probably not very easy to implement, but I
do believe the current situation could be improved in this regard.

Thanks,

Stephen

#13Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#10)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

I think it is still a bug. While it may manifest itself as a pg crash on
Linux because of a feature with which you have issue, the fact remains
that PG is exceeding its working memory limit.

The problem is that *we have no way to know what that limit is* ---
short of exceeding it and being summarily killed. (BTW, the kernel
doesn't know what the limit is either.) There is simply not any way
to operate robustly under the OOM-kill regime.

No, you misunderstand what I said: I mean the "working memory" as defined
in postgresql.conf. I don't care about the OS debate.

While I'll certainly acknowledge that it'd be nice if hashagg had
spill-to-disk capability, that wouldn't alter the fundamental fact that
if you want reliable behavior you MUST turn off OOM kill. There is not
anything we can do at the database level to work around that kernel-level
misdesign.

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

At issue is whether a lack of ANALYZE justifies this behavior. If so, it
should be documented.

#14Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#11)
Re: PostgreSQL 8.0.6 crash

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
a VERY IMPORTANT setting; it is intended to limit the consumption of
memory by the postgresql process. Oftentimes PostgreSQL will work
alongside other application servers on the same system; in fact, it may be
a sub-part of application servers on the same system. (This is, in fact,
how it is used on one of my site servers.)

Clearly, if the server will use 1000 times this number (set for 1024 kB,
but it exceeds 1 GB), this is broken, and it may cause other systems to
fail or perform very poorly.

If it is not something that can be fixed, it should be clearly documented.

#15Jim C. Nasby
jnasby@pervasive.com
In reply to: Mark Woodward (#14)
Re: PostgreSQL 8.0.6 crash

On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
a VERY IMPORTANT setting; it is intended to limit the consumption of
memory by the postgresql process. Oftentimes PostgreSQL will work
alongside other application servers on the same system; in fact, it may be
a sub-part of application servers on the same system. (This is, in fact,
how it is used on one of my site servers.)

Actually, no, it's not designed for that at all.

Clearly, if the server will use 1000 times this number (set for 1024 kB,
but it exceeds 1 GB), this is broken, and it may cause other systems to
fail or perform very poorly.

If it is not something that can be fixed, it should be clearly documented.

work_mem (integer)

Specifies the amount of memory to be used by internal sort
operations and hash tables before switching to temporary disk files.
The value is specified in kilobytes, and defaults to 1024 kilobytes
(1 MB). Note that for a complex query, several sort or hash
operations might be running in parallel; each one will be allowed to
use as much memory as this value specifies before it starts to put
data into temporary files. Also, several running sessions could be
doing such operations concurrently. So the total memory used could
be many times the value of work_mem; it is necessary to keep this
fact in mind when choosing the value. Sort operations are used for
ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
joins, hash-based aggregation, and hash-based processing of IN
subqueries.

So it says right there that it's very easy to exceed work_mem by a very
large amount. Granted, this is a very painful problem to deal with and
will hopefully be changed at some point, but it's pretty clear as to how
this works.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
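
Since the quoted docs make work_mem a per-operation setting that can also be changed per session, raising it just for the one offending query is straightforward (a sketch; in 8.0 the value is a plain number of kilobytes):

freedb=# SET work_mem = 65536;  -- 64 MB, this session only
freedb=# SELECT ucode FROM cdtitles GROUP BY ucode HAVING count(ucode) > 1;
freedb=# RESET work_mem;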

#16Jim C. Nasby
jnasby@pervasive.com
In reply to: Mark Woodward (#9)
Re: PostgreSQL 8.0.6 crash

On Thu, Feb 09, 2006 at 11:42:57AM -0500, Mark Woodward wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Still, I would say that it is extremely bad behavior for not having
stats, wouldn't you think?

Think of it as a kernel bug.

While I respect your viewpoint that the Linux kernel should not kill an
offending process if the system runs out of memory, I sort of disagree,
in that OOM kill is a disaster preventer. It should be viewed as a
last-ditch "him or me" choice the kernel needs to make, and it should not
get into that position in the first place.

I've had processes run away on a FreeBSD box before, to the extent of
running entirely out of swap and memory. Instead of random processes
just dying for no apparent reason, I started getting a bunch of
out-of-memory errors. No disaster, I just fixed the problem and life
went on.

Well, OK, the box did become rather unresponsive when my fix for the
problem meant that all of a sudden there were about 950 perl processes
trying to run at the same time. I wish I'd captured top showing 900+
runnable processes. But after a few minutes the processes started
completing and exiting and everything was soon back to normal. I rather
doubt Linux would survive that...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#17Stephen Frost
sfrost@snowman.net
In reply to: Jim C. Nasby (#15)
Re: PostgreSQL 8.0.6 crash

* Jim C. Nasby (jnasby@pervasive.com) wrote:

On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:

If it is not something that can be fixed, it should be clearly documented.

work_mem (integer)

Specifies the amount of memory to be used by internal sort
operations and hash tables before switching to temporary disk files.
The value is specified in kilobytes, and defaults to 1024 kilobytes
(1 MB). Note that for a complex query, several sort or hash
operations might be running in parallel; each one will be allowed to
use as much memory as this value specifies before it starts to put
data into temporary files. Also, several running sessions could be
doing such operations concurrently. So the total memory used could
be many times the value of work_mem; it is necessary to keep this
fact in mind when choosing the value. Sort operations are used for
ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
joins, hash-based aggregation, and hash-based processing of IN
subqueries.

So it says right there that it's very easy to exceed work_mem by a very
large amount. Granted, this is a very painful problem to deal with and
will hopefully be changed at some point, but it's pretty clear as to how
this works.

It also says that when it goes over, it'll spill to disk. Additionally,
we're talking about one hash here, not multiple ones. It seems at least
misleading because, if I understand correctly, Postgres isn't actively
checking whether the amount of memory used by an in-progress hash has
gone over the limit; rather, it guesses at planning time how much memory
will be used, to decide whether a hash plan is possible. That guess can
certainly be wrong, but there's nothing in place to handle the situation
where it is...

Thanks,

Stephen

#18Greg Stark
gsstark@mit.edu
In reply to: Stephen Frost (#12)
Re: PostgreSQL 8.0.6 crash

Stephen Frost <sfrost@snowman.net> writes:

Unless I've missed something here, disabling the OOM killer doesn't
really solve the problem here.

Well in a way it does. Postgres would get an out-of-memory error from malloc
which it would handle properly and the world would be happy.

Except not quite, since I think an out-of-memory error still means that
the backend exits instead of just that query failing. That means if you have
an application such as Apache running, then all subsequent transactions on
that connection fail too, instead of just the transaction that misbehaved.

And as the other poster mentioned, having Postgres use up every available byte
of memory isn't really very friendly to anything else running on the box.

It doesn't seem like a bad idea to have a max_memory parameter such that,
if a backend ever exceeded it, the current transaction would immediately be
aborted. That would let an application continue operating normally after
getting an error.

--
greg

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#18)
Re: PostgreSQL 8.0.6 crash

Greg Stark <gsstark@mit.edu> writes:

Except not quite, since I think an out of memory error still means that
backend exits instead of just that query failing.

Not at all! PG will recover from this perfectly well ... if it's given
the opportunity, rather than being SIGKILLed.

It doesn't seem like a bad idea to have a max_memory parameter that if a
backend ever exceeded it would immediately abort the current
transaction.

See ulimit (or local equivalent).

regards, tom lane

#20Mark Woodward
pgsql@mohawksoft.com
In reply to: Jim C. Nasby (#15)
Re: PostgreSQL 8.0.6 crash

On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
a VERY IMPORTANT setting; it is intended to limit the consumption of
memory by the postgresql process. Oftentimes PostgreSQL will work
alongside other application servers on the same system; in fact, it may be
a sub-part of application servers on the same system. (This is, in fact,
how it is used on one of my site servers.)

Actually, no, it's not designed for that at all.

I guess that's a matter of opinion.

Clearly, if the server will use 1000 times this number (set for 1024 kB,
but it exceeds 1 GB), this is broken, and it may cause other systems to
fail or perform very poorly.

If it is not something that can be fixed, it should be clearly
documented.

work_mem (integer)

Specifies the amount of memory to be used by internal sort
operations and hash tables before switching to temporary disk files.
The value is specified in kilobytes, and defaults to 1024 kilobytes
(1 MB). Note that for a complex query, several sort or hash
operations might be running in parallel; each one will be allowed to
use as much memory as this value specifies before it starts to put
data into temporary files. Also, several running sessions could be
doing such operations concurrently. So the total memory used could
be many times the value of work_mem; it is necessary to keep this
fact in mind when choosing the value. Sort operations are used for
ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
joins, hash-based aggregation, and hash-based processing of IN
subqueries.

So it says right there that it's very easy to exceed work_mem by a very
large amount. Granted, this is a very painful problem to deal with and
will hopefully be changed at some point, but it's pretty clear as to how
this works.

Well, if you read that paragraph carefully, I'll admit that I was a little
too literal in my statement, applying it to the "process" and not specific
functions within the process, but in the documentation:

"each one will be allowed to use as much memory as this value specifies
before it starts to put data into temporary files."

According to the documentation the behavior of hashagg is broken. It did
not use up to this amount and then start to use temporary files; it used
1000 times this limit and was killed by the OS.

I think it should be documented, as the behavior is unpredictable.

#21Alvaro Herrera
alvherre@commandprompt.com
In reply to: Greg Stark (#18)
Re: PostgreSQL 8.0.6 crash

Greg Stark wrote:

Well in a way it does. Postgres would get an out-of-memory error from malloc
which it would handle properly and the world would be happy.

Except not quite, since I think an out of memory error still means that
backend exits instead of just that query failing.

Not at all -- the transaction is aborted, but the backend can continue
working perfectly fine.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#22Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#19)
Re: PostgreSQL 8.0.6 crash

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Greg Stark <gsstark@mit.edu> writes:

It doesn't seem like a bad idea to have a max_memory parameter that if a
backend ever exceeded it would immediately abort the current
transaction.

See ulimit (or local equivalent).

As much as setting ulimit in shell scripts is fun, I have to admit that
I really don't see it happening very much. Having Postgres set a ulimit
for itself may not be a bad idea and would perhaps provide a "least
suprise" for new users. Perhaps shared_buffers + 10*work_mem +
maintenance_work_mem + max_stack_depth? Then errors from running out of
memory could provide a 'HINT: Memory consumption went well over allowed
work_mem, perhaps you need to run ANALYZE or raise work_mem?'.

Just some thoughts,

Stephen
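
Plugging in illustrative values (assumptions for the arithmetic, not settings from the thread: shared_buffers ≈ 64 MB, work_mem = 1 MB, maintenance_work_mem = 16 MB, max_stack_depth = 2 MB), Stephen's suggested ceiling works out to roughly:

freedb=# SELECT 64 + 10 * 1 + 16 + 2 AS suggested_limit_mb;
 suggested_limit_mb
--------------------
                 92
(1 row)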

#23Greg Stark
gsstark@mit.edu
In reply to: Stephen Frost (#22)
Re: PostgreSQL 8.0.6 crash

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Greg Stark <gsstark@mit.edu> writes:

It doesn't seem like a bad idea to have a max_memory parameter that if a
backend ever exceeded it would immediately abort the current
transaction.

See ulimit (or local equivalent).

As much as setting ulimit in shell scripts is fun, I have to admit that
I really don't see it happening very much.

For one thing it requires admin access to the startup scripts to arrange this.
And it's always cluster-wide.

Having a GUC parameter would mean it could be set per-session. Even if the GUC
parameter were just implemented by calling setrlimit it might be useful.

--
greg

#24Mark Woodward
pgsql@mohawksoft.com
In reply to: Greg Stark (#23)
Re: PostgreSQL 8.0.6 crash

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Greg Stark <gsstark@mit.edu> writes:

It doesn't seem like a bad idea to have a max_memory parameter such that,
if a backend ever exceeded it, the current transaction would immediately
be aborted.

See ulimit (or local equivalent).

As much as setting ulimit in shell scripts is fun, I have to admit that
I really don't see it happening very much.

For one thing it requires admin access to the startup scripts to arrange
this. And it's always cluster-wide.

Having a GUC parameter would mean it could be set per-session. Even if the
GUC parameter were just implemented by calling setrlimit it might be useful.

I don't think it needs a new GUC parameter; just having hashagg respect
work_mem would fix the problem.

#25Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#19)
Re: PostgreSQL 8.0.6 crash

On Thu, Feb 09, 2006 at 02:35:34PM -0500, Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

Except not quite, since I think an out of memory error still means that
backend exits instead of just that query failing.

Not at all! PG will recover from this perfectly well ... if it's given
the opportunity, rather than being SIGKILLed.

FWIW, the problem is mainly from the situation where some process
accesses a piece of memory that has been swapped out, but there is no
memory available to swap the page in. Or it writes to a page marked
copy-on-write. What do you do? There is no way to return -ENOMEM from
a normal memory access, and PostgreSQL wouldn't handle that anyway.

When people talk about disabling the OOM killer, it doesn't stop the
SIGKILL behaviour, it just causes the kernel to return -ENOMEM for
malloc() much much earlier... (ie when you still actually have memory
available).

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#26Rick Gigger
rick@alpinenetworking.com
In reply to: Stephen Frost (#12)
Re: PostgreSQL 8.0.6 crash

On Feb 9, 2006, at 11:22 AM, Stephen Frost wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

Unless I've missed something here, disabling the OOM killer doesn't
really solve the problem. What solves the problem is running ANALYZE,
but it's still certainly the case that it would make some sense for the
Postmaster, upon realizing that it's gone well beyond its work_mem
boundary, to ideally switch plans to one which isn't going to exceed its
work_mem limit. Less ideally, it could give up and issue an error to the
user instead of running the box out of memory.

So is the work_mem parameter that is set not strictly enforced? Is it
more like a suggestion as to what it SHOULD use, and not a hard limit on
how much memory each process is ALLOWED to use?

If his work_mem is set to 1 MB and that process is using 500 MB for
tasks that are supposed to stay in work_mem, then doesn't that mean that
the limit is not really a hard limit but rather a suggestion?

Rick

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#25)
Re: PostgreSQL 8.0.6 crash

Martijn van Oosterhout <kleptog@svana.org> writes:

When people talk about disabling the OOM killer, it doesn't stop the
SIGKILL behaviour,

Yes it does, because the situation will never arise.

it just causes the kernel to return -ENOMEM for
malloc() much much earlier... (ie when you still actually have memory
available).

Given the current price of disk, there is no sane reason not to have
enough swap space configured to make this not-a-problem. The OOM kill
mechanism was a reasonable solution for running systems that were not
expected to be too reliable anyway on small hardware, but if you're
trying to run a 24/7 server you're simply incompetent if you don't
disable it.

regards, tom lane

#28Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#27)
Re: PostgreSQL 8.0.6 crash

Martijn van Oosterhout <kleptog@svana.org> writes:

When people talk about disabling the OOM killer, it doesn't stop the
SIGKILL behaviour,

Yes it does, because the situation will never arise.

it just causes the kernel to return -ENOMEM for
malloc() much much earlier... (ie when you still actually have memory
available).

Given the current price of disk, there is no sane reason not to have
enough swap space configured to make this not-a-problem. The OOM kill
mechanism was a reasonable solution for running systems that were not
expected to be too reliable anyway on small hardware, but if you're
trying to run a 24/7 server you're simply incompetent if you don't
disable it.

And people say I have STRONG opinions. Don't hold back Tom, let us know
what you really think.

#29Ernst Herzberg
earny@net4u.de
In reply to: Mark Woodward (#28)
Re: PostgreSQL 8.0.6 crash

On Friday 10 February 2006 00:53, Mark Woodward wrote:

Martijn van Oosterhout <kleptog@svana.org> writes:

When people talk about disabling the OOM killer, it doesn't stop the
SIGKILL behaviour,

Yes it does, because the situation will never arise.

it just causes the kernel to return -ENOMEM for
malloc() much much earlier... (ie when you still actually have memory
available).

Given the current price of disk, there is no sane reason not to have
enough swap space configured to make this not-a-problem. The OOM kill
mechanism was a reasonable solution for running systems that were not
expected to be too reliable anyway on small hardware, but if you're
trying to run a 24/7 server you're simply incompetent if you don't
disable it.

And people say I have STRONG opinions. Don't hold back Tom, let us know
what you really think.

Read
http://linux-mm.org/OverCommitAccounting
or
file://usr/src/linux/Documentation/vm/overcommit-accounting

It is a good idea to have enough swap space. If not, set
vm.overcommit_memory=2

#30Rick Gigger
rick@alpinenetworking.com
In reply to: Mark Woodward (#20)
Re: PostgreSQL 8.0.6 crash

On Feb 9, 2006, at 12:49 PM, Mark Woodward wrote:

On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:

"Mark Woodward" <pgsql@mohawksoft.com> writes:

Again, regardless of OS used, hashagg will exceed "working memory" as
defined in postgresql.conf.

So? If you've got OOM kill enabled, it can zap a process whether it's
strictly adhered to work_mem or not. The OOM killer is entirely capable
of choosing a victim process whose memory footprint hasn't changed
materially since it started (eg, the postmaster).

Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
a VERY IMPORTANT setting; it is intended to limit the consumption of
memory by the postgresql process. Oftentimes PostgreSQL will work
alongside other application servers on the same system; in fact, it may be
a sub-part of application servers on the same system. (This is, in fact,
how it is used on one of my site servers.)

Actually, no, it's not designed for that at all.

I guess that's a matter of opinion.

Clearly, if the server will use 1000 times this number (set for 1024 kB,
but it exceeds 1 GB), this is broken, and it may cause other systems to
fail or perform very poorly.

If it is not something that can be fixed, it should be clearly
documented.

work_mem (integer)

Specifies the amount of memory to be used by internal sort
operations and hash tables before switching to temporary disk files.
The value is specified in kilobytes, and defaults to 1024 kilobytes
(1 MB). Note that for a complex query, several sort or hash
operations might be running in parallel; each one will be allowed to
use as much memory as this value specifies before it starts to put
data into temporary files. Also, several running sessions could be
doing such operations concurrently. So the total memory used could
be many times the value of work_mem; it is necessary to keep this
fact in mind when choosing the value. Sort operations are used for
ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash
joins, hash-based aggregation, and hash-based processing of IN
subqueries.

So it says right there that it's very easy to exceed work_mem by a very
large amount. Granted, this is a very painful problem to deal with and
will hopefully be changed at some point, but it's pretty clear as to how
this works.

Well, if you read that paragraph carefully, I'll admit that I was a little
too literal in my statement, applying it to the "process" and not specific
functions within the process, but in the documentation:

"each one will be allowed to use as much memory as this value specifies
before it starts to put data into temporary files."

According to the documentation the behavior of hashagg is broken. It did
not use up to this amount and then start to use temporary files; it used
1000 times this limit and was killed by the OS.

I think it should be documented, as the behavior is unpredictable.

It seems to me that the solution for THIS INCIDENT is to run an
analyze. That should fix the problem at hand. I have nothing to say
about the OOM issue except that hopefully the analyze will prevent
him from running out of memory at all.

However if hashagg truly does not obey the limit that is supposed to
be imposed by work_mem then it really ought to be documented. Is
there a misunderstanding here and it really does obey it? Or is
hashagg an exception but the other work_mem associated operations
work fine? Or is it possible for them all to go out of bounds?

Even if you've got 100 terabytes of swap space, though, it seems like if
your system is very heavy on reads then you would really want that
single backend to start using up your disk space and leave your
memory alone, so that most of your data can stay cached and largely
unaffected by the problem of one backend.

If your bottleneck is writing to the disk then it doesn't really seem
to matter. You just need to make sure that a huge out-of-control
hashagg never occurs. If your disks get saturated with writes
because of the hashagg of one backend, then all other processes that
need to write a lot of info to disk are going to come to a grinding
halt, queries are not going to complete quickly, and you will have a
huge mess on your hands that will essentially prevent postgres from
being able to do its job even if it doesn't actually die. In this
situation disk bandwidth is a scarce commodity, and whether you let
the OS handle it all with virtual memory or you let postgres swap
everything out to disk for that one operation, you are still using
disk to make up for a lack of RAM. At some point you've either got to
stock up on enough RAM to run your queries properly or alter how your
queries run to use less RAM. Having a process go out of control in
resource usage is going to cause big problems one way or another.

#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rick Gigger (#30)
Re: PostgreSQL 8.0.6 crash

Rick Gigger <rick@alpinenetworking.com> writes:

However if hashagg truly does not obey the limit that is supposed to
be imposed by work_mem then it really ought to be documented. Is
there a misunderstanding here and it really does obey it? Or is
hashagg an exception but the other work_mem associated operations
work fine? Or is it possible for them all to go out of bounds?

hashagg is the exception. It should be fixed, not documented, but no
one's got round to that.

One point to consider is that if the planner's estimate is as far off
as exhibited in the OP's example, a hashagg that does spill to disk
is likely to take so long that he'll be back here complaining that
the query never terminates ;-). In most practical situations, I think
exceeding work_mem is really the best solution, as long as it's not
by more than 10x or 100x. It's when the estimate is off by many
orders of magnitude that you've got a problem. Running out of memory
is not necessarily the worst response ... as long as the system doesn't
kill the process in response to that.

regards, tom lane
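
A quick way to see how far off the planner's distinct-count estimate is (a sketch; pg_stats.n_distinct is only populated once the table has been ANALYZEd):

freedb=# SELECT n_distinct FROM pg_stats
freedb-# WHERE tablename = 'cdtitles' AND attname = 'ucode';
freedb=# SELECT count(DISTINCT ucode) FROM cdtitles;  -- the actual number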

#32Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#31)
Re: PostgreSQL 8.0.6 crash

Rick Gigger <rick@alpinenetworking.com> writes:

However if hashagg truly does not obey the limit that is supposed to
be imposed by work_mem then it really ought to be documented. Is
there a misunderstanding here and it really does obey it? Or is
hashagg an exception but the other work_mem associated operations
work fine? Or is it possible for them all to go out of bounds?

hashagg is the exception. It should be fixed, not documented, but no
one's got round to that.

Well, it is clearly a pathological condition. Fixed? Sure, but someone
should document it so that others don't stumble across it.

One point to consider is that if the planner's estimate is as far off
as exhibited in the OP's example, a hashagg that does spill to disk
is likely to take so long that he'll be back here complaining that
the query never terminates ;-).

That's not fair, now is it? This isn't about the OP (me), it is about
PostgreSQL behaving badly.

In most practical situations, I think
exceeding work_mem is really the best solution, as long as it's not
by more than 10x or 100x. It's when the estimate is off by many
orders of magnitude that you've got a problem. Running out of memory
is not necessarily the worst response ... as long as the system doesn't
kill the process in response to that.

I don't agree with you here. Many PostgreSQL installations use PostgreSQL
as part of a larger whole. Adjusting "work_mem" should give the admin some
control over the memory footprint of the system. It is documented as the
limit a specific function path will use before spilling to disk.

I set up a lot of systems and I write a lot of software that uses
PostgreSQL. Periodically I run across features/problems/limitations of
PostgreSQL and post them.

This was/is an example of where the behavior of PostgreSQL is clearly
unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
isn't clear how anyone could have known this, and unexpected behavior is
bad in any product.

In your statement, "he'll be back here complaining that the query never
terminates," that's not true. A long query typically gets examined with
explain (or in Oracle, explain plan) and evaluated from there. When the
process exhibits runaway memory use, that's a problem.

#33Jim C. Nasby
jnasby@pervasive.com
In reply to: Greg Stark (#23)
Re: PostgreSQL 8.0.6 crash

On Thu, Feb 09, 2006 at 03:13:22PM -0500, Greg Stark wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Greg Stark <gsstark@mit.edu> writes:

It doesn't seem like a bad idea to have a max_memory parameter that if a
backend ever exceeded it would immediately abort the current
transaction.

See ulimit (or local equivalent).

As much as setting ulimit in shell scripts is fun, I have to admit that
I really don't see it happening very much.

For one thing it requires admin access to the startup scripts to arrange this.
And it's always cluster-wide.

Having a GUC parameter would mean it could be set per-session. Even if the GUC
parameter were just implemented by calling setrlimit it might be useful.

Trying to tune work_mem is extremely difficult in PostgreSQL, because
you are constantly running the risk of sending the server into a
swap storm. Having a settable per-backend memory limit would allow a lot
more flexibility in setting work_mem, because you could then ensure that
you wouldn't push the server into serious swapping.

Even better would be a means to set a cluster-wide memory limit, but of
course that's substantially more work.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#34Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#27)
Re: PostgreSQL 8.0.6 crash

On Thu, Feb 09, 2006 at 05:04:38PM -0500, Tom Lane wrote:

Martijn van Oosterhout <kleptog@svana.org> writes:

When people talk about disabling the OOM killer, it doesn't stop the
SIGKILL behaviour,

Yes it does, because the situation will never arise.

it just causes the kernel to return -ENOMEM for
malloc() much much earlier... (ie when you still actually have memory
available).

Given the current price of disk, there is no sane reason not to have
enough swap space configured to make this not-a-problem. The OOM kill
mechanism was a reasonable solution for running systems that were not
expected to be too reliable anyway on small hardware, but if you're
trying to run a 24/7 server you're simply incompetent if you don't
disable it.

BTW, I was shocked when I found out that FreeBSD actually has an OOM
killer itself. Yet I've never heard of anyone having problems with it.
Granted, the FreeBSD OOM could be better designed to pick the right
process to kill, but I'd bet that the real reason you never hear about
it is that FreeBSD admins are clued enough to a) set up a reasonable
amount of swap and b) do a better job of monitoring memory usage so that
you don't start swapping in the first place.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#35Jim C. Nasby
jnasby@pervasive.com
In reply to: Mark Woodward (#32)
Re: PostgreSQL 8.0.6 crash

On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:

In most practical situations, I think
exceeding work_mem is really the best solution, as long as it's not
by more than 10x or 100x. It's when the estimate is off by many
orders of magnitude that you've got a problem. Running out of memory
is not necessarily the worst response ... as long as the system doesn't
kill the process in response to that.

I don't agree with you here. Many PostgreSQL installations use PostgreSQL
as part of a larger whole. Adjusting "work_mem" should give the admin some
control over the memory footprint of the system. It is documented as the
limit a specific function path will use before spilling to disk.

And even when PostgreSQL has the server all to itself, having a hashagg
spill to disk is *way* better than pushing the machine into a swap
storm. At least if you spill the hashagg you only have one backend
running at a snail's pace; a swap storm means next to nothing gets done.

This was/is an example of where the behavior of PostgreSQL is clearly
unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
isn't clear how anyone could have known this, and unexpected behavior is
bad in any product.

Care to submit a documentation patch before releases are bundled (I
think on Sunday?) At least then people would be aware that work_mem is
just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
have time before the release. :(
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#36Martijn van Oosterhout
kleptog@svana.org
In reply to: Jim C. Nasby (#34)
Re: PostgreSQL 8.0.6 crash

On Fri, Feb 10, 2006 at 10:01:18AM -0600, Jim C. Nasby wrote:

BTW, I was shocked when I found out that FreeBSD actually has an OOM
killer itself. Yet I've never heard of anyone having problems with it.
Granted, the FreeBSD OOM could be better designed to pick the right
process to kill, but I'd bet that the real reason you never hear about
it is because FreeBSD admins are clued enough to a) setup a reasonable
amount of swap and b) do a better job of monitoring memory usage so that
you don't start swapping in the first place.

Hmm, I do wonder what FreeBSD's overcommit policy is. For example, on my
computer right now the total allocated VM is approximately 3 times the
actual memory in the computer, and about twice if you include swap. Under
a strict no-overcommit policy my computer wouldn't complete the boot
sequence, whereas currently it runs without using any swap.

Disabling overcommit has a serious cost in that most of your VM will
never be used. Are people really suggesting that I can't run a few
daemons, X and a web browser on FreeBSD without allocating 3 times my
physical memory in swap?

However, my real question is: while trying to find info about FreeBSD's
overcommit policy, I just get a lot of people complaining about FreeBSD
killing random processes. Does anyone know a site that describes how it
works? I understand Linux's overcommit policy just fine.

Disclaimer: The Linux OOM killer has never killed the wrong process for
me, so I don't have any bad experiences with overcommit.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#37Mark Woodward
pgsql@mohawksoft.com
In reply to: Jim C. Nasby (#35)
Re: PostgreSQL 8.0.6 crash

On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:

In most practical situations, I think exceeding work_mem is really the
best solution, as long as it's not by more than 10x or 100x. It's when
the estimate is off by many orders of magnitude that you've got a
problem. Running out of memory is not necessarily the worst response
... as long as the system doesn't kill the process in response to that.

I don't agree with you here. Many PostgreSQL installations use
PostgreSQL as part of a larger whole. Adjusting "work_mem" should give
the admin some control over the memory footprint of the system. It is
documented as the limit a specific function path will use before
spilling to disk.

And even when PostgreSQL has the server all to itself, having a hashagg
spill to disk is *way* better than pushing the machine into a swap
storm. At least if you spill the hashagg you only have one backend
running at a snail's pace; a swap storm means next to nothing gets done.

This was/is an example of where the behavior of PostgreSQL is clearly
unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
isn't clear how anyone could have known this, and unexpected behavior is
bad in any product.

Care to submit a documentation patch before releases are bundled (I
think on Sunday?) At least then people would be aware that work_mem is
just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
have time before the release. :(

I would be glad to. What's the process?

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#36)
Re: PostgreSQL 8.0.6 crash

Martijn van Oosterhout <kleptog@svana.org> writes:

Disclaimer: The Linux OOM killer has never killed the wrong process for
me, so I don't have any bad experiences with overcommit.

You haven't tried real hard. What I've seen recently when I do something
that makes a PG backend go overboard is that the kernel zaps both the
misbehaving backend and the bgwriter process. No idea what it's got
against the bgwriter, but the behavior's been pretty consistent under
recent Fedora 4 kernels ...

(This is on a development machine, not a server, so I'm not particularly
worried by leaving the default overcommit policy in place. I wouldn't
do that on a server --- but it's not worth my time to change it on a
devel machine.)

regards, tom lane

#39Greg Stark
gsstark@mit.edu
In reply to: Martijn van Oosterhout (#36)
Re: PostgreSQL 8.0.6 crash

Martijn van Oosterhout <kleptog@svana.org> writes:

Disabling overcommit has a serious cost in that most of your VM will
never be used. Are people really suggesting that I can't run a few
daemons, X and a web-browser on FreeBSD without allocating 3 times my
physical memory in swap?

There's a possibility you're misreading your process info there. The X server
maps large areas of graphics memory into its address space, which shows up as
virtual memory used in ps on some OSes; I'm not sure if BSD is included there.
That's not memory that can ever be swapped out, and it doesn't consume any
non-video memory.

That said, it's true that some applications allocate much more memory than
needed. There's a bit of a feedback loop here. Because application writers
know that OSes overcommit they don't worry about avoiding unnecessary
allocations. Sun's original Java memory management system just started off
with allocating 20MB before it ran a single bytecode for example.

That's why merely allocating tons of swap doesn't necessarily protect you.
It's still possible for a process (or several processes if you allocate more
swap than you have address space) to mmap gigabytes of memory without touching
it and then start touching those pages. Hopefully the OOM killer targets the
offender but there's no real way for it to guarantee it.

--
greg

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#39)
Re: PostgreSQL 8.0.6 crash

Greg Stark <gsstark@mit.edu> writes:

That's why merely allocating tons of swap doesn't necessarily protect you.
It's still possible for a process (or several processes if you allocate more
swap than you have address space) to mmap gigabytes of memory without touching
it and then start touching those pages.

So? If the swap exists to back that memory, there's no problem. It
might be slow, but it will not fail.

regards, tom lane

#41Jim C. Nasby
jnasby@pervasive.com
In reply to: Mark Woodward (#37)
Re: PostgreSQL 8.0.6 crash

On Fri, Feb 10, 2006 at 12:16:04PM -0500, Mark Woodward wrote:

And even when PostgreSQL has the server all to itself, having a hashagg
spill to disk is *way* better than pushing the machine into a swap
storm. At least if you spill the hashagg you only have one backend
running at a snail's pace; a swap storm means next to nothing gets done.

This was/is an example of where the behavior of PostgreSQL is clearly
unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
isn't clear how anyone could have known this, and unexpected behavior is
bad in any product.

Care to submit a documentation patch before releases are bundled (I
think on Sunday?) At least then people would be aware that work_mem is
just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
have time before the release. :(

I would be glad too. What's the process?

Well, find the appropriate file in doc/src/sgml, make a copy, edit the
file, generate a diff with diff -u, and email that diff/patch to
pgsql-patches.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#42Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#40)
Re: PostgreSQL 8.0.6 crash

Tom Lane <tgl@sss.pgh.pa.us> writes:

Greg Stark <gsstark@mit.edu> writes:

That's why merely allocating tons of swap doesn't necessarily protect you.
It's still possible for a process (or several processes if you allocate more
swap than you have address space) to mmap gigabytes of memory without touching
it and then start touching those pages.

So? If the swap exists to back that memory, there's no problem. It
might be slow, but it will not fail.

Sure, but there's no way to know how much swap you need. No matter how much
swap you allocate, these processes can allocate more pages of untouched RAM
and then blow up.

Of course, realistically, allocating 4G of swap is enough to deal with
something like Postgres where you're not being maliciously attacked. One
process on ia32 can't accidentally allocate more than 4G of RAM.

I was just trying to clarify the situation since someone made some comment
about it having to do with memory being swapped out and then finding nowhere
to swap in when needed. That's not exactly what's happening.

--
greg

#43Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Greg Stark (#42)
Re: PostgreSQL 8.0.6 crash

Greg Stark wrote:

Tom Lane <tgl@sss.pgh.pa.us> writes:

Greg Stark <gsstark@mit.edu> writes:

That's why merely allocating tons of swap doesn't necessarily protect you.
It's still possible for a process (or several processes if you allocate more
swap than you have address space) to mmap gigabytes of memory without touching
it and then start touching those pages.

So? If the swap exists to back that memory, there's no problem. It
might be slow, but it will not fail.

Sure, but there's no way to know how much swap you need. No matter how much
swap you allocate, these processes can allocate more pages of untouched RAM
and then blow up.

Of course, realistically, allocating 4G of swap is enough to deal with
something like Postgres where you're not being maliciously attacked. One
process on ia32 can't accidentally allocate more than 4G of RAM.

I was just trying to clarify the situation since someone made some comment
about it having to do with memory being swapped out and then finding nowhere
to swap in when needed. That's not exactly what's happening.

I guess the fundamental issue is whether Linux requires all mmap()'ed
file contents to be in memory, or whether it pushes data to disk and
unmaps it as it runs low on memory. I don't know the answer to that.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#42)
Re: PostgreSQL 8.0.6 crash

Greg Stark <gsstark@mit.edu> writes:

I was just trying to clarify the situation since someone made some comment
about it having to do with memory being swapped out and then finding nowhere
to swap in when needed. That's not exactly what's happening.

No. I believe the case that is actually hard for the kernel to predict
comes from copy-on-write: when a process forks, you could potentially
need twice its current memory image, but in reality you probably won't
ever need that much since many of the shared pages won't ever be written
by either process. However, a non-overcommitting kernel must assume
that worst case, and hence fail the fork() if it doesn't have enough
swap space to cover both processes. If it does not, then the crunch
comes when one process does touch a shared page. If there is no
available swap space at that time, kill -9 is the only recourse, because
there is no way in the Unix API to fail a write to valid memory.

The reason for having a lot more swap space than you really need is just
to cover the potential demand from copy-on-write of pages that are
currently shared. But given the price of disk these days, it's pretty
cheap insurance.

regards, tom lane