multiple runs of the same query cause out of memory - WAS [Re: capturing/viewing sort_mem utilization on a per query basis]
List & Tom,
Following up on this thread from a weeks ago:
http://archives.postgresql.org/pgsql-general/2005-02/msg00106.php
We have found that this query can indeed cause terrible things to
happen to postgresql - it can run several times with response times of
a few seconds, and then the next time cause postgres to go out of
control on memory usage. (I've had to reboot my machine twice to get
control). On a beefier machine with more memory and two CPUs, the
query worked okay for a few dozen repetitions (with a few in
parallel), then eventually one instance would cause the postgres
process to spiral out of control and consume more
and more memory at the rate of a megabyte every second or so.
I'm not quite sure where to go from here, but this is definitely
reproducable now. Help?!
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org
Lonni J Friedman <netllama@gmail.com> writes:
We have found that this query can indeed cause terrible things to
happen to postgresql - it can run several times with response times of
a few seconds, and then the next time cause postgres to go out of
control on memory usage. (I've had to reboot my machine twice to get
control). On a beefier machine with more memory and two CPUs, the
query worked okay for a few dozen repetitions (with a few in
parallel), then eventually one instance would cause the postgres
process to spiral out of control and consume more
and more memory at the rate of a megabyte every second or so.
I'm not quite sure where to go from here, but this is definitely
reproducable now. Help?!
That's a bit hard to believe --- if nothing is changing, the query
should get processed the same way every time.
Can you package up a test case for other people to look at?
regards, tom lane
On Wed, 23 Feb 2005 19:56:56 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lonni J Friedman <netllama@gmail.com> writes:
We have found that this query can indeed cause terrible things to
happen to postgresql - it can run several times with response times of
a few seconds, and then the next time cause postgres to go out of
control on memory usage. (I've had to reboot my machine twice to get
control). On a beefier machine with more memory and two CPUs, the
query worked okay for a few dozen repetitions (with a few in
parallel), then eventually one instance would cause the postgres
process to spiral out of control and consume more
and more memory at the rate of a megabyte every second or so.I'm not quite sure where to go from here, but this is definitely
reproducable now. Help?!That's a bit hard to believe --- if nothing is changing, the query
should get processed the same way every time.
Its definitely not the same every time after this additional testing.
Can you package up a test case for other people to look at?
I could try, but i'm not sure what you'd need. Right now, i can only
replicate this behavior with a snapshot of this one customer's DB.
Running on the same schema, with little or no data, it always
completes fine, so it looks to be partially data driven in nature, or
perhaps its the volume of data.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org
On Thu, 24 Feb 2005 22:46:02 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lonni J Friedman <netllama@gmail.com> writes:
Oh, also, do you see a time difference in running this query with &
without the genetic optimizer?Yeah, but that's no surprise.
The reason I couldn't replicate your problem was I was trying 8.0.
What I've found so far is that the 8.0 GEQO consistently finds a good
plan while 7.4 usually finds a bad to awful plan :-(. In about 30 tries
with 7.4, I got two OOMs and two that I gave up on after upwards of 20
minutes runtime. The runtimes of the rest were all over the map ---
up to 30 times slower than the plan found with geqo off.In about 200 tries, 8.0 all but once found a plan no worse than 50%
slower than what it found with geqo off; and even the outlier was only
3x slower than the best plan.So at least on this example, the GEQO tweaking we did for 8.0 really
paid off.As long as you're on 7.4, disabling GEQO may be a good answer. Rather
than just setting geqo = off, I'd advise bumping geqo_threshold up a
notch or two. That will allow this query to be planned by the regular
planner, without buying into doing impossibly complex queries that way.
Thanks for your help on this. Immediately, we've disabled the genetic
optimizer, and we'll tinker with increaseing the threashold parameter
to see if that produces more stability.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand http://netllama.linux-sxs.org
Import Notes
Reply to msg id not found: 524.1109303162@sss.pgh.pa.us