BUG #6275: Horrible performance regression

Started by Finlay Thompsonover 14 years ago7 messagesbugs
Jump to latest
#1Finlay Thompson
finlay@dragonfly.co.nz

The following bug has been logged online:

Bug reference: 6275
Logged by: Finlay Thompson
Email address: finlay@dragonfly.co.nz
PostgreSQL version: 8.4.9
Operating system: Ubuntu 11.04
Description: Horrible performance regression
Details:

After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~
3 hours to not finishing after 24 hours.

Context: We have a continuous build script, that loads, grooms, matches data
every few hours. The script has been run over 100 times in the last few
weeks, and has steadily been taking around 3 hours. It is really a group of
scripts, something like 15000 lines of sql, and some python scripts to run
it together.

After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
the script suddenly stopped working, and consuming all the ram (16GB) on the
computer (i7).

I spent about a day trying to diagnose the problem, with some help from
#postgresql, and ran through the script step by step, but was unable to find
any specific issue.

Today I reinstalled the 8.4.8 version, and it built the database in 3:02
hours, as before.

From this, I reckon something is broken in the 8.4.9 release.

I would love to isolate the problem, and am happy to try anything people
suggest.

Sorry I can't be more specific about the problem, but I thought I should
report the issue.

btw: The database uses plpython and postgis.

Regards,
Finlay Thompson

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Finlay Thompson (#1)
Re: BUG #6275: Horrible performance regression

"Finlay Thompson" <finlay@dragonfly.co.nz> writes:

After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~
3 hours to not finishing after 24 hours.

Context: We have a continuous build script, that loads, grooms, matches data
every few hours. The script has been run over 100 times in the last few
weeks, and has steadily been taking around 3 hours. It is really a group of
scripts, something like 15000 lines of sql, and some python scripts to run
it together.

After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
the script suddenly stopped working, and consuming all the ram (16GB) on the
computer (i7).

Well, the most likely theory seems to be that one of the planner changes
in 8.4.9 is resulting in a much less optimal plan choice for one or more
queries, but it's impossible to diagnose it on the basis of this much
information.

btw: The database uses plpython and postgis.

Hmm, did you change postgis versions at the same time? If so, which
upgrade caused the problem, postgres or postgis?

regards, tom lane

#3Finlay Thompson
finlay@dragonfly.co.nz
In reply to: Tom Lane (#2)
Re: BUG #6275: Horrible performance regression

Hi Tom

Sorry for the delay.

The version of postgis didn't change, and when I rolled back to
postgresql 8.4.9 the expected performance returned. I am very confident
the regression was caused by the 8.4.9 upgrade.

Any suggestions on how to diagnose a long run script like this ? I can
easily turn on more logging, in both cases and compare ? Can I log the
query plans ?

Finlay

On Fri, 2011-10-28 at 00:53 -0400, Tom Lane wrote:

"Finlay Thompson" <finlay@dragonfly.co.nz> writes:

After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~
3 hours to not finishing after 24 hours.

Context: We have a continuous build script, that loads, grooms, matches data
every few hours. The script has been run over 100 times in the last few
weeks, and has steadily been taking around 3 hours. It is really a group of
scripts, something like 15000 lines of sql, and some python scripts to run
it together.

After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
the script suddenly stopped working, and consuming all the ram (16GB) on the
computer (i7).

Well, the most likely theory seems to be that one of the planner changes
in 8.4.9 is resulting in a much less optimal plan choice for one or more
queries, but it's impossible to diagnose it on the basis of this much
information.

btw: The database uses plpython and postgis.

Hmm, did you change postgis versions at the same time? If so, which
upgrade caused the problem, postgres or postgis?

regards, tom lane

--
Finlay Thompson
www.dragonfly.co.nz
Dragonfly, PO Box 27535, Wellington 6141, New Zealand
Level 5, 158 Victoria Street, Te Aro, Wellington
M: +64 21 426 878
T: +64 4 385 9285

#4Robert Haas
robertmhaas@gmail.com
In reply to: Finlay Thompson (#3)
Re: BUG #6275: Horrible performance regression

On Sun, Oct 30, 2011 at 10:30 PM, Finlay Thompson
<finlay@dragonfly.co.nz> wrote:

Hi Tom

Sorry for the delay.

The version of postgis didn't change, and when I rolled back to
postgresql 8.4.9 the expected performance returned. I am very confident
the regression was caused by the 8.4.9 upgrade.

Any suggestions on how to diagnose a long run script like this ? I can
easily turn on more logging, in both cases and compare ? Can I log the
query plans ?

The easiest thing to do is probably EXPLAIN ANALYZE each query and
post the output to this thread.

If you can't do that for some reason, there's always auto_explain, but
that has some overhead and is a bit more work to set up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Guillaume Smet
guillaume.smet@gmail.com
In reply to: Robert Haas (#4)
Re: BUG #6275: Horrible performance regression

On Mon, Oct 31, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

If you can't do that for some reason, there's always auto_explain, but
that has some overhead and is a bit more work to set up.

If it's a massive load of data with a lot of queries, I think his best
bet is to log the queries with log_min_duration_statement = <something
sensible> (I'd try to start with 30ms) then analyze the log results
with pgFouine or any other log analyzer for both versions.

--
Guillaume

#6Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Finlay Thompson (#1)
Re: BUG #6275: Horrible performance regression

On 28/10/11 15:42, Finlay Thompson wrote:

After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
the script suddenly stopped working, and consuming all the ram (16GB) on the
computer (i7).

If there is one query exhausting all ram, then it could be tricky to
catch it in the logs (log duration and friends log the query time on
completion... which might be never). If you have vm.overcommit_memory
set to 2 then you may get the offending statement in the log when the
postgres backend errors out with ENOMEM.

If you know the approx time at which memory starts filling up then
getting (most of) the query text from pg_stat_activity would enable you
to track down the offending sql in your load script.

regards

Mark

#7Finlay Thompson
finlay@dragonfly.co.nz
In reply to: Mark Kirkwood (#6)
Re: BUG #6275: Horrible performance regression

Hi Mark

Thanks for your suggestions (and Robert and Guillaume). I will try the
overcommit_memory=2 idea out in a couple of weeks. At the moment I am
running everything on 8.4.8, which is working great, and have pressing
deadlines looming (next Wednesday).

I'll let you know when I do. I want to find a test case that exhibits a
real postgresql bug, or informs me as to what I have done wrong.

Regards,
Finaly

On Tue, 2011-11-01 at 17:26 +1300, Mark Kirkwood wrote:

On 28/10/11 15:42, Finlay Thompson wrote:

After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
the script suddenly stopped working, and consuming all the ram (16GB) on the
computer (i7).

If there is one query exhausting all ram, then it could be tricky to
catch it in the logs (log duration and friends log the query time on
completion... which might be never). If you have vm.overcommit_memory
set to 2 then you may get the offending statement in the log when the
postgres backend errors out with ENOMEM.

If you know the approx time at which memory starts filling up then
getting (most of) the query text from pg_stat_activity would enable you
to track down the offending sql in your load script.

regards

Mark

--
Finlay Thompson
www.dragonfly.co.nz
Dragonfly, PO Box 27535, Wellington 6141, New Zealand
Level 5, 158 Victoria Street, Te Aro, Wellington
M: +64 21 426 878
T: +64 4 385 9285