7.4.1 upgrade issues
I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
running into an issue where a big query that may take 30-40 seconds to
reply is holding up all other backends from performing their queries.
Once the big query is finished, all the tiny ones fly through. This is
seemingly new behavior on the box, as with previous versions things would
slow down, but not wait for the cpu/resource hog queries to finish. The
box is Slackware 8.1, on a fairly decent box with plenty of ram, cpu,
and disk speed. I've considered renicing the processes, I was wondering
if anyone had a different suggestion.
TIA,
Gavin
On Sat, Mar 06, 2004 at 01:12:57PM -0800, Gavin M. Roy wrote:
> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
By "holding up", do you mean that it's causing the other transactions
to block (INSERT WAITING, for instance), or that it's making
everything real slow?
It could be your sort_mem is set too high. Remember that the
new-in-7.4 hash behaviour works with the sort_mem setting, and if
it's set too high and you have enough cases of this, you might
actually cause your box to start swapping.
> and disk speed. I've considered renicing the processes, I was wondering
That is unlikely to help, and certainly won't if the queries are
actually blocked.
--
Andrew Sullivan | ajs@crankycanuck.ca
The plural of anecdote is not data.
--Roger Brinner
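A back-of-the-envelope check of the sort_mem concern raised above, as a minimal Python sketch. It assumes, per the PostgreSQL documentation, that sort_mem (in kilobytes) is a limit per sort or hash step rather than per backend, so several steps in several backends can each claim that much. The function names and all the figures (64 MB sort_mem, 3 steps per query, 20 backends, 2 GB of RAM available) are made up for illustration and are not taken from Gavin's system.

```python
# Rough worst-case memory estimate for sort_mem under concurrency.
# All figures below are hypothetical; substitute values from your own box.


def worst_case_sort_memory_kb(sort_mem_kb, steps_per_query, concurrent_queries):
    """Upper bound, in kilobytes, on memory claimed by sort/hash steps.

    sort_mem is a per-operation limit, so every sort or hash step in
    every running query may use up to sort_mem_kb on its own.
    """
    return sort_mem_kb * steps_per_query * concurrent_queries


def may_swap(sort_mem_kb, steps_per_query, concurrent_queries, available_ram_kb):
    """True if the worst-case estimate exceeds the RAM left over for sorting."""
    needed = worst_case_sort_memory_kb(
        sort_mem_kb, steps_per_query, concurrent_queries
    )
    return needed > available_ram_kb


if __name__ == "__main__":
    # Hypothetical: sort_mem = 64 MB, 3 sort/hash steps per query,
    # 20 busy backends, 2 GB of RAM available for these operations.
    needed_kb = worst_case_sort_memory_kb(65536, 3, 20)
    print(needed_kb // 1024, "MB worst case")
    print("risk of swapping:", may_swap(65536, 3, 20, 2 * 1024 * 1024))
```

This is only an upper bound: most steps never reach the limit, so a result of True means the setting deserves a closer look, not that the box is swapping.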
Gavin M. Roy wrote:
> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
> Once the big query is finished, all the tiny ones fly through. This is
> seemingly new behavior on the box, as with previous versions things would
> slow down, but not wait for the cpu/resource hog queries to finish. The
> box is Slackware 8.1, on a fairly decent box with plenty of ram, cpu,
> and disk speed. I've considered renicing the processes, I was wondering
> if anyone had a different suggestion.
Hi Gavin.
Assuming a VACUUM ANALYZE after reload, one possibility is that the
query in question contains >= 11 joins. I forgot to adjust the GEQO
settings during an upgrade and experienced the associated
sluggishness in planning time.
Mike Mascari
It's not WAITING, the larger queries are eating cpu (99%) and the rest
are running so slow it would seem they're waiting for processing time.
My sort_mem is fairly high, but this is a dedicated box, and there is no
swapping going on afaik.
Gavin
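One way to tell the two cases apart (backends blocked on a lock versus backends that are merely starved or slow) is to look at the backend process titles in ps output. Rough Python sketch, assuming the title layout "postgres: user database host activity" with "waiting" appended while a backend waits on a lock; the function name and the sample titles are hypothetical, not output from this box.

```python
# Classify PostgreSQL backends from their process titles.
# Assumed title layout: "postgres: user database host activity",
# with "waiting" appended to the activity while blocked on a lock.


def classify_backend(ps_title):
    """Return "blocked", "idle", or "active" for one backend title."""
    # Split off the first four fields; the remainder is the activity text.
    parts = ps_title.split(None, 4)
    activity = parts[4].strip().lower() if len(parts) > 4 else ""
    if activity.endswith("waiting"):
        return "blocked"
    if activity.startswith("idle"):
        return "idle"
    return "active"


if __name__ == "__main__":
    # Hypothetical sample titles, for illustration only.
    samples = [
        "postgres: gavin proddb [local] SELECT",
        "postgres: gavin proddb [local] INSERT waiting",
        "postgres: gavin proddb [local] idle in transaction",
    ]
    for title in samples:
        print(classify_backend(title), "<-", title)
```

If nothing shows up as blocked while the big query runs, lock contention is unlikely to be the cause, which matches what is described here.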
It is using indexes, and not seqscan, and there was an analyze after
reload... I'll play with GEQO, thanks.
Gavin
"Gavin M. Roy" said:
> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
> Once the big query is finished, all the tiny ones fly through. This is
> seemingly new behavior on the box, as with previous versions things would
> slow down, but not wait for the cpu/resource hog queries to finish. The
> box is Slackware 8.1, on a fairly decent box with plenty of ram, cpu,
> and disk speed. I've considered renicing the processes, I was wondering
> if anyone had a different suggestion.
It sounds like you are suggesting this same system and data worked fine on
7.3.4. Just the same, you might want to provide more detail anyway. EIDE
drives when used (not recommended for servers IMO) are often not configured
properly and can cause similar issues in a system with tons of ram and cpu.
Best,
Jim
--
Jim Wilson - IT Manager
Kelco Industries
PO Box 160
58 Main Street
Milbridge, ME 04658
207-546-7989 - FAX 207-546-2791
http://www.kelcomaine.com
"Gavin M. Roy" <gmr@ehpg.net> writes:
> It's not WAITING, the larger queries are eating cpu (99%) and the rest
> are running so slow it would seem they're waiting for processing time.
Could we see EXPLAIN ANALYZE output for the large query? (Also the
usual supporting evidence, ie table schemas for all the tables
involved.)
regards, tom lane
I'll post it if you want, but the issue isn't with the optimizer, index
usage, or seq scan, the issue seems to be more revolving around the
backend getting so much cpu priority it's not allowing other backends to
process, or something along those lines. For the hardware question
asked, it's an Adaptec 7899 Ultra 160 SCSI card w/ accompanying fast
drives...
Again, I'll send the explain, etc. if you think it would help answer my
question, but from my perspective, the amount of time the query takes to
execute isn't my issue, but the fact that nothing else can seemingly
execute while it's running.
Gavin
"Gavin M. Roy" <gmr@ehpg.net> writes:
> ... the issue seems to be more revolving around the
> backend getting so much cpu priority it's not allowing other backends to
> process, or something along those lines.
I can't think of any difference between 7.3 and 7.4 that would create
a problem of that sort where there was none before. For that matter,
since Postgres runs nonprivileged it's hard to see how it could create
a priority problem in any version. I thought the previous suggestion
about added use of hashtables was a pretty good idea. We could
confirm or disprove it by looking at EXPLAIN output.
regards, tom lane
Thanks, I'll take a look, we've rewritten the queries and indexes to
avoid the issue, but I'd like to get an ultimate solution to the issue,
and the concept that it's a Linux kernel scheduling thing is probably
dead on.
Gavin
This reminds me of the scheduler optimizations that have been flying
around the Linux kernel development over the last year or so. There are
cases apparently where this kind of behavior can come up. IIRC it's
fixed in later kernels but don't take my word for it, I'm just writing
to give a heads-up. Take a look at the Linux kernel mailing list,
and you'll probably find good articles at Linux Weekly News (lwn.net).
Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein