7.4.1 upgrade issues

Started by Gavin M. Roy (March 2004), 11 messages, list: general
#1 Gavin M. Roy
gmr@ehpg.net

I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
running into an issue where a big query that may take 30-40 seconds to
reply is holding up all other backends from performing their queries.
Once the big query is finished, all the tiny ones fly through. This is
seemingly new behavior on the box, as with previous versions things would
slow down, but not wait for the CPU/resource-hog queries to finish. The
box is Slackware 8.1, on a fairly decent machine with plenty of RAM, CPU,
and disk speed. I've considered renicing the processes, but I was wondering
if anyone had a different suggestion.

TIA,
Gavin

#2 Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Gavin M. Roy (#1)
Re: 7.4.1 upgrade issues

On Sat, Mar 06, 2004 at 01:12:57PM -0800, Gavin M. Roy wrote:

> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.

By "holding up", do you mean that it's causing the other transactions
to block (INSERT WAITING, for instance), or that it's making
everything real slow?
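A rough way to answer this from the process list is sketched below. It assumes that each backend's process title (as shown by `ps`) has the form `postgres: user database host activity` and that a backend blocked on a lock has `waiting` appended, as in the INSERT WAITING example; the PIDs and titles in `SAMPLE_PS` are hypothetical, and `classify_backends` is an illustrative helper, not a PostgreSQL tool.

```python
import re

# Hypothetical ps output; real process titles vary by version and platform.
SAMPLE_PS = """\
 1201 postgres: www shop 10.0.0.5 SELECT
 1202 postgres: www shop 10.0.0.6 INSERT waiting
 1203 postgres: www shop 10.0.0.7 idle
"""

# Assumed line shape: '<pid> postgres: <user> <db> <host> <activity>'
TITLE_RE = re.compile(r"\s*(\d+)\s+postgres:\s+(.*)$")


def classify_backends(ps_text):
    """Split backend PIDs into lock-blocked ones and everything else."""
    blocked, running = [], []
    for line in ps_text.splitlines():
        match = TITLE_RE.match(line)
        if not match:
            continue  # not a backend process line
        pid, title = int(match.group(1)), match.group(2)
        # Assumption: a backend waiting on a lock ends its title with 'waiting'.
        if title.rstrip().lower().endswith("waiting"):
            blocked.append(pid)
        else:
            running.append(pid)
    return blocked, running


blocked, running = classify_backends(SAMPLE_PS)
print("blocked on locks:", blocked)
print("running or idle:", running)
```

If the slow backends never show `waiting`, they are competing for CPU or I/O rather than blocking on locks.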

It could be that your sort_mem is set too high. Remember that the
new-in-7.4 hash behaviour also draws on the sort_mem setting, and if
it's set too high and enough of these operations run at once, you might
actually cause your box to start swapping.
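The swapping concern can be put into numbers with a back-of-the-envelope bound. This is a minimal sketch, assuming that `sort_mem` is given in kilobytes (as in 7.4), that each sort or hash step in a plan may use up to `sort_mem` on its own, and that every busy backend runs the same number of such steps; the settings below are hypothetical.

```python
def worst_case_sort_mem_mb(sort_mem_kb, backends, ops_per_query):
    """Rough upper bound (in MB) on memory used by sorts and hashes.

    Assumes every backend runs ops_per_query sort/hash steps at once
    and that each step uses its full sort_mem allotment.
    """
    total_kb = sort_mem_kb * backends * ops_per_query
    return total_kb / 1024


# Hypothetical settings: sort_mem = 65536 (64 MB), 20 busy backends,
# each running a plan with 3 sort/hash steps.
print(worst_case_sort_mem_mb(65536, 20, 3))  # → 3840.0
```

Real usage is usually far below this bound, but if the bound exceeds physical RAM, swapping under load becomes plausible.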

> and disk speed. I've considered renicing the processes, but I was wondering

That is unlikely to help, and certainly won't if the queries are
actually blocked.

--
Andrew Sullivan | ajs@crankycanuck.ca
The plural of anecdote is not data.
--Roger Brinner

#3 Mike Mascari
mascarm@mascari.com
In reply to: Gavin M. Roy (#1)
Re: 7.4.1 upgrade issues

Gavin M. Roy wrote:

> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
> Once the big query is finished, all the tiny ones fly through. This is
> seemingly new behavior on the box, as with previous versions things would
> slow down, but not wait for the CPU/resource-hog queries to finish. The
> box is Slackware 8.1, on a fairly decent machine with plenty of RAM, CPU,
> and disk speed. I've considered renicing the processes, but I was wondering
> if anyone had a different suggestion.

Hi Gavin.

Assuming a VACUUM ANALYZE after reload, one possibility is that the
query in question contains >= 11 joins. I forgot to adjust the GEQO
settings during an upgrade and experienced the associated
sluggishness in planning time.
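The rule being alluded to can be sketched as follows, assuming the documented 7.4 behaviour: when `geqo` is on and a query has at least `geqo_threshold` FROM items (default 11; an explicit outer JOIN counts as one item), the planner switches from exhaustive join search to the genetic optimizer. The factorial count of left-deep join orders is only an illustration of why exhaustive search grows so quickly; the real planner does not literally enumerate every ordering.

```python
from math import factorial


def planner_mode(from_items, geqo=True, geqo_threshold=11):
    """Sketch of the documented switch between join-search strategies."""
    if geqo and from_items >= geqo_threshold:
        return "geqo"
    return "exhaustive"


def left_deep_orders(n):
    """Number of left-deep join orders for n relations (n!)."""
    return factorial(n)


# Show how the search space grows as the join count crosses the threshold.
for n in (4, 8, 11, 14):
    print(n, planner_mode(n), left_deep_orders(n))
```

Raising `geqo_threshold` or turning `geqo` off moves large joins back to exhaustive search, which can explain slow planning after settings are not carried over in an upgrade.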

Mike Mascari

#4 Gavin M. Roy
gmr@ehpg.net
In reply to: Andrew Sullivan (#2)
Re: 7.4.1 upgrade issues

It's not WAITING; the larger queries are eating CPU (99%) and the rest
are running so slowly it would seem they're waiting for processing time.
My sort_mem is fairly high, but this is a dedicated box, and there is no
swapping going on AFAIK.

Gavin

#5 Gavin M. Roy
gmr@ehpg.net
In reply to: Mike Mascari (#3)
Re: 7.4.1 upgrade issues

It is using indexes, not seqscans, and there was an ANALYZE after the
reload... I'll play with GEQO, thanks.

Gavin

#6 Jim Wilson
jimw@kelcomaine.com
In reply to: Gavin M. Roy (#5)
Re: 7.4.1 upgrade issues

"Gavin M. Roy" said:

> I upgraded my main production db from 7.3.4 last night to 7.4.1. I'm
> running into an issue where a big query that may take 30-40 seconds to
> reply is holding up all other backends from performing their queries.
> Once the big query is finished, all the tiny ones fly through. This is
> seemingly new behavior on the box, as with previous versions things would
> slow down, but not wait for the CPU/resource-hog queries to finish. The
> box is Slackware 8.1, on a fairly decent machine with plenty of RAM, CPU,
> and disk speed. I've considered renicing the processes, but I was wondering
> if anyone had a different suggestion.

It sounds like you are suggesting this same system and data worked fine on
7.3.4. Just the same, you might want to provide more detail. EIDE
drives, when used (not recommended for servers, IMO), are often not configured
properly and can cause similar issues even in a system with tons of RAM and CPU.

Best,

Jim

--
Jim Wilson - IT Manager
Kelco Industries
PO Box 160
58 Main Street
Milbridge, ME 04658
207-546-7989 - FAX 207-546-2791
http://www.kelcomaine.com

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin M. Roy (#4)
Re: 7.4.1 upgrade issues

"Gavin M. Roy" <gmr@ehpg.net> writes:

> It's not WAITING; the larger queries are eating CPU (99%) and the rest
> are running so slowly it would seem they're waiting for processing time.

Could we see EXPLAIN ANALYZE output for the large query? (Also the
usual supporting evidence, ie table schemas for all the tables
involved.)

regards, tom lane

#8 Gavin M. Roy
gmr@ehpg.net
In reply to: Tom Lane (#7)
Re: 7.4.1 upgrade issues

I'll post it if you want, but the issue isn't with the optimizer, index
usage, or seq scans; the issue seems to revolve more around the
backend getting so much CPU priority that it's not allowing other backends to
process, or something along those lines. As for the hardware question,
it's an Adaptec 7899 Ultra 160 SCSI card with accompanying fast
drives...

Again, I'll send the EXPLAIN output, etc. if you think it would help answer my
question, but from my perspective, the amount of time the query takes to
execute isn't my issue; it's the fact that nothing else can seemingly
execute while it's running.

Gavin

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin M. Roy (#8)
Re: 7.4.1 upgrade issues

"Gavin M. Roy" <gmr@ehpg.net> writes:

> ... the issue seems to revolve more around the
> backend getting so much CPU priority that it's not allowing other backends to
> process, or something along those lines.

I can't think of any difference between 7.3 and 7.4 that would create
a problem of that sort where there was none before. For that matter,
since Postgres runs nonprivileged it's hard to see how it could create
a priority problem in any version. I thought the previous suggestion
about added use of hashtables was a pretty good idea. We could
confirm or disprove it by looking at EXPLAIN output.
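That check can be partly automated: the hash hypothesis predicts Hash, Hash Join, or HashAggregate nodes in the plan. Below is a minimal sketch that pulls node names out of plain-text EXPLAIN output, assuming each node line ends its name with two spaces followed by `(cost=`; `SAMPLE_PLAN` is a made-up excerpt, not the actual plan from this thread, and `plan_nodes`/`hash_nodes` are hypothetical helpers.

```python
import re

# Hypothetical EXPLAIN ANALYZE excerpt, for illustration only.
SAMPLE_PLAN = """\
Hash Join  (cost=10.00..250.00 rows=1000 width=40) (actual time=5.1..420.7 rows=980 loops=1)
  Hash Cond: ("outer".cust_id = "inner".id)
  ->  Seq Scan on orders  (cost=0.00..200.00 rows=10000 width=32) (actual time=0.0..80.2 rows=10000 loops=1)
  ->  Hash  (cost=8.00..8.00 rows=800 width=8) (actual time=4.9..4.9 rows=0 loops=1)
        ->  Index Scan using customers_pkey on customers  (cost=0.00..8.00 rows=800 width=8) (actual time=0.1..3.8 rows=800 loops=1)
"""

# Optional '->' marker, then the node description, then two spaces and '(cost='.
NODE_RE = re.compile(r"^\s*(?:->\s+)?(.+?)\s{2}\(cost=")


def plan_nodes(plan_text):
    """Return node descriptions from EXPLAIN output, top-down."""
    nodes = []
    for line in plan_text.splitlines():
        match = NODE_RE.match(line)
        if match:  # detail lines such as 'Hash Cond:' have no cost, so skip
            nodes.append(match.group(1))
    return nodes


def hash_nodes(plan_text):
    """Nodes that build hash tables (Hash, Hash Join, HashAggregate)."""
    return [node for node in plan_nodes(plan_text) if node.startswith("Hash")]


print(hash_nodes(SAMPLE_PLAN))
```

An empty result on the real plan would argue against the hash explanation; a non-empty one only shows the hypothesis is possible, not that it is the cause.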

regards, tom lane

#10 Gavin M. Roy
gmr@ehpg.net
In reply to: Gavin M. Roy (#1)
Re: 7.4.1 upgrade issues

Thanks, I'll take a look. We've rewritten the queries and indexes to
avoid the issue, but I'd like to find an ultimate solution to it,
and the idea that it's a Linux kernel scheduling thing is probably
dead on.

Gavin

#11 Karl O. Pinc
kop@meme.com
In reply to: Gavin M. Roy (#8)
Re: 7.4.1 upgrade issues

This reminds me of the scheduler optimizations that have been flying
around Linux kernel development over the last year or so. Apparently there
are cases where this kind of behavior can come up. IIRC it's
fixed in later kernels, but don't take my word for it; I'm just writing
to give a heads-up. Take a look at the Linux kernel mailing list,
and you'll probably find good articles at Linux Weekly News (lwn.net).

On 2004.03.06 23:32 Gavin M. Roy wrote:

> I'll post it if you want, but the issue isn't with the optimizer,
> index usage, or seq scans; the issue seems to revolve more around
> the backend getting so much CPU priority that it's not allowing other
> backends to process, or something along those lines. As for the
> hardware question, it's an Adaptec 7899 Ultra 160 SCSI card with
> accompanying fast drives...
>
> Again, I'll send the EXPLAIN output, etc. if you think it would help answer
> my question, but from my perspective, the amount of time the query
> takes to execute isn't my issue; it's the fact that nothing else can
> seemingly execute while it's running.
>
> Gavin
>
> Tom Lane wrote:
>
>> "Gavin M. Roy" <gmr@ehpg.net> writes:
>>
>>> It's not WAITING; the larger queries are eating CPU (99%) and the
>>> rest are running so slowly it would seem they're waiting for
>>> processing time.
>>
>> Could we see EXPLAIN ANALYZE output for the large query? (Also the
>> usual supporting evidence, ie table schemas for all the tables
>> involved.)

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein