PG 7.0 is 2.5 times slower running a big report

Started by Bryan White · 21 messages · pgsql-general
#1Bryan White
bryan@arcamax.com

I have a large report that I run once a day. Under 6.5.3 it took just over
3hrs to run. Under 7.0 it is now taking 8 hours to run. No other changes
were made.

This is on RedHat Linux 6.2: a PIII 733 with 384MB RAM and 2 IDE 7200 RPM
disks. One disk contains the Postgres directory including the data
directory, and the other disk has everything else.

The Postmaster is started with these options: -i -B 1024 -N 256 -o -F

The report is being run on a backup server just after the database has been
loaded from a dump and 'vacuum analyse'd. There is practically nothing else
running on the box during the run.

The report creates four separate concurrent cursors. Each of the queries is
sorted by a customer number which is an index. The index is unique in the
customer table but not the others. For the other cursors it pops values as
needed to process data for the current customer number. There are also
other selects that are run for each customer order processed to retrieve its
line items. The report does not update the database at all, it is just
accumulating totals that will be written out when the report finishes.
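In outline (the cursor names here are invented for illustration; the SELECT
lists are the ones from the EXPLAIN tests later in the thread), the setup
looks something like:

```sql
BEGIN;

-- Four cursors, each ordered by custid so they can be advanced in step.
DECLARE cur_customer CURSOR FOR
    SELECT custid, poregdate, firstcontactdate,
           mastersubscribed, offersubscribed, bouncecount
      FROM customer ORDER BY custid;

DECLARE cur_orders CURSOR FOR
    SELECT custid, orderid, date, leadsource, paymentstatus, shipping + tax
      FROM orders ORDER BY custid;

DECLARE cur_contact CURSOR FOR
    SELECT custid, action, offer, date, source
      FROM contact ORDER BY custid;

DECLARE cur_custlist CURSOR FOR
    SELECT custid, listid
      FROM custlist ORDER BY custid;

-- The client then FETCHes from each cursor as needed while walking
-- the customers in custid order, e.g.:
FETCH 100 FROM cur_orders;
```

The cursors must live inside a transaction block, hence the BEGIN.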

Top tells me the front end process is using 5 to 10 percent of the CPU and
the back end is using 10 to 20 percent. The load average is about 1.0 and
the CPU is about 80% idle. I am pretty certain that on 6.5.3 the CPU usage
was much higher. It's almost like the new version has some sort of throttle
to keep one backend from saturating the system. Indeed the box is much more
responsive than it used to be while running this report.

Suggestions?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

Top tells me the front end process is using 5 to 10 percent of the CPU and
the back end is using 10 to 20 percent. The load average is about 1.0 and
the CPU is about 80% idle.

It's probably waiting for disk I/O ...

What does EXPLAIN tell you about how the queries are being executed?
Do you by any chance have the 6.5.3 system still available to compare
its EXPLAIN output?

regards, tom lane

#3Joseph Shraibman
jks@selectacast.net
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

Try putting 'nofsync' in your pg_options.

Bryan White wrote:


I have a large report that I run once a day. Under 6.5.3 it took just over
3hrs to run. Under 7.0 it is now taking 8 hours to run. No other changes
were made.

This is on RedHat Linux 6.2: a PIII 733 with 384MB RAM and 2 IDE 7200 RPM
disks. One disk contains the Postgres directory including the data
directory, and the other disk has everything else.

The Postmaster is started with these options: -i -B 1024 -N 256 -o -F

The report is being run on a backup server just after the database has been
loaded from a dump and 'vacuum analyse'd. There is practically nothing else
running on the box during the run.

The report creates four separate concurrent cursors. Each of the queries is
sorted by a customer number which is an index. The index is unique in the
customer table but not the others. For the other cursors it pops values as
needed to process data for the current customer number. There are also
other selects that are run for each customer order processed to retrieve its
line items. The report does not update the database at all, it is just
accumulating totals that will be written out when the report finishes.

Top tells me the front end process is using 5 to 10 percent of the CPU and
the back end is using 10 to 20 percent. The load average is about 1.0 and
the CPU is about 80% idle. I am pretty certain that on 6.5.3 the CPU usage
was much higher. It's almost like the new version has some sort of throttle
to keep one backend from saturating the system. Indeed the box is much more
responsive than it used to be while running this report.

Suggestions?

#4Bryan White
bryan@arcamax.com
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

Try putting 'nofsync' in your pg_options.

I am already starting the postmaster with -o -F which I believe has the same
effect. Besides, this process does not alter the database so there should be
nothing to sync.

#5The Hermit Hacker
scrappy@hub.org
In reply to: Bryan White (#4)
Re: PG 7.0 is 2.5 times slower running a big report

I might have missed it, but did you submit the EXPLAIN results of the
query you are attempting, as Tom Lane requested?

On Wed, 24 May 2000, Bryan White wrote:

Try putting 'nofsync' in your pg_options.

I am already starting the postmaster with -o -F which I believe has the same
effect. Besides, this process does not alter the database so there should be
nothing to sync.

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#6Bryan White
bryan@arcamax.com
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

Top tells me the front end process is using 5 to 10 percent of the CPU and
the back end is using 10 to 20 percent. The load average is about 1.0 and
the CPU is about 80% idle.

It's probably waiting for disk I/O ...

What does EXPLAIN tell you about how the queries are being executed?
Do you by any chance have the 6.5.3 system still available to compare
its EXPLAIN output?

explain select custid, poregdate, firstcontactdate,
mastersubscribed, offersubscribed, bouncecount
from customer order by custid;
Sort (cost=598354.56..598354.56 rows=2446621 width=40)
-> Seq Scan on customer (cost=0.00..75939.21 rows=2446621 width=40)

explain select custid, orderid, date, leadsource,
paymentstatus, shipping + tax
from orders order by custid;
Sort (cost=167945.80..167945.80 rows=588242 width=60)
-> Seq Scan on orders (cost=0.00..31399.42 rows=588242 width=60)

explain select custid, action, offer, date, source
from contact order by custid;
Index Scan using iconcus4 on contact (cost=0.00..1446338.62 rows=6462635 width=44)

explain select custid, listid
from custlist order by custid;
Index Scan using iclcust3 on custlist (cost=0.00..334501.73 rows=2738543 width=8)

I find the 'Sort's on customer and orders surprising.
Here are the index creates from a dump:
CREATE INDEX "icusln" on "customer" using btree ( "lname" "text_ops" );
CREATE UNIQUE INDEX "icusem2" on "customer" using btree ( "email" "text_ops" );
CREATE INDEX "icusph" on "customer" using btree ( "phone" "text_ops" );
CREATE UNIQUE INDEX "icusid" on "customer" using btree ( "custid" "int4_ops" );

CREATE INDEX "iordldsrc3" on "orders" using btree ( "leadsource" "text_ops" );
CREATE UNIQUE INDEX "iordid3" on "orders" using btree ( "orderid" "int4_ops" );
CREATE INDEX "iordcus3" on "orders" using btree ( "custid" "int4_ops", "date" "date_ops" );
CREATE INDEX "iorddate3" on "orders" using btree ( "date" "date_ops" );

The iordcus3 index has a second component that is irrelevant to this
operation. Is the optimizer intelligent enough to still use it?

#7Bryan White
bryan@arcamax.com
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

What does EXPLAIN tell you about how the queries are being executed?
Do you by any chance have the 6.5.3 system still available to compare
its EXPLAIN output?

I can shuffle directories and reload under 6.5.3 but I need to wait until
today's run of the report finishes. I got bit by the change in the default
date format and had to run it again. As it is I won't be able to do this
until the morning.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bryan White (#6)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

What does EXPLAIN tell you about how the queries are being executed?
Do you by any chance have the 6.5.3 system still available to compare
its EXPLAIN output?

explain select custid, poregdate, firstcontactdate,
mastersubscribed, offersubscribed, bouncecount
from customer order by custid;
Sort (cost=598354.56..598354.56 rows=2446621 width=40)
-> Seq Scan on customer (cost=0.00..75939.21 rows=2446621 width=40)

explain select custid, orderid, date, leadsource,
paymentstatus, shipping + tax
from orders order by custid;
Sort (cost=167945.80..167945.80 rows=588242 width=60)
-> Seq Scan on orders (cost=0.00..31399.42 rows=588242 width=60)

explain select custid, action, offer, date, source
from contact order by custid;
Index Scan using iconcus4 on contact (cost=0.00..1446338.62 rows=6462635 width=44)

explain select custid, listid
from custlist order by custid;
Index Scan using iclcust3 on custlist (cost=0.00..334501.73 rows=2738543 width=8)

I find the 'Sort's on customer and orders surprising.

Well, in fact you should not, because 6.5.* didn't know how to use an
index to implement ORDER BY at all ;-). 7.0 knows how to do it both
ways, and so the real question here is whether the planner's cost
estimates for the two ways are accurate or not.

My guess is that the index-based ORDER BYs are actually a lot slower
than sort-based for your tables, and that's why 7.0 is slower than 6.5.
But the planner thinks they will be faster in two of these cases.
We need to figure out why it's making the wrong choice.

You can force sort-based or index-based ordering by issuing
SET enable_indexscan = OFF;
or
SET enable_sort = OFF;
respectively. I suggest that the next step should be to see what
EXPLAIN says for all four queries in both cases (so we can see what
the planner's estimates for the two cases actually are for each
table), and then to measure the actual runtimes of each of the
SELECTs both ways.
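
For instance, for the custlist query the two forced cases would look
something like this (the other three queries get the same treatment):

```sql
-- Force the sort-based plan:
SET enable_indexscan = OFF;
EXPLAIN SELECT custid, listid FROM custlist ORDER BY custid;
SET enable_indexscan = ON;

-- Force the index-based plan:
SET enable_sort = OFF;
EXPLAIN SELECT custid, listid FROM custlist ORDER BY custid;
SET enable_sort = ON;
```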

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bryan White (#7)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

What does EXPLAIN tell you about how the queries are being executed?
Do you by any chance have the 6.5.3 system still available to compare
its EXPLAIN output?

I can shuffle directories and reload under 6.5.3 but I need to wait until
today's run of the report finishes. I got bit by the change in the default
date format and had to run it again. As it is I won't be able to do this
until the morning.

Don't worry about it. Now that I've seen the queries, I know that there
is only one query plan 6.5.3 could have used: seq-scan and sort. So
there's no need to reload 6.5.3 just to see its EXPLAIN output.

regards, tom lane

#10Bryan White
bryan@arcamax.com
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

You can force sort-based or index-based ordering by issuing
SET enable_indexscan = OFF;
or
SET enable_sort = OFF;
respectively. I suggest that the next step should be to see what
EXPLAIN says for all four queries in both cases (so we can see what
the planner's estimates for the two cases actually are for each
table), and then to measure the actual runtimes of each of the
SELECTs both ways.

Here is the comparison:
explain select ... from customer order by custid;
Sort (cost=598354.56..598354.56 rows=2446621 width=40)
-> Seq Scan on customer (cost=0.00..75939.21 rows=2446621 width=40)
Index Scan using icusid on customer (cost=0.00..823287.37 rows=2446621 width=40)
Choice was sort

explain select ... from orders order by custid;
Sort (cost=167945.80..167945.80 rows=588242 width=60)
-> Seq Scan on orders (cost=0.00..31399.42 rows=588242 width=60)
Index Scan using iordcus3 on orders (cost=0.00..326167.12 rows=588242 width=60)
Choice was sort

explain select ... from contact order by custid;
Sort (cost=1874247.64..1874247.64 rows=6462635 width=44)
-> Seq Scan on contact (cost=0.00..141404.35 rows=6462635 width=44)
Index Scan using iconcus4 on contact (cost=0.00..1446338.62 rows=6462635 width=44)
Choice was index

explain select ... from custlist order by custid;
Sort (cost=469342.83..469342.83 rows=2738543 width=8)
-> Seq Scan on custlist (cost=0.00..42109.43 rows=2738543 width=8)
Index Scan using iclcust3 on custlist (cost=0.00..334501.73 rows=2738543 width=8)
Choice was index

#11Bryan White
bryan@arcamax.com
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

What does EXPLAIN tell you about how the queries are being executed?
Do you by any chance have the 6.5.3 system still available to compare
its EXPLAIN output?

I can shuffle directories and reload under 6.5.3 but I need to wait until
today's run of the report finishes. I got bit by the change in the default
date format and had to run it again. As it is I won't be able to do this
until the morning.

Don't worry about it. Now that I've seen the queries, I know that there
is only one query plan 6.5.3 could have used: seq-scan and sort. So
there's no need to reload 6.5.3 just to see its EXPLAIN output.

I have set index_scan off for tomorrow morning's run. I will let you know
what happens.

#12Bryan White
bryan@arcamax.com
In reply to: Bryan White (#11)
Re: PG 7.0 is 2.5 times slower running a big report

Well, when you have 2.7 million records in a database, the code might be as
good as it can be.

I have recovered the performance lost when I moved to Postgres 7.0 by
executing SET enable_indexscan = OFF before creating my cursors and turning
it back on for the inner loop query. It may even be faster than before so I
am happy.

So each run of the log takes 2.7 million queries.

Is there no way to use a where clause to limit the scope of your queries?

No, the point is the resulting report covers the entire database.

An explanation of the purpose of these queries would help me think about
what you are after. Maybe it isn't a code problem, maybe it's a business
rules problem.

At the moment, I understand why you need to go through all the items in the
db on every day. Can't you just go through the items that were updated on a
given day? So, in general, what is the goal of this report?

The report summarizes a lot of different aspects of our database. It breaks
down totals in many ways by a variety of variables. For example orders are
broken down by source, date, status, and sku (most orders have only one line
item). I could come up with quicker methods to do any one of the breakdowns,
but to do them all it is more efficient to make a single pass over all the
data.

I'm new to the cursor method (and to PostgreSQL, though I've done a lot of
work with MySQL), but it is interesting, so I'm certainly going to look into
it.

Part of the advantage of the cursor is in memory management. In a non-cursor
select the database builds and transfers the entire result set to the front
end application. For large queries this can be quite a chunk of memory.

With a sequential scan and sort of the database I think it is still
buffering the result set on the back end, but one copy is better than two.
Keep in mind I am running this report on a backup database server so I don't
have to worry much about other processes being hurt by the load.

It seems that with index scans the cursors start producing data right away
(but the overall rate is slower). With sequential scan and sort the report
gets no data for the first 30 minutes and then runs at about 4 times the
rate of the index scan.

What difference do you want to capture in distinguishing the customer table
from the custlist table?

I am not certain I understand the question. The custlist table contains
subscriptions to various services. It is a very simple table containing a
listid and a custid.

#13Bryan White
bryan@arcamax.com
In reply to: Bryan White (#1)
Re: PG 7.0 is 2.5 times slower running a big report

I have set index_scan off for tomorrow morning's run. I will let you know
what happens.

I think my problem is fixed. By disabling index scan while creating the
cursors and then turning it back on again for the small query that occurs in
my inner loop, the performance has returned to normal (it may in fact be
better than before).
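
In SQL terms the pattern is roughly this (cursor names invented; the
per-order line-item query itself isn't shown here):

```sql
-- Disable index scans so the planner picks seq-scan + sort
-- for the big ORDER BY queries behind the cursors:
SET enable_indexscan = OFF;
BEGIN;
DECLARE cur_customer CURSOR FOR
    SELECT custid, poregdate, firstcontactdate,
           mastersubscribed, offersubscribed, bouncecount
      FROM customer ORDER BY custid;
-- ...likewise for the orders, contact, and custlist cursors...

-- Re-enable index scans so the small query in the inner loop
-- can use its index again:
SET enable_indexscan = ON;
```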

Thanks a bunch.
Bryan

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bryan White (#12)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

I have recovered the performance lost when I moved to Postgres 7.0 by
executing SET enable_indexscan = OFF before creating my cursors and
turning it back on for the inner loop query. It may even be faster
than before so I am happy.

OK, so it was the indexscans that were hurting. (7.0 has new sorting
code too, so I was a little afraid that the problem might be with the
sorts. Evidently not.)

This suggests that at least on your setup, the default value of 4.0 for
random_page_cost might still be too low. I have not tried to measure
that number on a Linux machine, just on machines with BSD-derived
filesystems. Maybe Linux does a lot worse with random accesses than
BSD? Needs looking into.

It seems that with index scans the cursors start producing data right away
(but the overall rate is slower). With sequential scan and sort the report
gets no data for the first 30 minutes and then runs at about 4 times the
rate of the index scan.

Right, that's what you'd expect: the sort has to be completed before it
knows which row to deliver first, but an indexscan has no such startup
cost.

regards, tom lane

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#14)
Re: PG 7.0 is 2.5 times slower running a big report

"Bryan White" <bryan@arcamax.com> writes:

I have recovered the performance lost when I moved to Postgres 7.0 by
executing SET enable_indexscan = OFF before creating my cursors and
turning it back on for the inner loop query. It may even be faster
than before so I am happy.

OK, so it was the indexscans that were hurting. (7.0 has new sorting
code too, so I was a little afraid that the problem might be with the
sorts. Evidently not.)

This suggests that at least on your setup, the default value of 4.0 for
random_page_cost might still be too low. I have not tried to measure
that number on a Linux machine, just on machines with BSD-derived
filesystems. Maybe Linux does a lot worse with random accesses than
BSD? Needs looking into.

As I remember, Linux turns off file system prefetch if a seek is done.
BSD file systems turn off prefetch only if prefetched blocks remain
unused.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#16Matthias Urlichs
smurf@noris.de
In reply to: Bruce Momjian (#15)
Re: PG 7.0 is 2.5 times slower running a big report

Hi,

Bruce Momjian:

As I remember, Linux turns off file system prefetch if a seek is done.

Correct (still so for Linux 2.4).

BSD file systems turn off prefetch only if prefetched blocks remain
unused.

"Unused" is decided when?
Do any, all, or some percentage of prefetched blocks need to be unused?

--
Matthias Urlichs | noris network GmbH | smurf@noris.de | ICQ: 20193661
The quote was selected randomly. Really. | http://smurf.noris.de/
--
If hyperspace did not already exist, science fiction writers would have had to
invent it
-Peter Oakley

#17Alfred Perlstein
bright@wintelcom.net
In reply to: Matthias Urlichs (#16)
Re: PG 7.0 is 2.5 times slower running a big report

* Matthias Urlichs <smurf@noris.de> [000526 02:47] wrote:

Hi,

Bruce Momjian:

As I remember, Linux turns off file system prefetch if a seek is done.

Correct (still so for Linux 2.4).

BSD file systems turn off prefetch only if prefetched blocks remain
unused.

"Unused" is decided when?
Do any, all, or some percentage of prefetched blocks need to be unused?

The statement:

BSD file systems turn off prefetch only if prefetched blocks remain
unused.

is false for FreeBSD afaik.

FreeBSD tracks how often sequential access happens and will retry it
as the file's access trend indicates. It can flip back and forth from
readahead and non-readahead.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#18Bruce Momjian
bruce@momjian.us
In reply to: Matthias Urlichs (#16)
Re: PG 7.0 is 2.5 times slower running a big report

Hi,

Bruce Momjian:

As I remember, Linux turns off file system prefetch if a seek is done.

Correct (still so for Linux 2.4).

BSD file systems turn off prefetch only if prefetched blocks remain
unused.

"Unused" is decided when?
Do any, all, or some percentage of prefetched blocks need to be unused?

It starts out prefetching blocks on first file read. In later reads, if
the desired block is in the cache, it continues read-ahead and may even
scale up the read-ahead window.

If the desired block is not in the cache, it starts to think it is
reading ahead too much or random reads are being done, and scales back
the read-ahead window.

It knows if the current read is sequential from the previous read, and can
scale up the read-ahead window in those cases.

So the case where a file is being read sequentially but in two sections
at the same time should work fine because the read-aheads are being
used.

It's basically a feedback loop: a read that is sequential from a previous
read makes the read-ahead go up, and a read that is not in the cache makes
the read-ahead go down.

At least that is how I am reading the BSDI code. I could send a bit of
the code over, but it is pretty confusing.

#19Bruce Momjian
bruce@momjian.us
In reply to: Alfred Perlstein (#17)
Re: PG 7.0 is 2.5 times slower running a big report

The statement:

BSD file systems turn off prefetch only if prefetched blocks remain
unused.

is false for FreeBSD afaik.

FreeBSD tracks how often sequential access happens and will retry it
as the file's access trend indicates. It can flip back and forth from
readahead and non-readahead.

Yes, read my later posting describing the actual algorithm. Let me know
if it is wrong.

#20Bill Barnes
bbarnes@operamail.com
In reply to: Bruce Momjian (#19)
RE: PG 7.0 is 2.5 times slower running a big report

Might I suggest a change in the name of this thread, or starting a new one.
The subject problem was resolved some time ago and this is just bad
publicity. It almost scared me off till I read that the user with the
problem was now happy.

Bill Barnes

===== Original Message From Bruce Momjian <pgman@candle.pha.pa.us> =====

The statement:

BSD file systems turn off prefetch only if prefetched blocks remain
unused.

is false for FreeBSD afaik.

FreeBSD tracks how often sequential access happens and will retry it
as the file's access trend indicates. It can flip back and forth from
readahead and non-readahead.

Yes, read my later posting describing the actual algorithm. Let me know
if it is wrong.



#21Bryan White
bryan@arcamax.com
In reply to: Bill Barnes (#20)