Performance tips

Started by Andrew Perrin · 22 messages · general
#1 Andrew Perrin
andrew_perrin@unc.edu

Greetings-

The database for my current project has grown very large (four tables: 17
records, 10,000 records, 3,000,000 records, and 5,000,000 records,
respectively). Doing things with the data has, therefore, become rather
cumbersome, as operations on the large-N tables can take quite a while. I
wonder if anyone can offer tips on boosting performance? I've done the
obvious, such as building indices on the columns used in searches and
joins.

The computer is a 1Ghz PIII (IBM NetVista) running debian linux
(woody) and PostgreSQL 7.1.3. There's 512M of RAM in it, and top shows
that swap rarely gets used, so one possibility is to try to have pg keep
more workspace in RAM at once. I could also potentially buy more RAM for
the machine.

Thanks for any advice.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

#2 Doug McNaught
doug@wireboard.com
In reply to: Andrew Perrin (#1)
Re: Performance tips

Andrew Perrin <andrew_perrin@unc.edu> writes:

The computer is a 1Ghz PIII (IBM NetVista) running debian linux
(woody) and PostgreSQL 7.1.3. There's 512M of RAM in it, and top shows
that swap rarely gets used, so one possibility is to try to have pg keep
more workspace in RAM at once. I could also potentially buy more RAM for
the machine.

Do try to keep it out of swap, but you may have scope for increasing
the number of shmem buffers. More RAM will always help, as will
getting more and faster disks and spreading the I/O load over them.
Take a look at 'vmstat' output and your CPU usage while you're running
a query to see where your bottlenecks might be.

Unless you have enough RAM to cache the whole thing, a database is
usually I/O bound, which means your disk subsystem is probably a good
place to improve.

Also: VACUUM ANALYZE (are you running it)? Does EXPLAIN show
reasonable plans for all your queries?

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#3 Andrew Perrin
andrew_perrin@unc.edu
In reply to: Doug McNaught (#2)
Re: Performance tips

Well, here's the output from vmstat:

aperrin@hm269-26876:~/afshome/papers/authoritarian$ vmstat
procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0   3052   2132  10460 413284   0   0    11    14    6     5   6   3  17

I can't say I understand it though.... I've got a query running through
psql that's been chugging away for nearly 2 hours now:

auth=# select count(patternid) from patterns where patternid in (select
o_patternid from
auth(# letters, pattern_occurrences where letters.letterid =
pattern_occurrences.o_letterid
auth(# and letters.datecat in (1,2));

patterns has approx. 3,000,000 records, pattern_occurrences 5,000,000,
letters 10,000, of which 8,000 or so are datecat 1 or 2.

Last time I tried to vacuum the database it was still hung 12 hours later
so I cancelled. Haven't tried vacuum analyze or explain but will do so.

Thanks,
Andy

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

On 9 Jan 2002, Doug McNaught wrote:


Andrew Perrin <andrew_perrin@unc.edu> writes:

The computer is a 1Ghz PIII (IBM NetVista) running debian linux
(woody) and PostgreSQL 7.1.3. There's 512M of RAM in it, and top shows
that swap rarely gets used, so one possibility is to try to have pg keep
more workspace in RAM at once. I could also potentially buy more RAM for
the machine.

Do try to keep it out of swap, but you may have scope for increasing
the number of shmem buffers. More RAM will always help, as will
getting more and faster disks and spreading the I/O load over them.
Take a look at 'vmstat' output and your CPU usage while you're running
a query to see where your bottlenecks might be.

Unless you have enough RAM to cache the whole thing, a database is
usually I/O bound, which means your disk subsystem is probably a good
place to improve.

Also: VACUUM ANALYZE (are you running it)? Does EXPLAIN show
reasonable plans for all your queries?

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Perrin (#3)
Re: Performance tips

Andrew Perrin <andrew_perrin@unc.edu> writes:

Well, here's the output from vmstat:
aperrin@hm269-26876:~/afshome/papers/authoritarian$ vmstat
procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0   3052   2132  10460 413284   0   0    11    14    6     5   6   3  17

That's fairly useless, since what it gives you is the average values
since system boot. To get useful numbers, do "vmstat 5" (or some other
interval, but 5 seconds usually works well), and let it run long enough
to get a page's worth of output.

I can't say I understand it though.... I've got a query running through
psql that's been chugging away for nearly 2 hours now:

auth=# select count(patternid) from patterns where patternid in (select
o_patternid from
auth(# letters, pattern_occurrences where letters.letterid =
pattern_occurrences.o_letterid
auth(# and letters.datecat in (1,2));

"WHERE ... IN (subselect)" queries are notoriously inefficient in Postgres.
It might be worth trying to recast as a WHERE ... EXISTS query. Also,
is the inner query likely to produce a lot of duplicates? If so,
changing it to a SELECT DISTINCT might help.
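Sketched out against Andrew's table and column names, the EXISTS rewrite Tom suggests might look like this (untested against his schema):

```sql
-- WHERE ... IN (subselect) makes 7.1 rescan the subplan output for
-- every row of patterns; a correlated EXISTS can stop at the first match.
SELECT count(patternid)
FROM patterns p
WHERE EXISTS (
    SELECT 1
    FROM letters l, pattern_occurrences po
    WHERE po.o_patternid = p.patternid
      AND l.letterid = po.o_letterid
      AND l.datecat IN (1, 2)
);
```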

Last time I tried to vacuum the database it was still hung 12 hours later
so I cancelled.

Hmm, shouldn't take 12+ hours to vacuum a database with only ~8mil
records. How many indexes have you got in that thing? Some people
have been known to drop indexes, vacuum, recreate indexes.
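The drop/vacuum/recreate cycle would be something like the following (the index name is invented for illustration; use the names psql's \d reports):

```sql
DROP INDEX patterns_patternid_idx;   -- hypothetical index name
VACUUM ANALYZE patterns;             -- much faster with no indexes to clean
CREATE INDEX patterns_patternid_idx ON patterns (patternid);
```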

regards, tom lane

#5 Martijn van Oosterhout
kleptog@svana.org
In reply to: Andrew Perrin (#3)
Re: Performance tips

On Wed, Jan 09, 2002 at 10:37:41PM -0500, Andrew Perrin wrote:

Well, here's the output from vmstat:

aperrin@hm269-26876:~/afshome/papers/authoritarian$ vmstat
procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0   3052   2132  10460 413284   0   0    11    14    6     5   6   3  17

I can't say I understand it though.... I've got a query running through
psql that's been chugging away for nearly 2 hours now:

Well now, that's very interesting. You may need to run "vmstat 1" so it
prints status every second, but the interesting columns are "bi" and
"bo" (blocks in and blocks out). It appears that your disk system is
basically idling, and the last five columns indicate that your CPU is too.
The second column shows one process in uninterruptible sleep, which is
bad if it stays that way.

That pattern usually means a process hanging on a dead NFS mount, but
that's probably not the case here. What could possibly be jamming your
machine like this?

auth=# select count(patternid) from patterns where patternid in (select
o_patternid from
auth(# letters, pattern_occurrences where letters.letterid =
pattern_occurrences.o_letterid
auth(# and letters.datecat in (1,2));

patterns has approx. 3,000,000 records, pattern_occurrences 5,000,000,
letters 10,000, of which 8,000 or so are datecat 1 or 2.

Last time I tried to vacuum the database it was still hung 12 hours later
so I cancelled. Haven't tried vacuum analyze or explain but will do so.

Really, really slow disk? Are you getting any disk errors? timeouts?

HTH,

On 9 Jan 2002, Doug McNaught wrote:

Andrew Perrin <andrew_perrin@unc.edu> writes:

The computer is a 1Ghz PIII (IBM NetVista) running debian linux
(woody) and PostgreSQL 7.1.3. There's 512M of RAM in it, and top shows
that swap rarely gets used, so one possibility is to try to have pg keep
more workspace in RAM at once. I could also potentially buy more RAM for
the machine.

Do try to keep it out of swap, but you may have scope for increasing
the number of shmem buffers. More RAM will always help, as will
getting more and faster disks and spreading the I/O load over them.
Take a look at 'vmstat' output and your CPU usage while you're running
a query to see where your bottlenecks might be.

Unless you have enough RAM to cache the whole thing, a database is
usually I/O bound, which means your disk subsystem is probably a good
place to improve.

Also: VACUUM ANALYZE (are you running it)? Does EXPLAIN show
reasonable plans for all your queries?

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/


Terrorists can only take my life. Only my government can take my freedom.

#6 Doug McNaught
doug@wireboard.com
In reply to: Andrew Perrin (#3)
Re: Performance tips

Andrew Perrin <andrew_perrin@unc.edu> writes:

Well, here's the output from vmstat:

aperrin@hm269-26876:~/afshome/papers/authoritarian$ vmstat
procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0   3052   2132  10460 413284   0   0    11    14    6     5   6   3  17

Sorry, I should have told you to do 'vmstat 5' which will keep
printing lines of numbers (every 5 seconds) until you interrupt it.
One line isn't too useful. But hold off on that for now, see below...

I can't say I understand it though.... I've got a query running through
psql that's been chugging away for nearly 2 hours now:

auth=# select count(patternid) from patterns where patternid in (select
o_patternid from
auth(# letters, pattern_occurrences where letters.letterid =
pattern_occurrences.o_letterid
auth(# and letters.datecat in (1,2));

patterns has approx. 3,000,000 records, pattern_occurrences 5,000,000,
letters 10,000, of which 8,000 or so are datecat 1 or 2.

Last time I tried to vacuum the database it was still hung 12 hours later
so I cancelled. Haven't tried vacuum analyze or explain but will do so.

Yow. There are two possibilities:

1) VACUUM actually ran that long (possible)
2) You had something else holding a transaction open, which prevents
VACUUM from running. Do you have any clients running that hold
connections open?

You *really* need to VACUUM ANALYZE, especially if your tables have
been active with updates and deletes. Once that's done, do an EXPLAIN
on your long-running queries, post the output along with your schema
and maybe we can help you speed things up.
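In psql that sequence is just the following, using Andrew's long-running query as the example:

```sql
VACUUM ANALYZE;   -- refresh the planner's statistics for all tables

EXPLAIN
SELECT count(patternid)
FROM patterns
WHERE patternid IN (SELECT o_patternid
                    FROM letters, pattern_occurrences
                    WHERE letters.letterid = pattern_occurrences.o_letterid
                      AND letters.datecat IN (1, 2));
```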

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#7 Andrew Perrin
andrew_perrin@unc.edu
In reply to: Doug McNaught (#6)
Re: Performance tips

On 9 Jan 2002, Doug McNaught wrote:

Andrew Perrin <andrew_perrin@unc.edu> writes:

Well, here's the output from vmstat:

aperrin@hm269-26876:~/afshome/papers/authoritarian$ vmstat
procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0   3052   2132  10460 413284   0   0    11    14    6     5   6   3  17

Sorry, I should have told you to do 'vmstat 5' which will keep
printing lines of numbers (every 5 seconds) until you interrupt it.
One line isn't too useful. But hold off on that for now, see below...

After the query I asked about had run for about 3 hours, I cancelled it
and figured I'll try again later. Here's the output from vmstat 5 during
it:

aperrin@hm269-26876:~$ vmstat 5
procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0   3288   2860  10460 412896   0   0     2    14    7     7   6   3  17
 3  1  0   3288   2668  10460 413088   0   0  2823     7  512  1135  27   5  68
 4  0  0   3288   2780  10460 412988   0   0  6078     5  587  1294  58   6  35
 4  0  0   3288   2552  10460 413212   0   0  6034     0  577  1294  60   8  32
 3  1  0   3288   2712  10460 413044   0   0  5256     0  571  1247  48   9  43
 3  0  0   3288   2076  10460 413676   0   0  5366     0  571  1265  51   7  42
 4  0  0   3288   2164  10460 413596   0   0  2671     0  509  1113  22   6  72
 2  1  0   3288   2456  10460 413300   0   0  6187     0  584  1309  65   7  29
 3  0  0   3288   2544  10460 413216   0   0  6037     2  577  1290  61   7  32
 4  0  0   3288   2324  10460 413436   0   0  5170     1  555  1233  53   7  41
 4  0  0   3288   3184  10460 412576   0   0  5532     0  586  1278  52   7  41
 5  0  0   3288   3192  10460 412568   0   0  2603     0  514  1115  25   3  72
 4  0  0   3288   2740  10460 413024   0   0  6212     0  591  1318  57   8  35
 3  1  0   3288   2648  10460 413116   0   0  6128     0  597  1301  57   7  36
 2  1  0   3288   2076  10460 413676   0   0  5211     1  565  1232  55   6  40
 3  1  0   3288   2300  10460 413452   0   0  5458     0  587  1270  52   7  42
 3  0  0   3288   3036  10460 412724   0   0  2645     0  495  1107  25   4  71

1) VACUUM actually ran that long (possible)
2) You had something else holding a transaction open, which prevents
VACUUM from running. Do you have any clients running that hold
connections open?

I don't think so. But just to make sure, I killed off postmaster and
restarted it. (I'm the only user of this database, so unless I
inadvertently left something running, there shouldn't be stray
connections.)

You *really* need to VACUUM ANALYZE, especially if your tables have
been active with updates and deletes. Once that's done, do an EXPLAIN
on your long-running queries, post the output along with your schema
and maybe we can help you speed things up.

I started a vacuum analyze; it's been about 20 minutes now, and nothing's
happened. I'm going to let it run overnight and see if there's something
there when I wake up. Here's vmstat while the vacuum analyze is running:

procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 5  0  0   3164   2716  10460 421736   0   0  1680     3  444  1030   2   9  89
 4  0  0   3164   2876  10460 421580   0   0  1682     0  417  1003   2   6  92
 3  0  0   3164   2244  10460 422220   0   0  1677     0  405  1000   1   5  94
 4  0  0   3164   2564  10460 421888   0   0  1699     2  427  1012   2   6  92
 4  0  0   3164   2116  10460 422336   0   0  1731     0  434  1021   2   7  91
 4  1  0   3164   2868  10460 421584   0   0  1735     7  426  1017   0   7  93
 4  0  0   3164   3108  10460 421356   0   0  1685     0  418  1005   2   5  93
 1  0  0   3164   2848  10460 421608   0   0  1683     0  420  1027   1   5  94
 3  0  0   3164   2864  10460 421588   0   0  1689     0  407  1003   1   5  94
 3  0  0   3164   2760  10460 421692   0   0  1733     0  424  1024   2   6  92
 4  0  0   3164   2712  10460 421744   0   0  1743     1  422  1017   2   7  91
 4  0  0   3164   2172  10460 422280   0   0  1800     0  426  1032   3   7  90
 4  0  0   3164   2644  10460 421812   0   0  1769     0  422  1022   1   8  91
 3  2  0   3164   2276  10460 422176   0   0  1637     0  400   976   2   5  94
 4  0  0   3164   3036  10460 421420   0   0  1754     0  418  1032   1   6  92
 4  0  0   3164   2552  10460 421904   0   0  1785     2  433  1013   2   6  92
 3  1  0   3164   3088  10460 421364   0   0  1773     1  416   995   1   6  93
 4  0  0   3164   2080  10460 422116  23   0  1703     0  423  1005   2   4  93
 4  0  0   3164   2780  10460 421412   0   0  1733     0  426   998   2   5  93
 4  0  0   3164   2800  10460 421396   0   0  1767     2  427  1002   2   6  92
 4  0  0   3164   2476  10460 421716   0   0  2008     0  444  1044   2   5  93

Here's what top shows (for memory, swap, etc.):
00:09:01 up 26 days, 15:09, 6 users, load average: 1.07, 1.04, 1.01
74 processes: 70 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: 81.7% user, 8.0% system, 0.0% nice, 10.3% idle
Mem: 516516K total, 513888K used, 2628K free, 10460K buffers
Swap: 498004K total, 3164K used, 494840K free, 421916K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
26890 postgres 16 0 4580 4580 3396 R 88.4 0.8 4:25 postmaster

Thanks-
Andy

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

#8 Andrew Perrin
andrew_perrin@unc.edu
In reply to: Doug McNaught (#6)
Re: Performance tips

Never mind - while I was writing the last message the vacuum analyze
ended. No messages from vacuum analyze (just the VACUUM
acknowledgement). I'm recreating the one user-created index (the rest are
based on serials) now and will re-test queries.

Thanks for everyone's help.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

#9 Martijn van Oosterhout
kleptog@svana.org
In reply to: Martijn van Oosterhout (#5)
Re: Performance tips

On Wed, Jan 09, 2002 at 11:33:24PM -0600, Coax wrote:

In Linux, vmstat's first line of output is always incorrect, as it pulls
values from /proc/stat - and averages them over the time period you
specify.

LOL! If that's true, why does running it with no arguments produce one line
of output if it's going to be wrong? I'd count that as a form of bug.

Actually, the man page does discuss this. It says that the first time it
gives an average since boot, which is nice to know, but not very useful in
this case.

Therefore, just running a single vmstat will do zilch for you.

'vmstat 1' will give you accurate numbers on the 2nd thru <infinite>
reading, at 1 second intervals.

Well, I did mention that to him. Hopefully he'll send the results soon.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/


Terrorists can only take my life. Only my government can take my freedom.

#10 Coax
coax@shell1.cornernet.com
In reply to: Martijn van Oosterhout (#9)
Re: Performance tips

values from /proc/stat - and averages them over the time period you
specify.

LOL! If that's true, why does running it with no arguments produce one line
of output if it's going to be wrong? I'd count that as a form of bug.

Heh. Have fun reporting it. Been that way since /proc/stat was invented -
and psutils was written :)

Either way, that's not a postgres problem - so off topic.

Have fun with vmstat. Watch the numbers when your db gets loaded. It might
well tell you exactly where your bottleneck is.

Chad

#11 Andy Samuel
andysamuel@geocities.com
In reply to: Andrew Perrin (#8)
Re: Performance tips

I think you *should* recreate the indexes and *then* VACUUM ANALYZE.

Regards
Andy
----- Original Message -----
From: "Andrew Perrin" <andrew_perrin@unc.edu>
To: "Doug McNaught" <doug@wireboard.com>
Cc: <pgsql-general@postgresql.org>
Sent: Thursday, January 10, 2002 12:14 PM
Subject: Re: [GENERAL] Performance tips


Never mind - while I was writing the last message the vacuum analyze
ended. No messages from vacuum analyze (just the VACUUM
acknowledgement). I'm recreating the one user-created index (the rest are
based on serials) now and will re-test queries.

Thanks for everyone's help.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA


#12 Shanmugasundaram Doraisamy
shan@ceedees.com
In reply to: Andrew Perrin (#7)
Re: Performance tips

Hello All,
I had been following the discussion on improving the
performance of the database. I was wondering what the buffer field in the
vmstat output refers to.

Thanks all
Shan.

#13 Alaric B. Snell
abs@frontwire.com
In reply to: Martijn van Oosterhout (#9)
Re: Performance tips

On Thu, 10 Jan 2002, Martijn van Oosterhout wrote:

LOL! If that's true, why does running it with no arguments produce one line
of output if it's going to be wrong? I'd count that as a form of bug.

Backwards compatibility :-)

ABS

--
Alaric B. Snell, Developer
abs@frontwire.com

#14 Andrew Perrin
andrew_perrin@unc.edu
In reply to: Alaric B. Snell (#13)
Re: Performance tips

shared_buffers is set to 128 in postgresql.conf - is that what you
mean? There's enough RAM in this machine that I should be able to increase
it if that will help.

The query I was running last night didn't finish by morning (started at
midnight, I cancelled it at 8:30 am), so something's clearly wrong. Here's
my plan (critiques and suggestions welcome):

1 Create some more indices, specifically on the id fields used to join
tables
2 Increase shared_buffers as much as is practical
3 Try to rewrite the query without using in(select...) constructs
4 Move my IDE Zip drive to the secondary IDE controller to avoid IDE
slowdowns

If the above aren't satisfactory:
- Buy more RAM and repeat 2 above
- Buy a SCSI hard drive (there's already a good SCSI controller) and move
the database there

HOWEVER... what I'm hearing from most folks is that, even under these
conditions, the performance I'm experiencing is worse than
expected. If that's true, what should I do to diagnose that?

Thanks again.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

On Thu, 10 Jan 2002, Justin Clift wrote:


Hi Andrew,

What're your memory buffers set to?

The only time I've had my CPU get anywhere near 80% is when I'm running
on really low or default memory buffer settings. Normally, postmaster
doesn't go above 15% for me.

???

Regards and best wishes,

Justin Clift

Andrew Perrin wrote:

Never mind - while I was writing the last message the vacuum analyze
ended. No messages from vacuum analyze (just the VACUUM
acknowledgement). I'm recreating the one user-created index (the rest are
based on serials) now and will re-test queries.

Thanks for everyone's help.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA


--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi

#15 Jeff Eckermann
jeff_eckermann@yahoo.com
In reply to: Andrew Perrin (#8)
Re: Performance tips

One way to improve performance with queries like yours
is to select the subquery results into a temporary
table, and join against that (perhaps with an index
created on the temp table, if you get a lot of rows).
Ugly and messy, but can result in massive performance
improvements on a system with limited resources, even
compared with the "EXISTS" solution.
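Applied to Andrew's query, the temp-table approach might look like this (a sketch; the table and index names are invented, and the DISTINCT keeps the final count equivalent to the original IN version):

```sql
-- Materialize the subquery results once...
SELECT DISTINCT po.o_patternid
INTO TEMP matched
FROM letters l, pattern_occurrences po
WHERE l.letterid = po.o_letterid
  AND l.datecat IN (1, 2);

-- ...optionally index them...
CREATE INDEX matched_idx ON matched (o_patternid);

-- ...then join against the temp table instead of using IN (subselect).
SELECT count(p.patternid)
FROM patterns p, matched m
WHERE p.patternid = m.o_patternid;
```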

--- Andrew Perrin <andrew_perrin@unc.edu> wrote:

Never mind - while I was writing the last message the vacuum analyze
ended. No messages from vacuum analyze (just the VACUUM
acknowledgement). I'm recreating the one user-created index (the rest are
based on serials) now and will re-test queries.

Thanks for everyone's help.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

#16 Doug McNaught
doug@wireboard.com
In reply to: Andrew Perrin (#14)
Re: Performance tips

Andrew Perrin <andrew_perrin@unc.edu> writes:

shared_buffers is set to 128 in postgresql.conf - is that what you
mean? There's enough RAM in this machine that I should be able to increase
it if that will help.

Oh yes. You should be using 2000 or 3000 at least. 128 is the
absurdly low default (kept that way due to low default shared memory
settings on some systems).
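A postgresql.conf fragment along those lines (values are examples only; in 7.1 each buffer is 8 kB, so 3000 buffers is roughly 24 MB of shared memory, and the kernel's SHMMAX must be large enough to allow it):

```
shared_buffers = 3000   # up from the tiny default; needs a postmaster restart
sort_mem = 4096         # kB per sort; also worth raising from the default
```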

1 Create some more indices, specifically on the id fields used to join
tables
2 Increase shared_buffers as much as is practical
3 Try to rewrite the query without using in(select...) constructs
4 Move my IDE Zip drive to the secondary IDE controller to avoid IDE
slowdowns

2, 3, and 4 will probably help. 1 may actually be
counterproductive--you need to figure out what indexes you actually
*need*. The way to do this is to VACUUM ANALYZE and then use EXPLAIN
on your queries.

If the above aren't satisfactory:
- Buy more RAM and repeat 2 above
- Buy a SCSI hard drive (there's already a good SCSI controller) and move
the database there

These certainly won't hurt, but...

HOWEVER... what I'm hearing from most folks is that, even under these
conditions, the performance I'm experiencing is worse than
expected. If that's true, what should I do to diagnose that?

Use EXPLAIN. Post your schemas and the query plan that EXPLAIN gives
you, and someone here can probably make some suggestions.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#17 Peter Darley
pdarley@kinesis-cem.com
In reply to: Jeff Eckermann (#15)
Re: Performance tips

Andrew,
You can get away with not even using the temp table just by specifying your
subquery as a table in your SELECT and joining it like any other table.

IE: SELECT T1.MyField
    FROM Table1 T1,
         (SELECT T2.MyField2
          FROM Table2 T2, Table3 T3
          WHERE T2.MyField3 = T3.MyField3
            AND T2.MyField4 = 'Thing'
            AND T3.MyField5 = Number) T2
    WHERE T1.MyField2 = T2.MyField2;

I've found that this is super fast; I was able to bring a query which took
~30 seconds down to well under a second using this technique.
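Applied to Andrew's query, the subselect-in-FROM form might read as follows (a sketch; the alias is arbitrary, and the DISTINCT keeps the count equivalent to the original IN version):

```sql
SELECT count(p.patternid)
FROM patterns p,
     (SELECT DISTINCT po.o_patternid
      FROM letters l, pattern_occurrences po
      WHERE l.letterid = po.o_letterid
        AND l.datecat IN (1, 2)) sub   -- subselect in FROM needs an alias
WHERE p.patternid = sub.o_patternid;
```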

Thanks,
Peter Darley
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Jeff Eckermann
Sent: Thursday, January 10, 2002 7:04 AM
To: andrew_perrin@unc.edu; Doug McNaught
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance tips

One way to improve performance with queries like yours
is to select the subquery results into a temporary
table, and join against that (perhaps with an index
created on the temp table, if you get a lot of rows).
Ugly and messy, but can result in massive performance
improvements on a system with limited resources, even
compared with the "EXISTS" solution.

--- Andrew Perrin <andrew_perrin@unc.edu> wrote:

Never mind - while I was writing the last message the vacuum analyze
ended. No messages from vacuum analyze (just the VACUUM
acknowledgement). I'm recreating the one user-created index (the rest are
based on serials) now and will re-test queries.

Thanks for everyone's help.

#18 Doug McNaught
doug@wireboard.com
In reply to: Shanmugasundaram Doraisamy (#12)
Re: Performance tips

<shan@ceedees.com> writes:

Hello All,
I had been following the discussion on improving the
performance of the database. I was wondering what the buffer field in the
vmstat output refers to.

It's equivalent to the 'buffers' column in the output from 'free'
(under Linux at least). It refers to part of the memory used for
caching disk data.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Perrin (#14)
Re: Performance tips

Andrew Perrin <andrew_perrin@unc.edu> writes:

shared_buffers is set to 128 in postgresql.conf - is that what you
mean?

That's awfully small. On any modern machine you should be setting it in
the low thousands. (Yeah, I know the default is only 64. We've had
discussions about changing that ...) Note you must restart the
postmaster to get this change to take effect.

HOWEVER... what I'm hearing from most folks is that, even under these
conditions, the performance I'm experiencing is worse than
expected. If that's true, what should I do to diagnose that?

Have you shown us the specific query and the EXPLAIN output for it?

regards, tom lane

#20 Andrew Perrin
andrew_perrin@unc.edu
In reply to: Tom Lane (#19)
Re: Performance tips

On Thu, 10 Jan 2002, Tom Lane wrote:

Andrew Perrin <andrew_perrin@unc.edu> writes:

shared_buffers is set to 128 in postgresql.conf - is that what you
mean?

That's awfully small. On any modern machine you should be setting it in
the low thousands. (Yeah, I know the default is only 64. We've had
discussions about changing that ...) Note you must restart the
postmaster to get this change to take effect.

Done - I've changed it to 3000 which is about as high as I can go without
recompiling the kernel (which I'll probably do in the future but don't
have time now).

HOWEVER... what I'm hearing from most folks is that, even under these
conditions, the performance I'm experiencing is worse than
expected. If that's true, what should I do to diagnose that?

Have you shown us the specific query and the EXPLAIN output for it?

regards, tom lane

auth=# EXPLAIN select count(patternid) from patterns where patternid in
(select
auth(# o_patternid from letters, pattern_occurrences where
letters.letterid =
auth(# pattern_occurrences.o_letterid and letters.datecat in (1,2));
NOTICE: QUERY PLAN:

Aggregate  (cost=10770432787318.88..10770432787318.88 rows=1 width=4)
  ->  Seq Scan on patterns  (cost=0.00..10770432756138.14 rows=12472297 width=4)
        SubPlan
          ->  Materialize  (cost=863548.43..863548.43 rows=5749731 width=12)
                ->  Hash Join  (cost=1741.00..863548.43 rows=5749731 width=12)
                      ->  Seq Scan on pattern_occurrences  (cost=0.00..250248.56 rows=15287556 width=8)
                      ->  Hash  (cost=1729.67..1729.67 rows=4530 width=4)
                            ->  Seq Scan on letters  (cost=0.00..1729.67 rows=4530 width=4)

EXPLAIN

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin@unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Perrin (#20)
#22 Andrew Perrin
andrew_perrin@unc.edu
In reply to: Tom Lane (#21)