Really slow query on 6.4.2
Not sure if I should post this here, but it seemed kinda appropriate.
Anyway, I'm using 6.4.2 and execute the following query in psql, piping the
results to a file:
"select autos.*, owners.name, owners.email, owners.dphone, owners.ephone,
owners.zip, owners.country from autos, owners where autos.ownerid =
owners.id;"
This takes about 60 seconds at 0% idle CPU, with the backend taking all the
time. The file ends up about 3MB. Both tables have between 1200 and 1600
rows with about 25 and 7 columns respectively.
A simpler query like:
"select * from autos;" takes about a second at about 50% idle, and produces
a similar amount of data in a 3MB file.
Any hints on speeding this up?
OS: Redhat Linux 5.1, Dual-PPro 266.
The table definitions are below if anyone is interested:
(Also, the cdate default value doesn't get set properly to the current date.
Any hints on that would
be appreciated as well.)
Thanks,
Rich.
Table = owners
+------------------+------------------------------+--------+
| Field            | Type                         | Length |
+------------------+------------------------------+--------+
| id               | float8                       |      8 |
| name             | varchar()                    |      0 |
| email            | varchar()                    |      0 |
| dphone           | varchar()                    |      0 |
| ephone           | varchar()                    |      0 |
| zip              | varchar()                    |      0 |
| country          | varchar()                    |      0 |
| password         | varchar()                    |      0 |
| isdealer         | bool                         |      1 |
| cdate            | date default datetime 'now'  |      4 |
+------------------+------------------------------+--------+
Table = autos
+------------------+------------------------------+--------+
| Field            | Type                         | Length |
+------------------+------------------------------+--------+
| id               | float8                       |      8 |
| ownerid          | float8                       |      8 |
| city             | varchar()                    |      0 |
| region           | varchar()                    |      0 |
| year             | varchar()                    |      0 |
| mileage          | int8                         |      8 |
| make             | varchar()                    |      0 |
| model            | varchar()                    |      0 |
| price            | money                        |      4 |
| bo               | bool                         |      1 |
| ecolor           | varchar()                    |      0 |
| icolor           | varchar()                    |      0 |
| condition        | varchar()                    |      0 |
| trans            | varchar()                    |      0 |
| drivetrain       | varchar()                    |      0 |
| cylinders        | varchar()                    |      0 |
| power_steering   | varchar()                    |      0 |
| power_windows    | varchar()                    |      0 |
| power_locks      | varchar()                    |      0 |
| pwr_driver_seat  | varchar()                    |      0 |
| abs              | varchar()                    |      0 |
| driver_air_bag   | varchar()                    |      0 |
| dual_air_bag     | varchar()                    |      0 |
| leather          | varchar()                    |      0 |
| air              | varchar()                    |      0 |
| radio            | varchar()                    |      0 |
| cassette         | varchar()                    |      0 |
| cd               | varchar()                    |      0 |
| extra_cab        | varchar()                    |      0 |
| tow_pkg          | varchar()                    |      0 |
| sun_roof         | varchar()                    |      0 |
| roof_rack        | varchar()                    |      0 |
| description      | varchar()                    |      0 |
| cdate            | date default datetime 'now'  |      4 |
+------------------+------------------------------+--------+
You don't mention any indexes. Make sure you have indexes installed on
autos.ownerid and owners.id.
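A minimal sketch of what that would look like (the index names here are made
up; the column names come from the table definitions above):

    CREATE INDEX owners_id_idx ON owners (id);
    CREATE INDEX autos_ownerid_idx ON autos (ownerid);

Whether the optimizer actually uses them for this join also depends on the
statistics, as discussed further down the thread.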
"Postgres mailing lists" <postgres@weblynk.com> writes:
Anyway, I'm using 6.4.2 and execute the following query in psql, piping the
results to a file:
"select autos.*, owners.name, owners.email, owners.dphone, owners.ephone,
owners.zip, owners.country from autos, owners where autos.ownerid =
owners.id;"
This takes about 60 seconds at 0% idle CPU, with the backend taking all the
time. The file ends up about 3MB. Both tables have between 1200 and 1600
rows with about 25 and 7 columns respectively.
Have you done a "vacuum analyze" lately? Sounds like the thing is using
a nested loop query plan, which is appropriate for tiny tables but not
for large ones. You could check this by seeing what EXPLAIN says.
Unfortunately, if you haven't done a vacuum, the system effectively
assumes that all your tables are tiny. I think this is a brain-dead
default, but haven't had much luck convincing anyone else that the
default should be changed.
regards, tom lane
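A rough sketch of the checks Tom describes, using the table names from the
original post:

    -- see which plan the optimizer has picked for the join
    EXPLAIN SELECT autos.*, owners.name, owners.email
    FROM autos, owners
    WHERE autos.ownerid = owners.id;

    -- refresh the planner statistics, then re-run the EXPLAIN
    VACUUM ANALYZE owners;
    VACUUM ANALYZE autos;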
The vacuum analyze did it. It's fast now. Thanks a bunch.
rich.
Unfortunately, if you haven't done a vacuum, the system effectively
assumes that all your tables are tiny. I think this is a brain-dead
default, but haven't had much luck convincing anyone else that the
default should be changed.
I totally agree with Tom Lane here. Let me try to give some arguments.
1. If you have a user that does vacuum analyze regularly, we can
convince him to do vacuum analyze right after table creation, if he
knows the table will be tiny.
2. We have an application where the size of 20 tables changes from
0 to ~200000 rows in 3 hours. To have accurate statistics during the day we
would need to analyze at least every 20 min.
This was not acceptable during those 3 hours.
So we took the approach to tune the sql to work properly without ever
doing statistics.
This works perfectly on our Informix installation, since Informix has
a tuning parameter that tells it an index has to be used if possible,
even if the cost is higher, and the default for table size is 100.
3. There are two popular types of optimizer, rule based and cost based.
A good approach is to behave rule based when statistics are lacking and
cost based when statistics are available. An easy way to achieve this is
to choose reasonable defaults for the statistics before accurate
statistics are gathered.
4. Those who do run statistics will most likely not leave out just a few
tables, which would create an undefined state where the optimizer behaves
partly rule based and partly cost based.
5. Actually postgresql has behaved in this manner because of certain
"bugs" in the optimizer. Recently a lot of those "bugs" have been
identified and "fixed", thus destroying the de facto rule based
behavior.
If the defaults are not changed, the overall behavior of the system in the
case where statistics are lacking will actually change once the optimizer
is improved to behave cost based under all circumstances.
Andreas
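A sketch of that first point, with a made-up lookup table standing in for a
table that is known to stay tiny:

    -- hypothetical lookup table that will never grow beyond a handful of rows
    CREATE TABLE country_codes (code varchar(2), name varchar(64));
    INSERT INTO country_codes VALUES ('at', 'Austria');
    INSERT INTO country_codes VALUES ('us', 'United States');
    -- record immediately that the table really is small,
    -- so small-table plans are based on real numbers
    VACUUM ANALYZE country_codes;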
Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
5. Actually postgresql has behaved in this manner because of certain
"bugs" in the optimizer. Recently a lot of those "bugs" have been
identified and "fixed", thus destroying the defacto rule based
behavior.
That's a real good point --- I think we've already heard a couple of
complaints about the new optimizer doing "silly" things that it didn't
use to do.
I repeat my proposal: CREATE TABLE should insert a default size (say
about 1000 tuples) into pg_class.reltuples, rather than inserting 0.
That way, the optimizer will only choose small-table-oriented plans
if the table has actually been verified to be small by vacuum.
regards, tom lane
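The value in question can be inspected directly; for the two tables from the
original post, something like this shows what the planner currently believes:

    SELECT relname, reltuples, relpages
    FROM pg_class
    WHERE relname IN ('autos', 'owners');

A table that has never been vacuumed shows reltuples = 0 here, which is what
makes the optimizer treat it as tiny.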
On Thu, 25 Mar 1999, Zeugswetter Andreas IZ5 wrote:
Unfortunately, if you haven't done a vacuum, the system effectively
assumes that all your tables are tiny. I think this is a brain-dead
default, but haven't had much luck convincing anyone else that the
default should be changed.
I totally agree with Tom Lane here. Let me try to give some arguments.
Maybe I've missed something here, but I don't think anyone disagrees that
our stats aren't the best, but I also don't think anyone has stepped up and
provided an alternative...have they?
Personally, I'd like to see some method where stats can, to a certain
extent, be updated automagically, when changes are made to the table. The
generated stats wouldn't *replace* vacuum, just reduce the overall need
for them.
I'm not sure what all is contained in the stats, but the easiest one, I
think, to have done automagically is table sizes...add a tuple, update the
table's number of rows automatically. If that number gets "off", at
least it will be more reasonable than not doing anything...no?
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
5. Actually postgresql has behaved in this manner because of certain
"bugs" in the optimizer. Recently a lot of those "bugs" have been
identified and "fixed", thus destroying the defacto rule based
behavior.
That's a real good point --- I think we've already heard a couple of
complaints about the new optimizer doing "silly" things that it didn't
use to do.
I repeat my proposal: CREATE TABLE should insert a default size (say
about 1000 tuples) into pg_class.reltuples, rather than inserting 0.
That way, the optimizer will only choose small-table-oriented plans
if the table has actually been verified to be small by vacuum.
OK. Sounds good to me.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
The Hermit Hacker <scrappy@hub.org> writes:
I'm not sure what is all contained in the stats, but the easiest one, I
think, to have done automagically is table sizes...add a tuple, update the
table of number of rows automatically. If that numbers gets "off", at
least it will be more reasonable then not doing anything...no?
The number of tuples is definitely the most important stat; updating it
automatically would make the optimizer work better. The stuff in
pg_statistics is not nearly as important.
The only objection I can think of to auto-updating reltuples is that
it'd mean additional computation (to access and rewrite the pg_class
entry) and additional disk I/O (to write back pg_class) for every INSERT
and DELETE. There's also a potential problem of multiple backends all
trying to write pg_class and being delayed or even deadlocked because of
it. (Perhaps the MVCC code will help here.)
I'm not convinced that accurate stats are worth that cost, but I don't
know how big the cost would be anyway. Anyone have a feel for it?
regards, tom lane
Then <tgl@sss.pgh.pa.us> spoke up and said:
I'm not convinced that accurate stats are worth that cost, but I don't
know how big the cost would be anyway. Anyone have a feel for it?
They are definitely *not* worth the cost. Especially since no table
will have the default 0 rows entry after a single vacuum analyze of
that table. Let's be honest: if you aren't interested in doing a
vacuum, then you really aren't interested in performance, anyway.
--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@cmu.edu for my public key. |
=====================================================================
On 25 Mar 1999 geek+@cmu.edu wrote:
Then <tgl@sss.pgh.pa.us> spoke up and said:
I'm not convinced that accurate stats are worth that cost, but I don't
know how big the cost would be anyway. Anyone have a feel for it?
They are definitely *not* worth the cost. Especially since no table
will have the default 0 rows entry after a single vacuum analyze of
that table. Let's be honest: if you aren't interested in doing a
vacuum, then you really aren't interested in performance, anyway.
What I personally am not interested in is having to spend 20 minutes per
day with a totally locked up database because I want my queries to be
faster, when there are other ways of doing it...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Thu, 25 Mar 1999, Tom Lane wrote:
The Hermit Hacker <scrappy@hub.org> writes:
I'm not sure what is all contained in the stats, but the easiest one, I
think, to have done automagically is table sizes...add a tuple, update the
table of number of rows automatically. If that numbers gets "off", at
least it will be more reasonable then not doing anything...no?
The number of tuples is definitely the most important stat; updating it
automatically would make the optimizer work better. The stuff in
pg_statistics is not nearly as important.
The only objection I can think of to auto-updating reltuples is that
it'd mean additional computation (to access and rewrite the pg_class
entry) and additional disk I/O (to write back pg_class) for every INSERT
and DELETE. There's also a potential problem of multiple backends all
trying to write pg_class and being delayed or even deadlocked because of
it. (Perhaps the MVCC code will help here.)
I'm not convinced that accurate stats are worth that cost, but I don't
know how big the cost would be anyway. Anyone have a feel for it?
We're not looking for perfect numbers here; how about something just
stored in cache and periodically written out to disk? We already have the
shared memory pool to work with...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Thu, 25 Mar 1999, Tom Lane wrote:
Zeugswetter Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
5. Actually postgresql has behaved in this manner because of certain
"bugs" in the optimizer. Recently a lot of those "bugs" have been
identified and "fixed", thus destroying the defacto rule based
behavior.
That's a real good point --- I think we've already heard a couple of
complaints about the new optimizer doing "silly" things that it didn't
use to do.
I repeat my proposal: CREATE TABLE should insert a default size (say
about 1000 tuples) into pg_class.reltuples, rather than inserting 0.
That way, the optimizer will only choose small-table-oriented plans
if the table has actually been verified to be small by vacuum.
inserting 0 is an accurate number, not 1000 ...
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Then <scrappy@hub.org> spoke up and said:
On 25 Mar 1999 geek+@cmu.edu wrote:
They are definitely *not* worth the cost. Especially since no table
will have the default 0 rows entry after a single vacuum analyze of
that table. Let's be honest: if you aren't interested in doing a
vacuum, then really aren't interested in performance, anyway.
What I personally am not interested in is having to spend 20 minute per
day with a totally locked up database because I want my queries to be
faster, when there are other ways of doing it...
Uhm, no. The specific case we are talking about here is creation of a
table, inserting rows into it, and NEVER running vacuum analyze on
it. This would not lock up your database for 20 minutes unless you
are dropping and re-creating a bunch of tables. Even that case could
be scripted creatively[0], though. Further, you don't have to run it
on a whole database every night. Just the tables of interest.
We run a multi-gigabyte Ingres database here for our student systems.
When we want to make sure that good plans are chosen, we sysmod and
optimizedb it. Since we always want good plans, but rarely inload
massive amounts of data, we do this once a week.
One of the things to be kept in mind with performance tuning is
tradeoffs. Does it make sense to penalize every transaction for the
sake of updating statistics? (the answer is "maybe") Does it make
sense to penalize every transaction to provide a recovery mechanism?
(yes) Does it make sense to penalize every transaction to prevent any
one transaction from using up more than 1MB/s of bandwidth? (no)
Should you extract the data to a binary flat file, read it in C,
collect the information of interest and then do something interesting
with it? (maybe)
[0]: Assuming the data are laid out "sequentially" on the index fields:
create the table, chop off and insert only the first and last
thousand[1] rows, vacuum, and then insert the rest.
[1]: Or perhaps a slightly bigger number. Or a sampling of the file.
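A rough sketch of the staged load described in footnote [0], with made-up
table and file names (in 6.x the data would typically come in via COPY):

    -- simplified stand-in for a table about to receive a large load
    CREATE TABLE autos_demo (id float8, ownerid float8,
                             make varchar(32), model varchar(32));
    -- load only a small, representative slice of the flat file first
    COPY autos_demo FROM '/tmp/autos_head_and_tail.txt';
    -- gather statistics while the table is still cheap to scan
    VACUUM ANALYZE autos_demo;
    -- then load the remainder; the planner already has sane row estimates
    COPY autos_demo FROM '/tmp/autos_rest.txt';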
--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@cmu.edu for my public key. |
=====================================================================