OUTER JOIN IS SLOW
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.
getDateRange"('12/1/2005','12/1/2006') <- simply generates a date
list. Doesn't even access a table
SELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';
But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;
Do I need to increase the work_mem or is this possible still a ANALYZE
issue?
Benjamin
Try increasing the work_mem first to see the change, that might help.
-------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
Show quoted text
On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.getDateRange"('12/1/2005','12/1/2006') <- simply generates a date
list. Doesn't even access a tableSELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;Do I need to increase the work_mem or is this possible still a ANALYZE
issue?Benjamin
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
....adding to the last email, for now try the work_mem but you should be
adding ANALYZE along with the VACUUM (with a cron job I guess) you do
regularly.
------------
Shoaib Mir
EntperpriseDB (www.enterprisedb.com)
Show quoted text
On 12/24/06, Shoaib Mir <shoaibmir@gmail.com> wrote:
Try increasing the work_mem first to see the change, that might help.
-------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
I have been running pieces of my PL function by hand and I have found
that the following queries work by themselves taking less than a second
to execute.getDateRange"('12/1/2005','12/1/2006') <- simply generates a date
list. Doesn't even access a tableSELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN (SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;Do I need to increase the work_mem or is this possible still a ANALYZE
issue?Benjamin
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
I got
Merge Full Join (cost=62.33..73.36 rows=1000 width=19) (actual
time=39.205..8521.644 rows=272 loops=1)
Merge Cond: ("outer".pkey = "inner".d1)
-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
Filter: (cusip = '92193920'::text)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.604..5.674 rows=262 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50
rows=1000 width=4) (actual time=2.591..3.591 rows=262 loops=1)
Total runtime: 8522.894 ms
(9 rows)
It looks like this line:
"-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)"
is the problem. As I understand it is using the index but it is low as
dirt. Hopefully, the -z will fix this. I also ran the same query but
with earlier dates in the table and the query ran much faster.
Merge Full Join (cost=3492.48..3505.60 rows=1034 width=19) (actual
time=7.605..12.851 rows=273 loops=1)
Merge Cond: ("outer".d1 = "inner".pkey)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.586..5.530 rows=263 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50
rows=1000 width=4) (actual time=2.493..3.515 rows=263 loops=1)
-> Sort (cost=3430.15..3432.74 rows=1034 width=19) (actual
time=2.998..3.971 rows=263 loops=1)
Sort Key: mutualfd_weekday_qbid.pkey
-> Index Scan using mutualfd_weekday_qbid_pkey on
mutualfd_weekday_qbid (cost=0.00..3378.38 rows=1034 width=19) (actual
time=0.075..1.843 rows=263 loops=1)
Index Cond: ((cusip = '92193920'::text) AND (pkey >=
'1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
Total runtime: 13.935 ms
(10 rows)
In this case it only took 13.935ms as compared to 8522.894 ms for the
newer data.
Benjamin
Shoaib Mir wrote:
Show quoted text
....adding to the last email, for now try the work_mem but you should
be adding ANALYZE along with the VACUUM (with a cron job I guess) you
do regularly.------------
Shoaib Mir
EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
<mailto:shoaibmir@gmail.com>> wrote:Try increasing the work_mem first to see the change, that might help.
-------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
<mailto:benjamin@araisoft.com>> wrote:I have been running pieces of my PL function by hand and I
have found
that the following queries work by themselves taking less than
a second
to execute.getDateRange"('12/1/2005','12/1/2006') <- simply generates a
date
list. Doesn't even access a tableSELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
(SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;Do I need to increase the work_mem or is this possible still a
ANALYZE
issue?Benjamin
---------------------------(end of
broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating athttp://www.postgresql.org/about/donate
<http://www.postgresql.org/about/donate>
Yes, ANALYZE should definitely improve the performance for query...
--------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
Show quoted text
On 12/24/06, Benjamin Arai <benjamin@araisoft.com> wrote:
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
I gotMerge Full Join (cost=62.33..73.36 rows=1000 width=19) (actual
time=39.205..8521.644 rows=272 loops=1)
Merge Cond: ("outer".pkey = "inner".d1)
-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
Filter: (cusip = '92193920'::text)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.604..5.674 rows=262 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50
rows=1000 width=4) (actual time=2.591..3.591 rows=262 loops=1)
Total runtime: 8522.894 ms
(9 rows)It looks like this line:
"-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)"is the problem. As I understand it is using the index but it is low as
dirt. Hopefully, the -z will fix this. I also ran the same query but
with earlier dates in the table and the query ran much faster.Merge Full Join (cost=3492.48..3505.60 rows=1034 width=19) (actual
time=7.605..12.851 rows=273 loops=1)
Merge Cond: ("outer".d1 = "inner".pkey)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=4.586..5.530 rows=263 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50
rows=1000 width=4) (actual time=2.493..3.515 rows=263 loops=1)
-> Sort (cost=3430.15..3432.74 rows=1034 width=19) (actual
time=2.998..3.971 rows=263 loops=1)
Sort Key: mutualfd_weekday_qbid.pkey
-> Index Scan using mutualfd_weekday_qbid_pkey on
mutualfd_weekday_qbid (cost=0.00..3378.38 rows=1034 width=19) (actual
time=0.075..1.843 rows=263 loops=1)
Index Cond: ((cusip = '92193920'::text) AND (pkey >=
'1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
Total runtime: 13.935 ms
(10 rows)In this case it only took 13.935ms as compared to 8522.894 ms for the
newer data.Benjamin
Shoaib Mir wrote:
....adding to the last email, for now try the work_mem but you should
be adding ANALYZE along with the VACUUM (with a cron job I guess) you
do regularly.------------
Shoaib Mir
EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
<mailto:shoaibmir@gmail.com>> wrote:Try increasing the work_mem first to see the change, that might
help.
-------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
<mailto:benjamin@araisoft.com>> wrote:I have been running pieces of my PL function by hand and I
have found
that the following queries work by themselves taking less than
a second
to execute.getDateRange"('12/1/2005','12/1/2006') <- simply generates a
date
list. Doesn't even access a tableSELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
(SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;Do I need to increase the work_mem or is this possible still a
ANALYZE
issue?Benjamin
---------------------------(end of
broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donatingat
http://www.postgresql.org/about/donate
<http://www.postgresql.org/about/donate>
On Sat, 23 Dec 2006, Benjamin Arai wrote:
"-> Index Scan using mutualfd_weekday_qbid_pkey_idx on mutualfd_weekday_qbid
(cost=0.00..6.01 rows=1 width=19) (actual time=34.579..8510.801 rows=253
loops=1)"
You're right that this is the problem and show that the planner was expecting
a very low cost on the index scan, but it turned out to be much higher. This
is because of old statistics. VACUUM ANALYZE should remedy this.
Just run:
ANALYZE mytablename;
in psql and see if that gets you going.
Also of note, you can set work_mem per session, so it's reasonable to
benchmark various values until you find one that helps your query.
Note that you might not want to set work_mem too high, because it's per
operation, so you could easily run your machine out of RAM if you set this too
high and have many concurrent queries running.
Also, to answer your question regarding FSM settings, you should run a
"vacuumdb -av" and look at the last 8 lines to see if you have your FSM
settings high enough.
is the problem. As I understand it is using the index but it is low as dirt.
Hopefully, the -z will fix this. I also ran the same query but with earlier
dates in the table and the query ran much faster.Merge Full Join (cost=3492.48..3505.60 rows=1034 width=19) (actual
time=7.605..12.851 rows=273 loops=1)
Merge Cond: ("outer".d1 = "inner".pkey)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual time=4.586..5.530
rows=263 loops=1)
Sort Key: d1.d1
-> Function Scan on "getDateRange" d1 (cost=0.00..12.50 rows=1000
width=4) (actual time=2.493..3.515 rows=263 loops=1)
-> Sort (cost=3430.15..3432.74 rows=1034 width=19) (actual
time=2.998..3.971 rows=263 loops=1)
Sort Key: mutualfd_weekday_qbid.pkey
-> Index Scan using mutualfd_weekday_qbid_pkey on
mutualfd_weekday_qbid (cost=0.00..3378.38 rows=1034 width=19) (actual
time=0.075..1.843 rows=263 loops=1)
Index Cond: ((cusip = '92193920'::text) AND (pkey >=
'1999-12-15'::date) AND (pkey <= '2000-12-15'::date))
Total runtime: 13.935 ms
(10 rows)In this case it only took 13.935ms as compared to 8522.894 ms for the newer
data.Benjamin
Shoaib Mir wrote:
....adding to the last email, for now try the work_mem but you should be
adding ANALYZE along with the VACUUM (with a cron job I guess) you do
regularly.------------
Shoaib Mir
EntperpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)On 12/24/06, *Shoaib Mir* <shoaibmir@gmail.com
<mailto:shoaibmir@gmail.com>> wrote:Try increasing the work_mem first to see the change, that might help.
-------------
Shoaib Mir
EnterpriseDB ( www.enterprisedb.com <http://www.enterprisedb.com>)On 12/24/06, *Benjamin Arai* <benjamin@araisoft.com
<mailto:benjamin@araisoft.com>> wrote:I have been running pieces of my PL function by hand and I
have found
that the following queries work by themselves taking less than
a second
to execute.getDateRange"('12/1/2005','12/1/2006') <- simply generates a
date
list. Doesn't even access a tableSELECT * FROM mutualfd_weekday_qbid WHERE cusip='92193920'
AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006';But when combined as below it takes 10 seconds to execute.
SELECT d1 as date, d2.data as data FROM
"getDateRange"('12/1/2005','12/1/2006') d1 FULL OUTER JOIN
(SELECT *
FROM mutualfd_weekday_qbid WHERE cusip='92193920' AND pkey >=
'12/15/2005' AND pkey <= '12/15/2006') d2 ON d1=d2.pkey;Do I need to increase the work_mem or is this possible still a
ANALYZE
issue?Benjamin
---------------------------(end of
broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating athttp://www.postgresql.org/about/donate
<http://www.postgresql.org/about/donate>---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
Benjamin Arai <benjamin@araisoft.com> writes:
-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
Filter: (cusip = '92193920'::text)
Hm, so how many rows in mutualfd_weekday_qbid for that date?
And how many satisfy the cusip condition? (I suppose 253,
but it looks like that must be a very small fraction of all
the rows for that date.)
The selectivity estimators are not great about dealing with
zero-width intervals like this one (in fact, if you look at the
code you'll find it doesn't even bother to distinguish '>' from '>='
... something we should probably try to improve sometime).
You'd probably have better luck if you could fold the WHERE condition
down to "pkey = '2005-12-15'". Dunno how feasible that is for your
application.
regards, tom lane
Hi,
I did a vacuum with -z and it fixed the issue. I was not aware that
vacuumdb didn't ANALYZE by default. Thanks everybody for all of the help!
Benjamin
Tom Lane wrote:
Show quoted text
Benjamin Arai <benjamin@araisoft.com> writes:
-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
Filter: (cusip = '92193920'::text)Hm, so how many rows in mutualfd_weekday_qbid for that date?
And how many satisfy the cusip condition? (I suppose 253,
but it looks like that must be a very small fraction of all
the rows for that date.)The selectivity estimators are not great about dealing with
zero-width intervals like this one (in fact, if you look at the
code you'll find it doesn't even bother to distinguish '>' from '>='
... something we should probably try to improve sometime).You'd probably have better luck if you could fold the WHERE condition
down to "pkey = '2005-12-15'". Dunno how feasible that is for your
application.regards, tom lane
Benjamin Arai wrote:
Just to make things more clear I ran EXPLAIN ANALYZE on the slow query.
I got
-> Index Scan using mutualfd_weekday_qbid_pkey_idx on
mutualfd_weekday_qbid (cost=0.00..6.01 rows=1 width=19) (actual
time=34.579..8510.801 rows=253 loops=1)
Index Cond: ((pkey >= '2005-12-15'::date) AND (pkey <=
'2006-12-15'::date))
Filter: (cusip = '92193920'::text)
This is the relevant part (including the index details). It's filtering
(text) over all index items within a period of a full year. I imagine
can be quite a lot of data (unless pkey is a primary key on a date
field; only 366 records).
As mentioned, ANALYZE helps, but you may want to check:
1) Are you comparing integers as text strings? You may not want to.
2) Do you have an index on cusip?
You probably have, just in case.
--
Alban Hertroys
alban@magproductions.nl
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede
// Integrate Your World //