about new join syntax performance
Hello,
we are working with v7.0.2 but to upgrade to the recent version of postgres regarding to the full join syntax most demanded by our web-application.
I'd like to get some tips about how to rewrite our queries to achieve better performance.
[ from the manual 2.1.1.2. Subqueries in FROM clause ]
" More interesting cases, which can't be reduced to a plain join, arise when the subquery involves grouping or aggregation."
Take a look at this query:
SELECT * FROM (SELECT did, count(*) FROM values WHERE ... GROUP BY did HAVING count(*) > ...) AS V(did, ok) JOIN document AS D USING(did);
My question is this significantly faster than two separate queries which are used now (in v7.0.2) via PHP (libpq):
SELECT did, count(*) INTO TEMPORARY TABLE tmp FROM values WHERE ... GROUP BY did HAVING count(*) > ...;
SELECT * FROM tmp AS V, document AS D WHER V.did = D.did;
additional informations:
table document:
did serial primary key,
content text not null
total rows = 22.000 -
table values :
did serial references (document),
value text not null,
/* other columns */
total rows = 2 or 5 times of document rows, (there are 2 up to 5 references to each document)
I'd like to know whether it's worth changing the scripts to use the new join syntax.
Papp Gyozo
- pgerzson@freestart.hu
"Gyozo Papp" <pgerzson@freestart.hu> writes:
Take a look at this query:
SELECT * FROM (SELECT did, count(*) FROM values WHERE ... GROUP BY did HAVING count(*) > ...) AS V(did, ok) JOIN document AS D USING(did);
My question is this significantly faster than two separate queries which are used now (in v7.0.2) via PHP (libpq):
SELECT did, count(*) INTO TEMPORARY TABLE tmp FROM values WHERE ... GROUP BY did HAVING count(*) > ...;
SELECT * FROM tmp AS V, document AS D WHER V.did = D.did;
My guess is that you'd get exactly the same component plans, so the
savings would just be the overhead involved in creating, filling, and
eventually deleting a temp table. Hard to tell whether that's
significant or not in the context of this sort of query. (Me, I'd go for
it just so I didn't have to worry about remembering to delete the temp
table before I could issue another similar query.)
You might get a different plan for the join step because it would be
determined by the estimated number of rows output from the subselect,
whereas in the temp-table case you'd just get a default estimate for
the size of the temp table. (I'm assuming you don't stop to VACUUM
ANALYZE the temp table...) Depending on how good the estimate is,
this could be a better or worse plan.
I'd like to know whether it's worth changing the scripts to use the
new join syntax.
Try a few and see.
regards, tom lane