about new join syntax performance

Started by Gyozo Pappalmost 25 years ago2 messagesgeneral
Jump to latest
#1Gyozo Papp
pgerzson@freestart.hu

Hello,

we are working with v7.0.2 but to upgrade to the recent version of postgres regarding to the full join syntax most demanded by our web-application.
I'd like to get some tips about how to rewrite our queries to achieve better performance.

[ from the manual 2.1.1.2. Subqueries in FROM clause ]
" More interesting cases, which can't be reduced to a plain join, arise when the subquery involves grouping or aggregation."

Take a look at this query:
SELECT * FROM (SELECT did, count(*) FROM values WHERE ... GROUP BY did HAVING count(*) > ...) AS V(did, ok) JOIN document AS D USING(did);

My question is this significantly faster than two separate queries which are used now (in v7.0.2) via PHP (libpq):
SELECT did, count(*) INTO TEMPORARY TABLE tmp FROM values WHERE ... GROUP BY did HAVING count(*) > ...;
SELECT * FROM tmp AS V, document AS D WHER V.did = D.did;

additional informations:
table document:
did serial primary key,
content text not null
total rows = 22.000 -

table values :
did serial references (document),
value text not null,
/* other columns */
total rows = 2 or 5 times of document rows, (there are 2 up to 5 references to each document)

I'd like to know whether it's worth changing the scripts to use the new join syntax.

Papp Gyozo
- pgerzson@freestart.hu

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gyozo Papp (#1)
Re: about new join syntax performance

"Gyozo Papp" <pgerzson@freestart.hu> writes:

Take a look at this query:
SELECT * FROM (SELECT did, count(*) FROM values WHERE ... GROUP BY did HAVING count(*) > ...) AS V(did, ok) JOIN document AS D USING(did);

My question is this significantly faster than two separate queries which are used now (in v7.0.2) via PHP (libpq):
SELECT did, count(*) INTO TEMPORARY TABLE tmp FROM values WHERE ... GROUP BY did HAVING count(*) > ...;
SELECT * FROM tmp AS V, document AS D WHER V.did = D.did;

My guess is that you'd get exactly the same component plans, so the
savings would just be the overhead involved in creating, filling, and
eventually deleting a temp table. Hard to tell whether that's
significant or not in the context of this sort of query. (Me, I'd go for
it just so I didn't have to worry about remembering to delete the temp
table before I could issue another similar query.)

You might get a different plan for the join step because it would be
determined by the estimated number of rows output from the subselect,
whereas in the temp-table case you'd just get a default estimate for
the size of the temp table. (I'm assuming you don't stop to VACUUM
ANALYZE the temp table...) Depending on how good the estimate is,
this could be a better or worse plan.

I'd like to know whether it's worth changing the scripts to use the
new join syntax.

Try a few and see.

regards, tom lane