how to write an optimized sql with two same subsql?

Started by sunpengover 15 years ago4 messagesgeneral

bluevaley@gmail.com

over 15 years ago

We have a table A:
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid

Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
and d.count > c.count;

Does that mean subsql will be executed twice? or how to write the optimized
sql?

robjsargent@gmail.com

over 15 years ago

In reply to: sunpeng (#1)

Re: how to write an optimized sql with two same subsql?

On 10/14/2010 05:34 PM, sunpeng wrote:

We have a table A:
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid

Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
and d.count > c.count;

Does that mean subsql will be executed twice? or how to write the
optimized sql?

What are you trying to discover about groups and their frequency in
tablea? Does the numberical value of groupid have any meaning in your
system?

bluevaley@gmail.com

over 15 years ago

In reply to: Rob Sargent (#2)

Re: how to write an optimized sql with two same subsql?

Actually I've simplied my original sql to the previous version, since it's
simple yet reveals the same problem.
My original sql is to get two instersected cluster(as same concept as group
) and its commonarea:

SELECT a.clusterid AS clusterida,
b.clusterid AS clusteridb,
*St_astext*(*St_intersection*(a.bufferbox, b.bufferbox)) AS
commonarea
FROM (SELECT *St_buffer*(*St_convexhull*(*St_collect*(c.a0)), 2100.000000)
AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) a,
(SELECT *St_buffer*(*St_convexhull*(*St_collect*(c.a0)), 2100.000000)
AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) b
WHERE b.clusterid > a.clusterid
AND *St_intersects*(a.bufferbox, b.bufferbox)
ORDER BY a.clusterid;

The DDL for _mcir_2347694 and _mcir_2347694_clusterid2 is:
CREATE TABLE _mcir_2347579
(
a0 geometry,
uid integer
)
CREATE TABLE _mcir_2347579_clusterid2
(
uid integer NOT NULL,
clusterid integer
)
In these two tables, _mcir_2347579_clusterid2.uid = mcir_2347579.uid, just
like a forign key.

The same question is how to avoid the following subquery be executed twice:
SELECT *St_buffer*(*St_convexhull*(*St_collect*(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid

2010/10/14 Rob Sargent <robjsargent@gmail.com>

Show quoted text

On 10/14/2010 05:34 PM, sunpeng wrote:

We have a table A:
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid

Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
and d.count > c.count;

Does that mean subsql will be executed twice? or how to write the
optimized sql?

What are you trying to discover about groups and their frequency in
tablea? Does the numberical value of groupid have any meaning in your
system?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

ineyman@perceptron.com

over 15 years ago

In reply to: sunpeng (#1)

Re: how to write an optimized sql with two same subsql?

-----Original Message-----
From: sunpeng [mailto:bluevaley@gmail.com]
Sent: Thursday, October 14, 2010 7:34 PM
To: pgsql-general@postgresql.org
Subject: how to write an optimized sql with two same subsql?

We have a table A:
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid

Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2 FROM
subsql as c, subsql as d WHERE d.groupid > c.groupid
and d.count > c.count;

Does that mean subsql will be executed twice? or how to write
the optimized sql?

Is that what you want:

WITH gr_counts AS (
SELECT groupid, COUNT(*) AS CNT
FROM A
GROUP BY groupid)
SELECT C.groupid AS groupid1, D.groupid AS groupid2
FROM gr_counts C, gr_counts D
WHERE D.groupid > C.groupid
AND D.count > C.count;

This will execute:

SELECT groupid, COUNT(*) AS CNT
FROM A
GROUP BY groupid

only once.

Regards,
Igor Neyman