Need SQL help, I'm stuck.

Started by Chris Albertsonover 24 years ago10 messagesgeneral
Jump to latest
#1Chris Albertson
chrisalbertson90278@yahoo.com

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

All I can think of is

SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work. What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values. I don't even need
to know what is max(C2).

Can I join the table with itself somehow? See: "brain lock".
This should not be hard.

Thanks,

=====
Chris Albertson
Home: 310-376-1029 chrisalbertson90278@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

#2Noname
wsheldah@lexmark.com
In reply to: Chris Albertson (#1)
Re: Need SQL help, I'm stuck.

I just had to do this, and came up with two different ways; maybe someone here
will come up with an even better one.

My first brute force attempt was:
SELECT C1, C3, C4
FROM T1
WHERE C2 = ( SELECT max(T2.C2) FROM T1 as T2 WHERE T1.C1=T2.C1);

That works, but is very slow because the subselect has to be reprocessed on
every row. I tried optimizing by replacing the subselect with an indexed temp
table:
CREATE TEMP TABLE temp_MaxC2 AS SELECT C1, max(C2) AS maxC2 FROM T1 GROUP BY C1;
CREATE INDEX tmp_idx_C1 ON temp_MaxC2 (maxC2);
SELECT C1, C3, C4 FROM T1, temp_MaxC2 TMP WHERE T1.C1= TMP.C1 and
T1.C2=TMP.maxC2;

By my benchmarks, that ran roughly four orders of magnitude faster. Temp tables
go away automatically at the end of a connection, but I'm running this under
mod_perl with Apache::DBI, which pools the connections, so there's a bit more
code to drop the table and indexes before creating them, and ignoring any errors
if they don't exist in the first place. (Wish there was an "IF EXISTS ... CREATE
..." syntax)

If anyone has a still better approach, I'd love to hear what it is. Thanks,

--Wes Sheldahl

Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com> on
12/10/2001 04:42:54 PM

To: pgsql-general%postgresql.org@interlock.lexmark.com
cc: (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject: [GENERAL] Need SQL help, I'm stuck.

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

All I can think of is

SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work. What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values. I don't even need
to know what is max(C2).

Can I join the table with itself somehow? See: "brain lock".
This should not be hard.

Thanks,

=====
Chris Albertson
Home: 310-376-1029 chrisalbertson90278@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#3Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Chris Albertson (#1)
Re: Need SQL help, I'm stuck.

On Mon, 10 Dec 2001, Chris Albertson wrote:

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

All I can think of is

SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work. What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values. I don't even need
to know what is max(C2).

Can I join the table with itself somehow? See: "brain lock".
This should not be hard.

Maybe something like?
select t1.c1, c3, c4 from t1, (select c1, max(c2) as c2 from t1 group by
c1) foo where t1.c1=foo.c1 and t1.c2=foo.c2;

#4Andrew Gould
andrewgould@yahoo.com
In reply to: Chris Albertson (#1)
Re: Need SQL help, I'm stuck.

Try using a subquery to identify the max(C2) value,
then join T1 to the result of the subselect by C1 and
limit the results where C2 equals max(C2).

I think the following should work; but I've never done
a subquery, so you may have to tweak the syntax:

select C1, C2, C3, C4 from T1,(select C1 as M1,
max(C2) as M2 from T1 group by M1) as T2
where T1.C1=T2.M1 and T1.C2=T2.M2;

Best of luck,

Andrew Gould

--- Chris Albertson <chrisalbertson90278@yahoo.com>
wrote:

Help. I seem to have a case of "brain lock" and
can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a
table called
T1 with columns C1, C2, C3, C4. It contains data as
follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

All I can think of is

SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work. What I really want is the
values for C1, C3
and C4 that are associated with the row containing
the maximum
value of C2 for each group of like C1 values. I
don't even need
to know what is max(C2).

Can I join the table with itself somehow? See:
"brain lock".
This should not be hard.

Thanks,

=====
Chris Albertson
Home: 310-376-1029
chrisalbertson90278@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189
Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

---------------------------(end of
broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to

majordomo@postgresql.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

#5Robert B. Easter
reaster@comptechnews.com
In reply to: Chris Albertson (#1)
Re: Need SQL help, I'm stuck.

This seems to work:

SELECT c1,c3,c4 from (select c1, max(c2) from t1 group by c1) as dt1 (c1,c2)
natural left join t1;

Bob

Show quoted text

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

All I can think of is

SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work. What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values. I don't even need
to know what is max(C2).

Can I join the table with itself somehow? See: "brain lock".
This should not be hard.

Thanks,

=====
Chris Albertson
Home: 310-376-1029 chrisalbertson90278@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#6Martijn van Oosterhout
kleptog@svana.org
In reply to: Chris Albertson (#1)
Re: Need SQL help, I'm stuck.

On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote:

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

How about:

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/

Show quoted text

Terrorists can only take my life. Only my government can take my freedom.

#7Paul Wehr
postgresql@industrialsoftworks.com
In reply to: Chris Albertson (#1)
Re: Need SQL help, I'm stuck.

select * from t1 a
where c2=
(select max(c2) from t1 b
where a.c1=b.c1)

an index on c1 might be handy for this...

hth.

-paul

Show quoted text

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

All I can think of is

SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work. What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values. I don't even need
to know what is max(C2).

Can I join the table with itself somehow? See: "brain lock".
This should not be hard.

Thanks,

=====
Chris Albertson
Home: 310-376-1029 chrisalbertson90278@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

---------------------------(end of
broadcast)--------------------------- TIP 1: subscribe and unsubscribe
commands go to majordomo@postgresql.org

#8Noname
wsheldah@lexmark.com
In reply to: Paul Wehr (#7)
Re: Need SQL help, I'm stuck.

In just eyeballing the various responses, it looks like the one using DISTINCT
ON manages to avoid using a subquery at all. Would this give it the edge in
performance? I had somehow never noticed the DISTINCT ON syntax before, this
looks very handy.

Also, my first attempt was to put the subquery in the WHERE clause, but I
noticed that several put the subquery in the FROM clause. Does putting it in the
FROM clause just run it once, with the results of the run joined to the outer
tables? It certainly seemed like putting the query in the WHERE clause was
running it for every row. Thanks,

Wes Sheldahl

Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> on 12/10/2001
06:33:59 PM

Please respond to Martijn van Oosterhout
<kleptog%svana.org@interlock.lexmark.com>

To: Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com>
cc: pgsql-general%postgresql.org@interlock.lexmark.com (bcc: Wesley
Sheldahl/Lex/Lexmark)
Subject: Re: [GENERAL] Need SQL help, I'm stuck.

On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote:

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

How about:

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/

Terrorists can only take my life. Only my government can take my freedom.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#9Antonio Fiol Bonnín
fiol@w3ping.com
In reply to: Noname (#8)
Re: Need SQL help, I'm stuck.

What about this one, which also happens to give the right result?

select B.* from T1 A RIGHT JOIN T1 B on (A.C1=B.C1 AND A.C2>B.C2) where
A.c1 is null;

It is really amazing how many different ways there are to express the
same wishes in SQL...

Compared to the following ones, it is efficient:
SELECT A.* FROM T1 A WHERE NOT EXISTS (select * from T1 B where B.C2 >
A.C2 AND B.C1=A.C1);
SELECT * FROM T1 EXCEPT SELECT A.* FROM T1 A, T1 B where A.C1=B.C1 AND
A.C2<B.C2;

Though, the following is AMAZINGLY efficient. Only a seq scan, plus some
post processing.

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;

I think that distinct/order by combination is best suited for your needs.

Does anyone know of a "master source of knowledge" where one could learn
to choose an appropriate formulation for a SQL query without trying all
of the imaginable possibilities with EXPLAIN?

Thank you all!

Antonio

wsheldah@lexmark.com wrote:

Show quoted text

In just eyeballing the various responses, it looks like the one using DISTINCT
ON manages to avoid using a subquery at all. Would this give it the edge in
performance? I had somehow never noticed the DISTINCT ON syntax before, this
looks very handy.

Also, my first attempt was to put the subquery in the WHERE clause, but I
noticed that several put the subquery in the FROM clause. Does putting it in the
FROM clause just run it once, with the results of the run joined to the outer
tables? It certainly seemed like putting the query in the WHERE clause was
running it for every row. Thanks,

Wes Sheldahl

Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> on 12/10/2001
06:33:59 PM

Please respond to Martijn van Oosterhout
<kleptog%svana.org@interlock.lexmark.com>

To: Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com>
cc: pgsql-general%postgresql.org@interlock.lexmark.com (bcc: Wesley
Sheldahl/Lex/Lexmark)
Subject: Re: [GENERAL] Need SQL help, I'm stuck.

On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote:

Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do. Let's say I have a table called
T1 with columns C1, C2, C3, C4. It contains data as follows

a 1 abcd dfg
a 2 cvfr erg
a 3 derg hbg
b 1 cccc rth
c 1 rdvg egt
c 2 derf ett

I want a SQL query that returns these rows

a 3 derg hbg
b 1 cccc rth
c 2 derf ett

How about:

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/

#10Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Noname (#8)
Re: Need SQL help, I'm stuck.

On Tue, 11 Dec 2001 wsheldah@lexmark.com wrote:

In just eyeballing the various responses, it looks like the one using DISTINCT
ON manages to avoid using a subquery at all. Would this give it the edge in
performance? I had somehow never noticed the DISTINCT ON syntax before, this
looks very handy.

Distinct on often performs better than subquery options, however it's a
PostgreSQL addition, so you need to be careful if you want to run the
query on anything else :)

Also, my first attempt was to put the subquery in the WHERE clause, but I
noticed that several put the subquery in the FROM clause. Does putting it in the
FROM clause just run it once, with the results of the run joined to the outer
tables? It certainly seemed like putting the query in the WHERE clause was
running it for every row. Thanks,

I've generally assumed that subselects in from are effectively a cursor
that gets scanned as opposed to running the entire query for each row.
I haven't looked to confirm that, but it seems reasonable :)