group by

Started by Kathy Zhuover 22 years ago8 messagesgeneral
Jump to latest
#1Kathy Zhu
Kathy.Zhu@Sun.COM

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

#2Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Kathy Zhu (#1)
Re: group by

On Fri, 3 Oct 2003, Kathy Zhu wrote:

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

First thought was caching: the disk blocks are cached in memory after the first
qery so all the others just fetch from there.

Second thought: well I haven't really had it but indexes and sorting is sort of
sloshing around in my mind. Not sure how that applies to here since I can't
remember if the first would require the sort step having retrieved all the
tuples and the others would just use the index pages.

It is Friday though.

--
Nigel J. Andrews

#3Dennis Gearon
gearond@fireserve.net
In reply to: Kathy Zhu (#1)
Re: group by

When was the last time you vacuumed full?

Kathy Zhu wrote:

Show quoted text

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

#4Kris Jurka
books@ejurka.com
In reply to: Kathy Zhu (#1)
Re: group by

On Fri, 3 Oct 2003, Kathy Zhu wrote:

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

I believe Solaris's qsort implementation (which is used for the group by)
has problems dealing with large numbers of similar values.

I think in later versions of pg our own qsort is used.

Kris Jurka

#5Kathy Zhu
Kathy.Zhu@Sun.COM
In reply to: Kris Jurka (#4)
Re: group by

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there is a
large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

thanks for the all the replies though,
kathy

Date: Fri, 03 Oct 2003 15:53:02 -0700
From: Dennis Gearon <gearond@fireserve.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4)

Gecko/20030624

Show quoted text

X-Accept-Language: en-us, ru, es-mx
To: Kathy Zhu <Kathy.Zhu@sun.com>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] group by
Content-Transfer-Encoding: 7bit

When was the last time you vacuumed full?

Kathy Zhu wrote:

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

#6Bruno Wolff III
bruno@wolff.to
In reply to: Kathy Zhu (#5)
Re: group by

On Fri, Oct 03, 2003 at 17:32:22 -0600,
Kathy Zhu <Kathy.Zhu@Sun.COM> wrote:

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there is a
large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

In 7.4 you will probably find the results more compatible.
For 7.3 and lower, the first case will use a sort to do a group by.
No sort will be done in the second example.
In 7.4 the groub by will use the new hash aggregate method and will
probably be significantly faster than the second way of doing things.

#7Kathy Zhu
Kathy.Zhu@Sun.COM
In reply to: Bruno Wolff III (#6)
Re: group by

That was also one of my guesses why groupby takes longer, although it is not
mentioned in the doc.
thanks !!!
kathy

Date: Sat, 4 Oct 2003 11:23:48 -0500
From: Bruno Wolff III <bruno@wolff.to>
To: Kathy Zhu <Kathy.Zhu@sun.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] group by
Mail-Followup-To: Kathy Zhu <Kathy.Zhu@Sun.COM>, pgsql-general@postgresql.org
Content-Disposition: inline
User-Agent: Mutt/1.5.4i

On Fri, Oct 03, 2003 at 17:32:22 -0600,
Kathy Zhu <Kathy.Zhu@Sun.COM> wrote:

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there

is a

Show quoted text

large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

In 7.4 you will probably find the results more compatible.
For 7.3 and lower, the first case will use a sort to do a group by.
No sort will be done in the second example.
In 7.4 the groub by will use the new hash aggregate method and will
probably be significantly faster than the second way of doing things.

#8Bruno Wolff III
bruno@wolff.to
In reply to: Kathy Zhu (#7)
Re: group by

On Mon, Oct 06, 2003 at 09:23:05 -0600,
Kathy Zhu <Kathy.Zhu@sun.com> wrote:

That was also one of my guesses why groupby takes longer, although it is not
mentioned in the doc.

If you want to know how a query is being done, you can use explain
(or explain analyze if you want timing information) to see the
plan.