group by

Started by Kathy Zhuover 22 years ago8 messagesgeneral

Jump to latest

Kathy Zhu

Kathy.Zhu@Sun.COM

over 22 years ago

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

Nigel J. Andrews

nandrews@investsystems.co.uk

over 22 years ago

In reply to: Kathy Zhu (#1)

Re: group by

On Fri, 3 Oct 2003, Kathy Zhu wrote:

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

First thought was caching: the disk blocks are cached in memory after the first
qery so all the others just fetch from there.

Second thought: well I haven't really had it but indexes and sorting is sort of
sloshing around in my mind. Not sure how that applies to here since I can't
remember if the first would require the sort step having retrieved all the
tuples and the others would just use the index pages.

It is Friday though.

--
Nigel J. Andrews

Dennis Gearon

gearond@fireserve.net

over 22 years ago

In reply to: Kathy Zhu (#1)

Re: group by

When was the last time you vacuumed full?

Kathy Zhu wrote:

Show quoted text

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Kris Jurka

books@ejurka.com

over 22 years ago

In reply to: Kathy Zhu (#1)

Re: group by

On Fri, 3 Oct 2003, Kathy Zhu wrote:

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

I believe Solaris's qsort implementation (which is used for the group by)
has problems dealing with large numbers of similar values.

I think in later versions of pg our own qsort is used.

Kris Jurka

Kathy Zhu

Kathy.Zhu@Sun.COM

over 22 years ago

In reply to: Kris Jurka (#4)

Re: group by

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there is a
large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

thanks for the all the replies though,
kathy

Date: Fri, 03 Oct 2003 15:53:02 -0700
From: Dennis Gearon <gearond@fireserve.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4)

Gecko/20030624

Show quoted text

X-Accept-Language: en-us, ru, es-mx
To: Kathy Zhu <Kathy.Zhu@sun.com>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] group by
Content-Transfer-Encoding: 7bit

When was the last time you vacuumed full?

Kathy Zhu wrote:

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Import Notes

Resolved by subject fallback

Bruno Wolff III

bruno@wolff.to

over 22 years ago

In reply to: Kathy Zhu (#5)

Re: group by

On Fri, Oct 03, 2003 at 17:32:22 -0600,
Kathy Zhu <Kathy.Zhu@Sun.COM> wrote:

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there is a
large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

In 7.4 you will probably find the results more compatible.
For 7.3 and lower, the first case will use a sort to do a group by.
No sort will be done in the second example.
In 7.4 the groub by will use the new hash aggregate method and will
probably be significantly faster than the second way of doing things.

Kathy Zhu

Kathy.Zhu@Sun.COM

over 22 years ago

In reply to: Bruno Wolff III (#6)

Re: group by

That was also one of my guesses why groupby takes longer, although it is not
mentioned in the doc.
thanks !!!
kathy

Date: Sat, 4 Oct 2003 11:23:48 -0500
From: Bruno Wolff III <bruno@wolff.to>
To: Kathy Zhu <Kathy.Zhu@sun.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] group by
Mail-Followup-To: Kathy Zhu <Kathy.Zhu@Sun.COM>, pgsql-general@postgresql.org
Content-Disposition: inline
User-Agent: Mutt/1.5.4i

On Fri, Oct 03, 2003 at 17:32:22 -0600,
Kathy Zhu <Kathy.Zhu@Sun.COM> wrote:

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there

is a

Show quoted text

large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

In 7.4 you will probably find the results more compatible.
For 7.3 and lower, the first case will use a sort to do a group by.
No sort will be done in the second example.
In 7.4 the groub by will use the new hash aggregate method and will
probably be significantly faster than the second way of doing things.

Import Notes

Resolved by subject fallback

Bruno Wolff III

bruno@wolff.to

over 22 years ago

In reply to: Kathy Zhu (#7)

Re: group by

On Mon, Oct 06, 2003 at 09:23:05 -0600,
Kathy Zhu <Kathy.Zhu@sun.com> wrote:

That was also one of my guesses why groupby takes longer, although it is not
mentioned in the doc.

If you want to know how a query is being done, you can use explain
(or explain analyze if you want timing information) to see the
plan.