GROUP BY or alternative means to group

Started by Alexander Reichstadtabout 14 years ago12 messagesgeneral
Jump to latest

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

#2Michael Gould
mgould@isstrucksoftware.net
In reply to: Alexander Reichstadt (#1)
Re: GROUP BY or alternative means to group

You need to include all columns that are not aggregrative columns in the group by. Even though that is the standard it is a pain to list all columns even if you don't need them

Best Regards

Michael Gould

Sent from Samsung mobile

Alexander Reichstadt <lxr@mac.com> wrote:

Show quoted text

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

#3Bartosz Dmytrak
bdmytrak@eranet.pl
In reply to: Alexander Reichstadt (#1)
Re: GROUP BY or alternative means to group

Hi,
You can use one of windowing function:
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
http://www.postgresql.org/docs/9.1/static/functions-window.html
this could be rank() in subquery or first_value(vale any), but there could
be performance issue

another solution could be boolean flag "default" in table address_reference
which should be unique for single company, I mean value true should be
unique - this could be reached by unique partial index on column*
*refid_companies
with condition default = true
http://www.postgresql.org/docs/9.1/static/indexes-partial.html#INDEXES-PARTIAL-EX3

hope Your pg version supports windowing functions (as I remember 8.4 and
above)

Of course there is a solution with subquery which finds min id in table
addresses of each refid_companies in table addresses_reference and this
subquery is joined with companies table, but I am afraid this is not the
best one.

Regards,
Bartek

2012/3/12 Alexander Reichstadt <lxr@mac.com>

Show quoted text

Hi,

the following statement worked on mysql but gives me an error on postgres:

*column "addresses.address1" must appear in the GROUP BY clause or be
used in an aggregate function*

I guess I am doing something wrong. I read the web answers, but none of
them seem to meet my needs:

*SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies
LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.idGROUP BY
companies.id;*

What I did now was create a view based on above statement but without
grouping. This returns a list with non-distinct values for all companies
that have more than one address, which is correct. But in some cases I only
need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me
the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

In reply to: Alexander Reichstadt (#1)
Solved [Re: GROUP BY or alternative means to group]

So the mysql way for group by seems to be non-standard.

What works for postgres is the DISTINCT ON (fieldname) approach.

Thanks

Am 12.03.2012 um 20:35 schrieb Alexander Reichstadt:

Show quoted text

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

#5Kiriakos Georgiou
kg.postgresql@olympiakos.com
In reply to: Alexander Reichstadt (#1)
Re: GROUP BY or alternative means to group

Instead of the joins you can use a subquery to get the first address.

Or you can do the joins without the group by and use row_number() over(partition by companies.id) on the select list to label each company address with a number starting at 1. You can just keep rows that have row_number = 1. See http://www.postgresql.org/docs/current/static/tutorial-window.html to get the feeling how window functions work.

Kiriakos

On Mar 12, 2012, at 3:35 PM, Alexander Reichstadt wrote:

Show quoted text

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

In reply to: Bartosz Dmytrak (#3)
Re: GROUP BY or alternative means to group

Thanks, I just posted my response to my own question for the archives. I take it also that group by is faster than distinct on. If it is a substantial performance gain I have to work on this some more. A subquery I would expect would be much of a drag, so for all keystroke-updated list-tables this would not be suitable I think.

Am 12.03.2012 um 21:57 schrieb Bartosz Dmytrak:

Show quoted text

Hi,
You can use one of windowing function:
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
http://www.postgresql.org/docs/9.1/static/functions-window.html
this could be rank() in subquery or first_value(vale any), but there could be performance issue

another solution could be boolean flag "default" in table address_reference which should be unique for single company, I mean value true should be unique - this could be reached by unique partial index on column refid_companies with condition default = true
http://www.postgresql.org/docs/9.1/static/indexes-partial.html#INDEXES-PARTIAL-EX3

hope Your pg version supports windowing functions (as I remember 8.4 and above)

Of course there is a solution with subquery which finds min id in table addresses of each refid_companies in table addresses_reference and this subquery is joined with companies table, but I am afraid this is not the best one.

Regards,
Bartek

2012/3/12 Alexander Reichstadt <lxr@mac.com>
Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

#7Scott Marlowe
scott.marlowe@gmail.com
In reply to: Alexander Reichstadt (#1)
Re: GROUP BY or alternative means to group

On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt <lxr@mac.com> wrote:

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in
an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them
seem to meet my needs:

SELECT
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
FROM companies JOIN addresses_reference ON
companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without
grouping. This returns a list with non-distinct values for all companies
that have more than one address, which is correct. But in some cases I only
need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the
first stored addresses related, and disregard any further addresses.

Is there any way to do this?

If you don't care which address you get, you can use max(address) or
min(address).

In reply to: Scott Marlowe (#7)
Re: GROUP BY or alternative means to group

I guess I lack the knowledge to integrate your answer in my query....Actually I'd prefer to always see the first address entered unless there is a where-clause added. Not sure how this works out then and haven't tested. But given the initial query extended by distinct on it would be like so:

SELECT distinct on (companies.id)
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
FROM companies JOIN addresses_reference ON
companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
addresses_reference.refid_addresses=addresses.id GROUP BY companies.id

where addresses.city ILIKE '%bla%'

I guess postgres would make sense and deliver the entry with bla with the where-clause, and disregard the bla entry returning random addresses associated with the company without the where-clause.

But where would I insert the max(address) piece?

Am 12.03.2012 um 22:09 schrieb Scott Marlowe:

Show quoted text

On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt <lxr@mac.com> wrote:

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in
an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them
seem to meet my needs:

SELECT
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
FROM companies JOIN addresses_reference ON
companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;

What I did now was create a view based on above statement but without
grouping. This returns a list with non-distinct values for all companies
that have more than one address, which is correct. But in some cases I only
need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the
first stored addresses related, and disregard any further addresses.

Is there any way to do this?

If you don't care which address you get, you can use max(address) or
min(address).

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Scott Marlowe
scott.marlowe@gmail.com
In reply to: Alexander Reichstadt (#8)
Re: GROUP BY or alternative means to group

On Mon, Mar 12, 2012 at 3:19 PM, Alexander Reichstadt <lxr@mac.com> wrote:

But where would I insert the max(address) piece?

Just put max() or min() around any field in the select list that's not
in the group by clause

#10Bruno Wolff III
bruno@wolff.to
In reply to: Michael Gould (#2)
Re: GROUP BY or alternative means to group

On Mon, Mar 12, 2012 at 16:18:05 -0400,
Michael Gould <mgould@isstrucksoftware.net> wrote:

You need to include all columns that are not aggregrative columns in the group by. Even though that is the standard it is a pain to list all columns even if you don't need them

In later versions of postgres this is relaxed a bit. If you are grouping
by a primary key, you don't need to group by columns that are fixed
by that key. For example the following query is accepted in 9.1 as gameid
is a key for games and hence we don't need to also group by ga,es.title.

SELECT games.gameid, games.title
FROM games, crate
WHERE
games.gameid = crate.gameid
AND
games.contact = 'BOB'
AND
crate.touched >= current_timestamp + '4 year ago'
GROUP BY games.gameid
HAVING count(1) < 30
ORDER BY games.gameid
;

#11Michael Gould
mgould@isstrucksoftware.net
In reply to: Bruno Wolff III (#10)
Re: GROUP BY or alternative means to group

Thanks that is a help. I would be nice if any key could be used as those are normally the things I would do group by's

Regards

Mike Gould

From my Samsung Android tablet on T-Mobile. The first nationwide 4G networkBruno Wolff III <bruno@wolff.to> wrote:On Mon, Mar 12, 2012 at 16:18:05 -0400,
   Michael Gould <mgould@isstrucksoftware.net> wrote:

You need to include all columns that are not aggregrative columns in the group by.  Even though that is the standard it is a pain to list all columns even if you don't need them

In later versions of postgres this is relaxed a bit. If you are grouping
by a primary key, you don't need to group by columns that are fixed
by that key. For example the following query is accepted in 9.1 as gameid
is a key for games and hence we don't need to also group by ga,es.title.

SELECT games.gameid, games.title
   FROM games, crate
     WHERE
       games.gameid = crate.gameid
       AND
       games.contact = 'BOB'
       AND
       crate.touched >= current_timestamp + '4 year ago'
   GROUP BY games.gameid
   HAVING count(1) < 30
   ORDER BY games.gameid
;

#12Bruno Wolff III
bruno@wolff.to
In reply to: Michael Gould (#11)
Re: GROUP BY or alternative means to group

On Mon, Apr 09, 2012 at 13:55:04 -0400,
Michael Gould <mgould@isstrucksoftware.net> wrote:

Thanks that is a help. I would be nice if any key could be used as those are normally the things I would do group by's

This is what the 9.1 documentation says:
"When GROUP BY is present, it is not valid for the SELECT list expressions to
refer to ungrouped columns except within aggregate functions or if the
ungrouped column is functionally dependent on the grouped columns, since
there would otherwise be more than one possible value to return for an
ungrouped column. A functional dependency exists if the grouped columns (or
a subset thereof) are the primary key of the table containing the ungrouped
column."

That implies you need to group by a primary key. I haven't tested if
that (other keys can't provide this) is actually the case.