string_agg distinct order by

Started by Markhof, Ingolfover 4 years ago5 messagesgeneral
Jump to latest
#1Markhof, Ingolf
ingolf.markhof@de.verizon.com

I am looking for something like

string_agg(distinct col_x order by col_y)

Unfortunately, you can either have the distinct, but then the order by
needs to be identical to what's aggregated, or you can have the order be
determined by another column. Not both...

Here is the playground

Given:

create table sites (
state text,
city text,
col_a text,
col_b text
);

insert into sites values ('Texas','Dallas','green','green');
insert into sites values ('Texas','Houston','green','green');
insert into sites values ('Texas','Austin','yellow','green');
insert into sites values ('Texas','Waco','yellow','yellow');
insert into sites values ('Texas','Midland','red','red');
insert into sites values ('Texas','Amarillo','red','yellow');

For each city, there is a status denoted by colour combination, e.g. 'green
/ green'. This is stored in two different columns in the table.

There is an order in the colours:

create table colours (
colour text,
value integer
);

insert into colours values ('red', 1);
insert into colours values ('yellow', 2);
insert into colours values ('green', 3);

So, red first, yellow second, green last.

I want an aggregated view showing for each state the list of existing
status combinations, such as:

with site_status as (
select
state,
city,
col_a || '/' || col_b as status,
ca.value as val_a,
cb.value as val_b
from
sites a
join colours ca on ca.colour=a.col_a
join colours cb on cb.colour=a.col_b
)
select
state,
string_agg(distinct status,',') as list
from
site_status
group by
state
;

This results in:

Texas green/green,red/red,red/yellow,yellow/green,yellow/yellow

By using distinct in the string_agg, I avoid double entries. Fine. But now,
I want the data ordered. E.g. in the order of the first colour. I SQL, this
could read...

string_agg(distinct status,',' order by val_a) as list

but this doesn't work. I get:

SQL Error [42P10]: ERROR: in an aggregate with DISTINCT, ORDER BY
expressions must appear in argument list

So, I could say:

string_agg(distinct status,',' order by status) as list

but this is not what I want: 'green' would get first, red second, yellow
last...

I could also drop the distinct and say:

string_agg(status,',' order by val_a) as list

This would return the list in correct order, but with double values
('green/green') showing up twice.

I tried to delete the double entries via regexp_replace, but found this
doesn't work for rather long strings (where a single regexp_replace can run
many minutes!)

Any pointers?

Thank you very much for any idea.

======================================================================

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Markhof, Ingolf (#1)
Re: string_agg distinct order by

On Thu, Aug 19, 2021 at 9:09 AM Markhof, Ingolf <
ingolf.markhof@de.verizon.com> wrote:

string_agg(distinct status,',' order by status) as list

but this is not what I want: 'green' would get first, red second, yellow

last...

I could also drop the distinct and say:

string_agg(status,',' order by val_a) as list

Any pointers?

public.function_that_converts_the_array_to_a_string_while_replacing_the_val_a_values_with_their_text_equivalent(array_agg(distinct
val_a order by val_a))

Distinct is fairly simple - it sorts the data then skips over duplicates as
they appear. It needs a sort, applied implicitly or explicitly, to put the
data in order. If the sort is explicit it requires you to ensure that the
distinct field values are all grouped together (it will do this itself if
it adds an implied sort).

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markhof, Ingolf (#1)
Re: string_agg distinct order by

"Markhof, Ingolf" <ingolf.markhof@de.verizon.com> writes:

I am looking for something like
string_agg(distinct col_x order by col_y)

Unfortunately, you can either have the distinct, but then the order by
needs to be identical to what's aggregated, or you can have the order be
determined by another column. Not both...

The reason for that restriction is that the case you propose is
ill-defined. If we combine rows with the same col_x, which row's
value of col_y is to be used to sort the merged row? I think once
you answer that question, a suitable query will suggest itself.

regards, tom lane

#4Michael Lewis
mlewis@entrata.com
In reply to: Markhof, Ingolf (#1)
Re: string_agg distinct order by

I believe that you could define an enumerated type to use for those status
colors such that the ordering is defined as you like without two separate
columns for the name and sort_value or whatever.

https://www.postgresql.org/docs/current/datatype-enum.html

Example in the documentation expanded a little to demonstrate-

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE person (
name text,
current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
INSERT INTO person VALUES ('Joe', 'sad');
INSERT INTO person VALUES ('Roe', 'ok');

SELECT * FROM person order by current_mood;
SELECT * FROM person order by current_mood desc;

Note- using enum may complicate other things in your usage, so I am not
suggesting this is ideal, just one option.

*Michael Lewis | Database Engineer*
*Entrata*

#5Markhof, Ingolf
ingolf.markhof@de.verizon.com
In reply to: Tom Lane (#3)
Re: [E] Re: string_agg distinct order by

Tom, I see your point. Which is valid considering there could be any value
in col_y for some value in col_x. But in my case, col_y is a a function of
col_x, i.e. two rows with the same value in row_x will have the same value
in row_y as well.

Consider, you need to store some length values. Like this:

create table items (
id text,
len integer,
unit text
);

insert into items values (1,1,'mm');
insert into items values (2,5,'mm');
insert into items values (3,5,'mm');
insert into items values (4,1,'cm');
insert into items values (5,1,'cm');
insert into items values (6,1,'m');
insert into items values (7,1,'m');
insert into items values (7,2,'m');
insert into items values (8,2,'m');
insert into items values (9,5,'m');

With the view...

create view vu_items as
select
id,
len || unit as descr,
len*case unit when 'mm' then 1 when 'cm' then 10 when 'm' then 1000 end
as len_mm
from items;

...I now want to have a list of all distinct descr ordered by length. But...

select
string_agg(descr,',' order by len_mm)
from vu_items;

...creates a list with duplicates, only:

1mm,5mm,5mm,1cm,1cm,1m,1m,2m,2m,5m

And...

select
string_agg(distinct descr,',' order by descr)
from vu_items;

...gives a list of distinct values, but in the wrong order:

1cm,1m,1mm,2m,5m,5mm

My solution to this (now) is:

select
regexp_replace(
string_agg(descr,',' order by len_mm),
'([^,]+)(,\1)?($|,)',
'\1\3',
'g'
)
from vu_items;

Thx again for your hint in the regexp_replacy issue in my other post...

Regards,
Ingolf

On Thu, Aug 19, 2021 at 6:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Markhof, Ingolf" <ingolf.markhof@de.verizon.com> writes:

I am looking for something like
string_agg(distinct col_x order by col_y)

Unfortunately, you can either have the distinct, but then the order by
needs to be identical to what's aggregated, or you can have the order be
determined by another column. Not both...

The reason for that restriction is that the case you propose is
ill-defined. If we combine rows with the same col_x, which row's
value of col_y is to be used to sort the merged row? I think once
you answer that question, a suitable query will suggest itself.

regards, tom lane

======================================================================

Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio