DISTINCT in STRING_AGG

Started by Sterpu Victorover 10 years ago7 messagesgeneral
Jump to latest
#1Sterpu Victor
victor@caido.ro

Hello

Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY
aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON
(aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND
aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON
(aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND
aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND
aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON
(aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND
aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;

And "childen" contain doubles. The result is:
id ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"

There are doubles because of the join aqjs3 witch is producing this
problem.
Can I make it so the children ID's are unique?

Thank you.

#2John Turner
fenwayriffs@gmail.com
In reply to: Sterpu Victor (#1)
Re: DISTINCT in STRING_AGG

On Nov 28, 2015, at 1:35 PM, Sterpu Victor <victor@caido.ro> wrote:

Hello

Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;

And "childen" contain doubles. The result is:
id ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"

There are doubles because of the join aqjs3 witch is producing this problem.Can I make it so the children ID's are unique?

Just to mention, this looks like a good candidate for range types and CTE’s.

The > / < comparisons appear to be mutually exclusive in each LEFT JOIN clause, so it’s not apparent why aqjs3 is causing duplication, as you’ve stated.

As far as I can see, without providing us with your table constraints/keys, there’s no way to determine what makes your ID values unique…

However, if you defer your STRING_AGG until after you derive a distinct “staging” result set from the joins, then you can effect uniqueness -
e.g. (air code):
WITH q AS
(SELECT aq.id aq_parent_id, atjs.id atjs_parent_id, CAST(aqjs1.id AS VARCHAR) child FROM ... GROUP BY aq.id, atjs.id, aqjs1.id)
SELECT atjs_parent_id, STRING_AGG(child,’,’ ORDER BY aqjs.to_left) children
FROM q LEFT JOIN (SELECT DISTINCT id_ad_query, to_left FROM administration.ad_query_join_select) aqjs ON …
GROUP BY aq_parent_id, parent
ORDER BY aq_parent_id, atjs.to_left;

Something along these lines ‘may’ produce a unique set of child values for each id by which to perform a STRING_AGG on, but again, I can only guess based on the lack of definition provided for your table constraints.

John

#3Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Sterpu Victor (#1)
Re: DISTINCT in STRING_AGG

On 28 November 2015 at 18:35, Sterpu Victor <victor@caido.ro> wrote:

Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY
aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query
= atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND
aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query
= atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND
aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND
aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query
= atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND
aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;

And "childen" contain doubles. The result is:
*id ; children*
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"

There are doubles because of the join aqjs3 witch is producing this
problem.
Can I make it so the children ID's are unique?

Well if you can live with losing the to_left ordering, then you could just
do

SELECT STRING_AGG(DISTINCT CAST(aqjs1.id​ AS VARCHAR), '') AS children

​ ...​

​no?

Geoff​

#4Sterpu Victor
victor@caido.ro
In reply to: Geoff Winkless (#3)
Re: DISTINCT in STRING_AGG

I can't skip the ordering.
I'm sure aqjs3 is the one that produces the duplication.
I guess subqueries are the only option, like this:

SELECT atjs.id, tmp.children AS children
FROM administration.ad_query_join_select atjs

JOIN (SELECT
atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY
aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
LEFT JOIN administration.ad_query_join_select aqjs1 ON
(aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left
AND aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON
(aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left
AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND
aqjs2.to_right>aqjs1.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY atjs.id) tmp ON (tmp.id = atjs.id)

JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs3 ON
(aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left
AND aqjs3.to_right>atjs.to_right)
WHERE atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id, tmp.children
ORDER BY aq.id ASC, atjs.to_left ASC;

Result is:
id ; children
1399029;"1399031"
1399031;"1399032,1399033"
Is there a better way? I usualy try to avoid subqueries.

------ Original Message ------
From: "Geoff Winkless" <pgsqladmin@geoff.dj>
To: "Sterpu Victor" <victor@caido.ro>
Cc: "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: 11/29/2015 6:42:18 PM
Subject: Re: [GENERAL] DISTINCT in STRING_AGG

Show quoted text

On 28 November 2015 at 18:35, Sterpu Victor <victor@caido.ro> wrote:

Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY
aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON
(aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left
AND aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON
(aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left
AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND
aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON
(aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left
AND aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;

And "childen" contain doubles. The result is:
id ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"

There are doubles because of the join aqjs3 witch is producing this
problem.
Can I make it so the children ID's are unique?

Well if you can live with losing the to_left ordering, then you could
just do

SELECT STRING_AGG(DISTINCT CAST(aqjs1.id​ AS VARCHAR), '') AS children

​ ...​

​no?

Geoff​

#5Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Sterpu Victor (#4)
Re: DISTINCT in STRING_AGG

On 29 November 2015 at 18:59, Sterpu Victor <victor@caido.ro> wrote:

I can't skip the ordering.
I'm sure aqjs3 is the one that produces the duplication.
I guess subqueries are the only option, like this:

​Well you could look at the intarray ​extension and a combination of
array_agg, uniq() and string_to_array:

http://www.postgresql.org/docs/current/static/intarray.html

but that's probably sledgehammer:nut time.

Not sure why you need to order the values you're getting back by something
other than the values themselves - is there a reason you wouldn't want the
"children"​ set to be ordered numerically? You can still order the outer
query by whatever you like, or you can order the aggregate by the values
themselves, it's just the DISTINCT inside the aggregate query requires that
an internal ORDER includes the ordering term in the result.

​Geoff​

#6Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Geoff Winkless (#5)
Re: DISTINCT in STRING_AGG

On 29 November 2015 at 20:51,
​I
wrote:

Well you could look at the intarray ​extension and a combination of
array_agg, uniq() and
*​​string_to_array*:

​Mind blip, apologies, obviously I meant array_to_string :)​

​Geoff​

#7Sterpu Victor
victor@caido.ro
In reply to: Geoff Winkless (#5)
Re: DISTINCT in STRING_AGG

Thank you.

I think there is no native function that will solve this problem,
intarray extension ​can't order as in the example query.
I could write a new SQL function but I solved the problem with
subqueries and works fine.

I need order because I use nested trees.
To_left and to_right are the coordinates for each node and the result
must be ordered by this to obtain the correct result.

------ Original Message ------
From: "Geoff Winkless" <pgsqladmin@geoff.dj>
To: "Sterpu Victor" <victor@caido.ro>
Cc: "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: 11/29/2015 10:51:32 PM
Subject: Re: Re[2]: [GENERAL] DISTINCT in STRING_AGG

Show quoted text

On 29 November 2015 at 18:59, Sterpu Victor <victor@caido.ro> wrote:

I can't skip the ordering.
I'm sure aqjs3 is the one that produces the duplication.
I guess subqueries are the only option, like this:

​Well you could look at the intarray ​extension and a combination of
array_agg, uniq() and string_to_array:

http://www.postgresql.org/docs/current/static/intarray.html

but that's probably sledgehammer:nut time.

Not sure why you need to order the values you're getting back by
something other than the values themselves - is there a reason you
wouldn't want the "children"​ set to be ordered numerically? You can
still order the outer query by whatever you like, or you can order the
aggregate by the values themselves, it's just the DISTINCT inside the
aggregate query requires that an internal ORDER includes the ordering
term in the result.

​Geoff​