count (DISTINCT field) OVER ()

Started by Tarlika Elisabeth Schmitzover 14 years ago5 messagesgeneral
Jump to latest
#1Tarlika Elisabeth Schmitz
postgresql6@numerixtechnology.de

I would like to implement the equivalent of "count (DISTINCT field) OVER
()":

SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
count (id) OVER() AS cnt
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
ORDER BY delta DESC

produces result:
1787 Toomyvara 0.5 4
1787 Toomevara 0.4 4
1700 Ardcroney 0.105 4
1788 Townsfield 0.1 4

What I would like is a "3" in the cnt column (ignoring id duplicates).

How can I achieve this?

The best I can come up with is:

SELECT id, name,delta, count (*) OVER()
FROM (
SELECT DISTINCT ON (id)
id, name, similarity(name, 'Tooneyvara') as delta
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
ORDER BY id, delta DESC
) AS x
ORDER by delta DESC

--

Best Regards,
Tarlika Elisabeth Schmitz

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Tarlika Elisabeth Schmitz (#1)
Re: count (DISTINCT field) OVER ()

On Nov 9, 2011, at 18:52, Tarlika Elisabeth Schmitz <postgresql6@numerixtechnology.de> wrote:

I would like to implement the equivalent of "count (DISTINCT field) OVER
()":

SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
count (id) OVER() AS cnt
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
ORDER BY delta DESC

produces result:
1787 Toomyvara 0.5 4
1787 Toomevara 0.4 4
1700 Ardcroney 0.105 4
1788 Townsfield 0.1 4

What I would like is a "3" in the cnt column (ignoring id duplicates).

How can I achieve this?

The best I can come up with is:

SELECT id, name,delta, count (*) OVER()
FROM (
SELECT DISTINCT ON (id)
id, name, similarity(name, 'Tooneyvara') as delta
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
ORDER BY id, delta DESC
) AS x
ORDER by delta DESC

--

Best Regards,
Tarlika Elisabeth Schmitz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

1. Write your main query in a WITH (CTE)
2. Query #1 with appropriate GROUP BY clause (CTE)
3. In the main statement JOIN 1 and 2

David J.

#3Thomas Kellerer
spam_eater@gmx.net
In reply to: Tarlika Elisabeth Schmitz (#1)
Re: count (DISTINCT field) OVER ()

Tarlika Elisabeth Schmitz, 10.11.2011 00:52:

I would like to implement the equivalent of "count (DISTINCT field) OVER ()":

SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
count (id) OVER() AS cnt
FROM vtown
WHERE
similarity(name, 'Tooneyvara')> 0.1
ORDER BY delta DESC

produces result:
1787 Toomyvara 0.5 4
1787 Toomevara 0.4 4
1700 Ardcroney 0.105 4
1788 Townsfield 0.1 4

What I would like is a "3" in the cnt column (ignoring id duplicates).

This should do it:

SELECT id,
name,
delta,
sum(case when rn = 1 then rn else null end) over() as distinct_id_count
FROM (
SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
row_number() OVER(partition by id) AS rn
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
) t
ORDER BY delta DESC

#4Tarlika Elisabeth Schmitz
postgresql6@numerixtechnology.de
In reply to: Thomas Kellerer (#3)
Re: count (DISTINCT field) OVER ()

On Thu, 10 Nov 2011 10:02:36 +0100
Thomas Kellerer <spam_eater@gmx.net> wrote:

Tarlika Elisabeth Schmitz, 10.11.2011 00:52:

I would like to implement the equivalent of "count (DISTINCT id)
OVER ()":

[...]

produces result:
id, name, delta, cnt
1787 Toomyvara 0.5 4
1787 Toomevara 0.4 4
1700 Ardcroney 0.105 4
1788 Townsfield 0.1 4

This should do it:

SELECT id,
name,
delta,
sum(case when rn = 1 then rn else null end) over() as
distinct_id_count
FROM (
SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
row_number() OVER(partition by id) AS rn
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
) t
ORDER BY delta DESC

I like you suggestion, Thomas. It is not that dissimilar from but
cleaner than my original SELECT ... FROM (SELECT DISTINCT ON(id)
attempt.
It's also very slightly faster.

Here's another, slightly shorter, variation of your suggestion:

SELECT id, name, delta,
max(rank) OVER() as cnt
FROM (
SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
dense_rank() OVER(ORDER BY id) AS rank
FROM vtown
WHERE
similarity(name, 'Tooneyvara') > 0.1
) t
ORDER BY delta DESC

#5Thomas Kellerer
spam_eater@gmx.net
In reply to: Tarlika Elisabeth Schmitz (#4)
Re: count (DISTINCT field) OVER ()

Tarlika Elisabeth Schmitz, 10.11.2011 11:24:

SELECT id,
name,
delta,
sum(case when rn = 1 then rn else null end) over() as
distinct_id_count
FROM (
SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
row_number() OVER(partition by id) AS rn
FROM vtown
WHERE
similarity(name, 'Tooneyvara')> 0.1
) t
ORDER BY delta DESC

I like you suggestion, Thomas. It is not that dissimilar from but
cleaner than my original SELECT ... FROM (SELECT DISTINCT ON(id)
attempt.
It's also very slightly faster.

Here's another, slightly shorter, variation of your suggestion:

SELECT id, name, delta,
max(rank) OVER() as cnt
FROM (
SELECT
id, name, similarity(name, 'Tooneyvara') as delta,
dense_rank() OVER(ORDER BY id) AS rank
FROM vtown
WHERE
similarity(name, 'Tooneyvara')> 0.1
) t
ORDER BY delta DESC

Nice trick with the dense_rank(), never thought of that.

Regards
Thomas