Conditional return of aggregated data

Started by Laura Smithover 6 years ago6 messagesgeneral
Jump to latest
#1Laura Smith
n5d9xq3ti233xiyif2vp@protonmail.ch

Hi,

I have some aggregated statistics which are currently being queried as follows:

create view stats_age as
SELECT a.category,
a.statcount,
b.a_desc
FROM reg_aggregated_stats a,regs_select_age b where a.category='age' and a.datapoint::smallint=b.a_val order by a.datapoint asc;

However, as these relate to event registrations, a suggestion has been made that the database really should be returning nothing until a certain number of registrations has been met (in order to avoid privacy infringing inferrence from what should be an otherwise fully irreversibly aggregated dataset).

Now, the queries are wrapped in PL/PGSQL functions anyway, so I could make a second call to Postgres to find out sum(statcount) and then conditionally return based on that.

But is there a smarter way to do this out of a single SQL query ?

My initial idea was something along the lines of :
select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;

But as I soon discovered that's not valid syntax! Hence ideas welcome from those smarter than me.

Thanks !

Laura

#2Wim Bertels
wim.bertels@ucll.be
In reply to: Laura Smith (#1)
Re: Conditional return of aggregated data

Hallo Laura,

i don't know if i understand your question fully,
but this might be helpfull?: FILTER

SELECT
count(*) AS unfiltered,
count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
unfiltered | filtered
------------+----------
10 | 4
(1 row)

https://www.postgresql.org/docs/current/sql-expressions.html

hth,
Wim

Laura Smith schreef op ma 02-12-2019 om 11:10 [+0000]:

Hi,

I have some aggregated statistics which are currently being queried
as follows:

create view stats_age as
SELECT a.category,
a.statcount,
b.a_desc
FROM reg_aggregated_stats a,regs_select_age b where a.category='age'
and a.datapoint::smallint=b.a_val order by a.datapoint asc;

However, as these relate to event registrations, a suggestion has
been made that the database really should be returning nothing until
a certain number of registrations has been met (in order to avoid
privacy infringing inferrence from what should be an otherwise fully
irreversibly aggregated dataset).

Now, the queries are wrapped in PL/PGSQL functions anyway, so I could
make a second call to Postgres to find out sum(statcount) and then
conditionally return based on that.

But is there a smarter way to do this out of a single SQL query ?

My initial idea was something along the lines of :
select (select sum(statcount) from stats_residence) as
aggstat,statcount,short_name_en from stats_residence where
aggstat>some_number;

But as I soon discovered that's not valid syntax! Hence ideas welcome
from those smarter than me.

Thanks !

Laura

--
mvg,
Wim Bertels
--
Lector
UC Leuven-Limburg
--
My only love sprung from my only hate!
Too early seen unknown, and known too late!
-- William Shakespeare, "Romeo and Juliet"

#3Ravi Krishna
srkrishna@myself.com
In reply to: Laura Smith (#1)
Re: Conditional return of aggregated data

My initial idea was something along the lines of :
select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;

Am I missing something basic. The above can be done using
GROUP BY and HAVING clause.

#4Ravi Krishna
srkrishna@myself.com
In reply to: Ravi Krishna (#3)
Re: Conditional return of aggregated data

My initial idea was something along the lines of :
select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;

Am I missing something basic. The above can be done using
GROUP BY and HAVING clause.

or this
with t as
(select (select sum(statcount) from stats_residence) as aggstat,
statcount,short_name_en from stats_residence
)
select * from t where aggstat > some_number

Apology if I did not understand the question correctly.

#5Laura Smith
n5d9xq3ti233xiyif2vp@protonmail.ch
In reply to: Ravi Krishna (#4)
Re: Conditional return of aggregated data

or this
with t as
(select (select sum(statcount) from stats_residence) as aggstat,
statcount,short_name_en from stats_residence
)
select * from t where aggstat > some_number

Apology if I did not understand the question correctly.

Hi Ravi,

Thanks for helping show me the way. You're quite right, a CTE did the trick

Laura

#6Alban Hertroys
haramrae@gmail.com
In reply to: Laura Smith (#1)
Re: Conditional return of aggregated data

On Mon, 2 Dec 2019 at 12:11, Laura Smith <n5d9xq3ti233xiyif2vp@protonmail.ch>
wrote:

My initial idea was something along the lines of :
select (select sum(statcount) from stats_residence) as
aggstat,statcount,short_name_en from stats_residence where
aggstat>some_number;

One option is to move the aggregate to the where-clause. If you also need
the value in your select-list, you can just repeat the subselect there,
usually the planner is smart enough to figure out that it can just re-use
the result.

select short_name_en from stats_residence where (select sum(statcount) from
stats_residence) >some_number;

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.