an difficult SQL

Started by Rafal Pietrakover 3 years ago5 messagesgeneral
Jump to latest
#1Rafal Pietrak
rafal@ztk-rp.eu

Hi Everybody,

I was wondering if anybody here could help me cook up a query:

1. against a list of events (like an activity log in the database). The
list is a single table: create table events (tm timestamp, user int,
description text).

2. of which the output would be sorted in such a way, that:
2.1 most recent event would "select" most recent events of that same
user, and displayed in a group (of say 10) of them (in "tm" order).

2.2 going through the events back in time, first event of ANOTHER user
selects next group, where (say 10) most recent events of that OTHER user
is presented.

2.3 next most recent event of yet another user selects yet another group
to display and this selection process goes on, up to a maximum of (say
20) users/groups-of-their-events.

2.4 after that, all other events are selected in tm order.

This is to present most recent telephone activities grouped by most
recent subscribers so that the dashboard doesn't get cluttered with
information but allows for an overview of other activity of most recent
users.

I tend to think, that it's a problem for a window function ... but I've
stumbled on the problem how to limit the window "frame" to just a few
(say 10) events within the "window" and have all the rest returned as
"tail" of the query.

BTW: the eventlog table is big. (and partitioned).

Any help appreciated.

-R

#2Thiemo Kellner
thiemo@gelassene-pferde.biz
In reply to: Rafal Pietrak (#1)
Re: an difficult SQL

Hi Rafal

You first could select the three users with the most recent entries with
a windowing function
(https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)
putting it into a with query
(https://www.postgresql.org/docs/15/sql-select.html), in following with
queries I would select 2.1 to 2.3 with each a constant column with each
a different value you later sort by. In a next with query you can select
all the rest (except all
https://www.postgresql.org/docs/15/sql-select.html#SQL-EXCEPT) the
results of 2.1 to 2.3 for 2.4 also with the notorious sort column. In a
last with query you can put together the partial results for 2.1 to 2.4
with a union all
(https://www.postgresql.org/docs/15/sql-select.html#SQL-UNION) and
selecting sort by the sort column and the timestamp in the final select.

I do not know your background, however, sql is about data sets end it is
not always easy to get ones head around thinking in sets. I hope you
could follow my suggestions. It might not be the most efficient way but
should work.

Kind regards

Thiemo

Am 05.11.22 um 16:10 schrieb Rafal Pietrak:

Hi Everybody,

I was wondering if anybody here could help me cook up a query:

1. against a list of events (like an activity log in the database).
The list is a single table: create table events (tm timestamp, user
int, description text).

2. of which the output would be sorted in such a way, that:
2.1 most recent event would "select" most recent events of that same
user, and displayed in a group (of say 10) of them (in "tm" order).

2.2 going through the events back in time, first event of ANOTHER user
selects next group, where (say 10) most recent events of that OTHER
user is presented.

2.3 next most recent event of yet another user selects yet another
group to display and this selection process goes on, up to a maximum
of (say 20) users/groups-of-their-events.

2.4 after that, all other events are selected in tm order.

This is to present most recent telephone activities grouped by most
recent subscribers so that the dashboard doesn't get cluttered with
information but allows for an overview of other activity of most
recent users.

I tend to think, that it's a problem for a window function ... but
I've stumbled on the problem how to limit the window "frame" to just a
few (say 10) events within the "window" and have all the rest returned
as "tail" of the query.

BTW: the eventlog table is big. (and partitioned).

Any help appreciated.

-R

--
Signal (Safer than WhatsApp): +49 1578 7723737
Threema (Safer than WhatsApp): A76MKH3J
Handy: +49 1578 772 37 37

#3Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Thiemo Kellner (#2)
Re: an difficult SQL

Hi Thiemo,

Thank you for suggestions.

W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze:

Hi Rafal

You first could select the three users with the most recent entries with
a windowing function
(https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)

surely I'm missing something crucial here:
select row_number() over w,* from eventlog where row_number() over w < 5
window w as (partition by user);
ERROR: window functions are not allowed in WHERE

So I'm unable to pick a limited number of rows within the user
"group-window" ranges.

Without that, I cannot proceed.

Any suggestions?

-R
PS: regarding "my sets background", yes I do "think in sets" ... as
opposed to thinking "in functions" (like iterating procedures). I do
prefer solutions based on set definitions.

putting it into a with query
(https://www.postgresql.org/docs/15/sql-select.html), in following with
queries I would select 2.1 to 2.3 with each a constant column with each
a different value you later sort by. In a next with query you can select
all the rest (except all
https://www.postgresql.org/docs/15/sql-select.html#SQL-EXCEPT) the
results of 2.1 to 2.3 for 2.4 also with the notorious sort column. In a
last with query you can put together the partial results for 2.1 to 2.4
with a union all
(https://www.postgresql.org/docs/15/sql-select.html#SQL-UNION) and
selecting sort by the sort column and the timestamp in the final select.

Show quoted text

I do not know your background, however, sql is about data sets end it is
not always easy to get ones head around thinking in sets. I hope you
could follow my suggestions. It might not be the most efficient way but
should work.

Kind regards

Thiemo

Am 05.11.22 um 16:10 schrieb Rafal Pietrak:

Hi Everybody,

I was wondering if anybody here could help me cook up a query:

1. against a list of events (like an activity log in the database).
The list is a single table: create table events (tm timestamp, user
int, description text).

2. of which the output would be sorted in such a way, that:
2.1 most recent event would "select" most recent events of that same
user, and displayed in a group (of say 10) of them (in "tm" order).

2.2 going through the events back in time, first event of ANOTHER user
selects next group, where (say 10) most recent events of that OTHER
user is presented.

2.3 next most recent event of yet another user selects yet another
group to display and this selection process goes on, up to a maximum
of (say 20) users/groups-of-their-events.

2.4 after that, all other events are selected in tm order.

This is to present most recent telephone activities grouped by most
recent subscribers so that the dashboard doesn't get cluttered with
information but allows for an overview of other activity of most
recent users.

I tend to think, that it's a problem for a window function ... but
I've stumbled on the problem how to limit the window "frame" to just a
few (say 10) events within the "window" and have all the rest returned
as "tail" of the query.

BTW: the eventlog table is big. (and partitioned).

Any help appreciated.

-R

#4Erik Wienhold
ewie@ewie.name
In reply to: Rafal Pietrak (#3)
Re: an difficult SQL

On 06/11/2022 13:48 CET Rafal Pietrak <rafal@ztk-rp.eu> wrote:

W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze:

You first could select the three users with the most recent entries with
a windowing function
(https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)

surely I'm missing something crucial here:
select row_number() over w,* from eventlog where row_number() over w < 5
window w as (partition by user);
ERROR: window functions are not allowed in WHERE

So I'm unable to pick a limited number of rows within the user
"group-window" ranges.

Without that, I cannot proceed.

Any suggestions?

Windows functions are only permitted in SELECT and ORDER BY because they are
executed after WHERE, GROUP BY, and HAVING[1]https://www.postgresql.org/docs/15/tutorial-window.html.

You need a derived table to filter on row_number:

with
ranked as (
select *, row_number() over w
from eventlog
window w as (partition by user)
)
select *
from ranked
where row_number < 5;

[1]: https://www.postgresql.org/docs/15/tutorial-window.html

--
Erik

#5Rafal Pietrak
rafal@ztk-rp.eu
In reply to: Erik Wienhold (#4)
Re: an difficult SQL

Great, with a little tweaking (to get the remaining rows ordered
correctly), this did the job.

Thank you Erik.

BR

-R

W dniu 6.11.2022 o 15:23, Erik Wienhold pisze:

Show quoted text

On 06/11/2022 13:48 CET Rafal Pietrak <rafal@ztk-rp.eu> wrote:

W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze:

You first could select the three users with the most recent entries with
a windowing function
(https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS)

surely I'm missing something crucial here:
select row_number() over w,* from eventlog where row_number() over w < 5
window w as (partition by user);
ERROR: window functions are not allowed in WHERE

So I'm unable to pick a limited number of rows within the user
"group-window" ranges.

Without that, I cannot proceed.

Any suggestions?

Windows functions are only permitted in SELECT and ORDER BY because they are
executed after WHERE, GROUP BY, and HAVING[1].

You need a derived table to filter on row_number:

with
ranked as (
select *, row_number() over w
from eventlog
window w as (partition by user)
)
select *
from ranked
where row_number < 5;

[1] https://www.postgresql.org/docs/15/tutorial-window.html

--
Erik