Record last SELECT on a row?

Started by Matthias Leisi4 months ago15 messagesgeneral
Jump to latest
#1Matthias Leisi
matthias@leisi.net

An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in that table were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows which have not been accessed in a certain period of time.

The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers. Almost all accesses will result in zero, one or very few records returned. Given the modest size of the table, performance considerations are not top priority.

If we had full control over the application, we could eg use a function to select the records and then update some „last read“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full control over the database, so we could put some other „object“ in lieu of the direct table.

Any other ways this could be achieved?

Thanks,
Matthias

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Matthias Leisi (#1)
Re: Record last SELECT on a row?

On Wed, 2025-12-17 at 08:40 +0100, Matthias Leisi wrote:

An application (which we can’t change) is accessing some Postgres table, and we would
like to record when the rows in that table were last read (meaning: appeared in a
SELECT result). The ultimate goal would be that we can „age out“ rows which have not
been accessed in a certain period of time.

If we had full control over the application, we could eg use a function to select the
records and then update some „last read“ column. But since we don’t control the
application, that’s not an option. On the other hand, we have full control over the
database, so we could put some other „object“ in lieu of the direct table.

Any other ways this could be achieved?

I don't think that is possible. You could log all statements, but that won't show
which rows are accessed.

Yours,
Laurenz Albe

#3Greg Sabino Mullane
greg@turnstep.com
In reply to: Matthias Leisi (#1)
Re: Record last SELECT on a row?

On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <matthias@leisi.net> wrote:

The table contains some ten thousand rows, five columns, and we already
record created / last updated using triggers.

...

Any other ways this could be achieved?

It depends a lot on how the table is accessed, but you could use a function
(or group of functions) that returns the information from the table, and
along the way, updates the requested metadata.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Matthias Leisi (#1)
Re: Record last SELECT on a row?

On Wednesday, December 17, 2025, Matthias Leisi <matthias@leisi.net> wrote:

Any other ways this could be achieved?

Without the ability to modify the application which uses a direct database
connection, you’d need to modify the server code or run some kind of
proxy/man-in-the-middle server you can program. I would avoid updating the
table on write though - just keep a cache of sorts somewhere and
periodically compare the cache to the live table.

David J.

#5Ron
ronljohnsonjr@gmail.com
In reply to: Matthias Leisi (#1)
Re: Record last SELECT on a row?

On Wed, Dec 17, 2025 at 2:41 AM Matthias Leisi <matthias@leisi.net> wrote:

An application (which we can’t change) is accessing some Postgres table,
and we would like to record when the rows in that table were last read
(meaning: appeared in a SELECT result). The ultimate goal would be that we
can „age out“ rows which have not been accessed in a certain period of time.

The table contains some ten thousand rows, five columns, and we already
record created / last updated using triggers. Almost all accesses will
result in zero, one or very few records returned. Given the modest size of
the table, performance considerations are not top priority.

If we had full control over the application, we could eg use a function to
select the records and then update some „last read“ column. But since we
don’t control the application, that’s not an option. On the other hand, we
have full control over the database, so we could put some other „object“ in
lieu of the direct table.

Any other ways this could be achieved?

pgaudit might satisfy your needs, since it would only log SELECT statements
on that one table. You'd still have to grep the log file, so the
information wouldn't be real-time, but that's *probably* not important.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#6Matthias Leisi
matthias@leisi.net
In reply to: Ron (#5)
Re: Record last SELECT on a row?

pgaudit might satisfy your needs, since it would only log SELECT statements on that one table. You'd still have to grep the log file, so the information wouldn't be real-time, but that's probably not important.

That’s a viable suggestion, thanks a lot. Real-time is indeed not necessary, a daily (or even a weekly) cleaning of unused data is sufficient. pgaudit was anyway on the table for some other use cases, so that would fit in nicely.

Thanks,
— Matthias

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Matthias Leisi (#1)
Re: Record last SELECT on a row?

On 12/16/25 23:40, Matthias Leisi wrote:

An application (which we can’t change) is accessing some Postgres table, and we would like to record when the rows in that table were last read (meaning: appeared in a SELECT result). The ultimate goal would be that we can „age out“ rows which have not been accessed in a certain period of time.

Why?

Given the small size of the table, what is the gain expected?

Also is it assured that the reading of a row equals importance of a row?

I would expect any solution would impose more overhead then simply
leaving the rows alone.

The table contains some ten thousand rows, five columns, and we already record created / last updated using triggers. Almost all accesses will result in zero, one or very few records returned. Given the modest size of the table, performance considerations are not top priority.

If we had full control over the application, we could eg use a function to select the records and then update some „last read“ column. But since we don’t control the application, that’s not an option. On the other hand, we have full control over the database, so we could put some other „object“ in lieu of the direct table.

Any other ways this could be achieved?

Thanks,
Matthias

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Joe Conway
mail@joeconway.com
In reply to: Matthias Leisi (#6)
Re: Record last SELECT on a row?

On 12/17/25 11:25, Matthias Leisi wrote:

pgaudit might satisfy your needs, since it would only log SELECT
statements on that one table.  You'd still have to grep the log file,
so the information wouldn't be real-time, but that's /probably/ not
important.

That’s a viable suggestion, thanks a lot. Real-time is indeed not
necessary, a daily (or even a weekly) cleaning of unused data is
sufficient. pgaudit was anyway on the table for some other use cases, so
that would fit in nicely.

Possibly try using/abusing RLS?

8<-----------------
psql test
psql (19devel)
Type "help" for help.

create table t1(c1 int, c2 text);
insert into t1 values(1,'a'),(2,'b'),(3,'c'),(42,'zp');
grant select on table t1 to public;

create table a1(c1 int, t1 timestamptz);
create or replace function audit(int)
returns bool as
$$
insert into a1 values($1, now()) returning true
$$ security definer language sql;
create policy audit_t1 ON t1 for select using (audit(c1));
alter table t1 enable row level security;

create user joe;
set session authorization joe;
select * from t1 where c1=42;

c1 | c2
----+----
42 | zp
(1 row)

reset session authorization;
select * from a1;

c1 | t1
----+-------------------------------
42 | 2025-12-17 11:42:51.871843-05
(1 row)
8<-----------------

HTH,

--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com

#9Greg Sabino Mullane
greg@turnstep.com
In reply to: Ron (#5)
Re: Record last SELECT on a row?

On Wed, Dec 17, 2025 at 10:24 AM Ron Johnson <ronljohnsonjr@gmail.com>
wrote:

pgaudit might satisfy your needs, since it would only log SELECT
statements on that one table. You'd still have to grep the log file, so
the information wouldn't be real-time, but that's *probably* not
important.

That would only give you table-level information, unless it was a
dirt-simple query with no joins and an easy to parse where clause.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#10Thiemo Kellner
thiemo@gelassene-pferde.biz
In reply to: Matthias Leisi (#1)

Hi

I wonder whether a view calling a set returning function would be a viable option. Within the function you execute the actual query twice. Once to merge selected pk into a protocol table and once for the return of the data set.

Cheers

Thiemo

#11Greg Sabino Mullane
greg@turnstep.com
In reply to: Thiemo Kellner (#10)
Re: Record last SELECT on a row?

On Wed, Dec 17, 2025 at 12:14 PM Thiemo Kellner <thiemo@gelassene-pferde.biz>
wrote:

I wonder whether a view calling a set returning function would be a viable
option. Within the function you execute the actual query twice. Once to
merge selected pk into a protocol table and once for the return of the data
set.

The view would be sucking in the entire table every time . Maybe not a big
deal as this is a very tiny table.

I think Joe wins this thread with the RLS idea however.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#8)
Re: Record last SELECT on a row?

Joe Conway <mail@joeconway.com> writes:

Possibly try using/abusing RLS?

Cute idea, but I think it doesn't reliably address the problem of
wanting to identify the specific rows that were read. In your toy
example it'd work, because the generated plan is

regression=> explain verbose select * from t1 where c1=42;
QUERY PLAN
------------------------------------------------------------
Seq Scan on public.t1 (cost=0.00..343.38 rows=2 width=36)
Output: c1, c2
Filter: ((t1.c1 = 42) AND audit(t1.c1))
(3 rows)

so the WHERE clause is applied before the RLS filter. But in any
slightly-more-complicated situation, like a non-leakproof WHERE
condition, the order would be reversed so the log would overstate
which rows were read.

If the application's behavior is simple and well-defined, this might
be good enough, of course.

I thought of a way that could possibly do this reliably, but it's
vastly more work than the use-case seems worth:

1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
without changing the application, by interposing a view). SELECT
FOR SHARE might be good enough, not sure.

2. Write a logical replication output plugin that parses the WAL log
well enough to identify the tuple locks taken by FOR UPDATE.

This should work to log only the rows actually read, because FOR
UPDATE is postponed to the top of the query plan, unlike RLS.

regards, tom lane

#13Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#12)
Re: Record last SELECT on a row?

On 12/17/25 13:37, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

Possibly try using/abusing RLS?

Cute idea, but I think it doesn't reliably address the problem of
wanting to identify the specific rows that were read. In your toy
example it'd work, because the generated plan is

regression=> explain verbose select * from t1 where c1=42;
QUERY PLAN
------------------------------------------------------------
Seq Scan on public.t1 (cost=0.00..343.38 rows=2 width=36)
Output: c1, c2
Filter: ((t1.c1 = 42) AND audit(t1.c1))
(3 rows)

so the WHERE clause is applied before the RLS filter. But in any
slightly-more-complicated situation, like a non-leakproof WHERE
condition, the order would be reversed so the log would overstate
which rows were read.

Sure, of course we have had requests for a leakproofness check bypass[1]/messages/by-id/CAMxA3rsGQh9waorObOZyqrFqZ5uQ0b5D7SL6X6nh2kLhX=90vg@mail.gmail.com
for some use cases, and this could be one more such case.

If the application's behavior is simple and well-defined, this might
be good enough, of course.

FWIW when I read the original email in the thread I got the impression
that the application behavior was pretty simple WRT this table. But of
course I could easily be wrong...

I thought of a way that could possibly do this reliably, but it's
vastly more work than the use-case seems worth:

1. Convert the SELECTs into SELECT FOR UPDATE (you could do this
without changing the application, by interposing a view). SELECT
FOR SHARE might be good enough, not sure.

2. Write a logical replication output plugin that parses the WAL log
well enough to identify the tuple locks taken by FOR UPDATE.

Yeah this seems like a pretty heavy lift.

This should work to log only the rows actually read, because FOR
UPDATE is postponed to the top of the query plan, unlike RLS.

[1]: /messages/by-id/CAMxA3rsGQh9waorObOZyqrFqZ5uQ0b5D7SL6X6nh2kLhX=90vg@mail.gmail.com
/messages/by-id/CAMxA3rsGQh9waorObOZyqrFqZ5uQ0b5D7SL6X6nh2kLhX=90vg@mail.gmail.com

--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com

#14Matthias Leisi
matthias@leisi.net
In reply to: Joe Conway (#13)
Re: Record last SELECT on a row?

If the application's behavior is simple and well-defined, this might
be good enough, of course.

FWIW when I read the original email in the thread I got the impression that the application behavior was pretty simple WRT this table. But of course I could easily be wrong...

You are not wrong. The use case is in fact the `userpref` table used by Spamassassin. Left unmaintained, and given a large-enough user base, this has the tendency to grow considerably over time, so we want to gradually remove entries not actually used any more. (And we don’t want to patch Spamassassin core code to do this by itself…)

I’ll give it a try with ab/using RLS, which we use for some other use cases anyway already.

Thanks a lot for the suggestions in the thread, much appreciated!

— Matthias

#15Kris Deugau
kdeugau@vianet.ca
In reply to: Matthias Leisi (#14)
Re: Record last SELECT on a row?

Matthias Leisi wrote:

If the application's behavior is simple and well-defined, this might
be good enough, of course.

FWIW when I read the original email in the thread I got the impression
that the application behavior was pretty simple WRT this table. But of
course I could easily be wrong...

You are not wrong. The use case is in fact the `userpref` table used by
Spamassassin. Left unmaintained, and given a large-enough user base,
this has the tendency to grow considerably over time, so we want to
gradually remove entries not actually used any more. (And we don’t want
to patch Spamassassin core code to do this by itself…)

For this particular use case, would it be easier to periodically compare
the list of usernames with userpref data against active accounts on the
mail system?

-kgd