How to optimize PostgreSQL Row Security Policies that involve related tables?

Started by David Ventimigliaover 2 years ago2 messagesgeneral
Jump to latest
#1David Ventimiglia
davidaventimiglia@hasura.io

Hello!

What are some good ways to create Row Security Policies that restrict rows
in one table using a filter on a related table, and get good performance?
It's difficult to describe but let me try to explain.

I'm using the venerable old Chinook database
<https://github.com/cwoodruff/ChinookDatabase&gt;. Among its tables are
Artist, Album, and Track. Track has a foreign key constraint on
Track.AlbumId, and Album has a foreign key constraint on Album.ArtistId.
Albums belong to Artists while Tracks belong to Albums. I want to have two
policies:

1. Restrict access to Album based on Album.ArtistId =
current_setting('rls.artistId'), supposing that 'rls.artistId' will be set
to the current artist's valid artistId.
2. Restrict access to Track such that the current artist only sees
tracks on their own albums and not on anybody else's albums.

One challenge is that the Track table doesn't actually have ArtistId
(naturally). However, if we know there's a valid policy on Album, then we
should be able to leverage that in a policy on Track. The trouble is, I
can't do a join in the Track policy (or at least, I haven't figured out how
to do a join), so I have to resort to a "exists" check. Specifically, the
two policies look like this:

create policy artist_rls_policy ON "Album" for select to public using
("ArtistId"=(current_setting('rls.artistID'))::integer);

create policy album_rls_policy on "Track" for select to public
using (
exists (select * from "Album" where "Album"."AlbumId" = "AlbumId") --
THIS IS THE TROUBLESOME POLICY CHECK HERE
);

But, the presence of the "exists" check leads to a suboptimal plan:

explain analyze select * from "Track";

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "Track" (cost=8.17..203181.49 rows=5001816 width=56) (actual
time=4.631..699.831 rows=10003504 loops=1)
Filter: $0
InitPlan 1 (returns $0)
-> Index Scan using "IFK_AlbumArtistId" on "Album" (cost=0.15..8.17
rows=1 width=0) (actual time=0.022..0.022 rows=1 loops=1)
Index Cond: ("ArtistId" =
(current_setting('rls.artistID'::text))::integer)
Filter: ("AlbumId" IS NOT NULL)
Planning Time: 0.209 ms
JIT:
Functions: 7
Options: Inlining false, Optimization false, Expressions true, Deforming
true
Timing: Generation 0.673 ms, Inlining 0.000 ms, Optimization 0.391 ms,
Emission 3.793 ms, Total 4.857 ms
Execution Time: 876.035 ms
(12 rows)

In my query, sure, I can explicitly join Track to Album to get a better
plan:

explain analyze select * from "Track" natural join "Album";

QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=8.75..37450.90 rows=14414 width=83) (actual
time=0.184..0.314 rows=14 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using "IFK_AlbumArtistId" on "Album" "Album_1"
(cost=0.15..8.17 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=1)
Index Cond: ("ArtistId" =
(current_setting('rls.artistID'::text))::integer)
Filter: ("AlbumId" IS NOT NULL)
-> Index Scan using "PK_Album" on "Album" (cost=0.15..22.82 rows=1
width=31) (actual time=0.136..0.252 rows=1 loops=1)
Filter: ("ArtistId" =
(current_setting('rls.artistID'::text))::integer)
Rows Removed by Filter: 346
-> Index Scan using "IFK_TrackAlbumId" on "Track" (cost=0.43..32418.09
rows=500182 width=56) (actual time=0.044..0.051 rows=14 loops=1)
Index Cond: ("AlbumId" = "Album"."AlbumId")
Filter: $0
Planning Time: 0.509 ms
Execution Time: 0.364 ms
(13 rows)

But, that's redundant since there's enough information to process the query
correctly (albeit more slowly) without the join. The question is, how if
at all can I create the policies to have both the desired composability and
also good performance? I hope I've explained this well enough. I asked
this question a while back on StackOverflow
<https://stackoverflow.com/questions/76525562/how-to-optimize-postgresql-row-security-policies-that-involve-related-tables&gt;
but
got no interest. I also have sample code in a public GitHub repository here
<https://github.com/dventimihasura/hasura-projects/tree/master/rls-optimization-1&gt;
that
illustrates the setup. Any advice would be greatly appreciated. Thank you!

Kind regards,
David A. Ventimiglia

#2David Ventimiglia
davidaventimiglia@hasura.io
In reply to: David Ventimiglia (#1)
Re: How to optimize PostgreSQL Row Security Policies that involve related tables?

Sorry, I had a typo in my Row Policy. Rather, it should look like this:

create policy album_rls_policy on "Track" for select to public
using (
exists (select * from "Album" where "Album"."AlbumId" =
"Track"."AlbumId")
);

On Mon, Aug 14, 2023 at 9:36 AM David Ventimiglia <
davidaventimiglia@hasura.io> wrote:

Show quoted text

Hello!

What are some good ways to create Row Security Policies that restrict rows
in one table using a filter on a related table, and get good performance?
It's difficult to describe but let me try to explain.

I'm using the venerable old Chinook database
<https://github.com/cwoodruff/ChinookDatabase&gt;. Among its tables are
Artist, Album, and Track. Track has a foreign key constraint on
Track.AlbumId, and Album has a foreign key constraint on Album.ArtistId.
Albums belong to Artists while Tracks belong to Albums. I want to have two
policies:

1. Restrict access to Album based on Album.ArtistId =
current_setting('rls.artistId'), supposing that 'rls.artistId' will be set
to the current artist's valid artistId.
2. Restrict access to Track such that the current artist only sees
tracks on their own albums and not on anybody else's albums.

One challenge is that the Track table doesn't actually have ArtistId
(naturally). However, if we know there's a valid policy on Album, then we
should be able to leverage that in a policy on Track. The trouble is, I
can't do a join in the Track policy (or at least, I haven't figured out how
to do a join), so I have to resort to a "exists" check. Specifically, the
two policies look like this:

create policy artist_rls_policy ON "Album" for select to public using
("ArtistId"=(current_setting('rls.artistID'))::integer);

create policy album_rls_policy on "Track" for select to public
using (
exists (select * from "Album" where "Album"."AlbumId" = "AlbumId") --
THIS IS THE TROUBLESOME POLICY CHECK HERE
);

But, the presence of the "exists" check leads to a suboptimal plan:

explain analyze select * from "Track";

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "Track" (cost=8.17..203181.49 rows=5001816 width=56) (actual
time=4.631..699.831 rows=10003504 loops=1)
Filter: $0
InitPlan 1 (returns $0)
-> Index Scan using "IFK_AlbumArtistId" on "Album" (cost=0.15..8.17
rows=1 width=0) (actual time=0.022..0.022 rows=1 loops=1)
Index Cond: ("ArtistId" =
(current_setting('rls.artistID'::text))::integer)
Filter: ("AlbumId" IS NOT NULL)
Planning Time: 0.209 ms
JIT:
Functions: 7
Options: Inlining false, Optimization false, Expressions true,
Deforming true
Timing: Generation 0.673 ms, Inlining 0.000 ms, Optimization 0.391 ms,
Emission 3.793 ms, Total 4.857 ms
Execution Time: 876.035 ms
(12 rows)

In my query, sure, I can explicitly join Track to Album to get a better
plan:

explain analyze select * from "Track" natural join "Album";

QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=8.75..37450.90 rows=14414 width=83) (actual
time=0.184..0.314 rows=14 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using "IFK_AlbumArtistId" on "Album" "Album_1"
(cost=0.15..8.17 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=1)
Index Cond: ("ArtistId" =
(current_setting('rls.artistID'::text))::integer)
Filter: ("AlbumId" IS NOT NULL)
-> Index Scan using "PK_Album" on "Album" (cost=0.15..22.82 rows=1
width=31) (actual time=0.136..0.252 rows=1 loops=1)
Filter: ("ArtistId" =
(current_setting('rls.artistID'::text))::integer)
Rows Removed by Filter: 346
-> Index Scan using "IFK_TrackAlbumId" on "Track"
(cost=0.43..32418.09 rows=500182 width=56) (actual time=0.044..0.051
rows=14 loops=1)
Index Cond: ("AlbumId" = "Album"."AlbumId")
Filter: $0
Planning Time: 0.509 ms
Execution Time: 0.364 ms
(13 rows)

But, that's redundant since there's enough information to process the
query correctly (albeit more slowly) without the join. The question is,
how if at all can I create the policies to have both the desired
composability and also good performance? I hope I've explained this well
enough. I asked this question a while back on StackOverflow
<https://stackoverflow.com/questions/76525562/how-to-optimize-postgresql-row-security-policies-that-involve-related-tables&gt; but
got no interest. I also have sample code in a public GitHub repository
here
<https://github.com/dventimihasura/hasura-projects/tree/master/rls-optimization-1&gt; that
illustrates the setup. Any advice would be greatly appreciated. Thank you!

Kind regards,
David A. Ventimiglia