BUG #18782: Inconsistent behaviour with triggers and row level security - depends on prior number of inserts

Started by PG Bug reporting formabout 1 year ago3 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18782
Logged by: Julian Wreford
Email address: julian.wreford@gearset.com
PostgreSQL version: 17.2
Operating system: Windows 11 Pro, 24H2, 26100.2605
Description:

We have been seeing behaviour of the interaction between triggers and row
level security being inconsistent depending on the number of inserts that
have previously been made to the table which the trigger is attached to. I
will attach full a code reproduction at the end of the issue.

Our expectation is that a trigger which calls RLS which throws an exception
should always throw an exception. But our observed behaviour is that this is
only the case straight after the table has been made, and after some number
of inserts (at least 6) the trigger stops throwing an exception even when
the RLS policy should throw an exception

I create two tables (`no_rls_table` and `rls_table`). These tables both have
a column called `should_not_duplicate` where the aim is that we block
inserts into one table if the other table has the same value already
existing.

`no_rls_table` has not got any RLS on it and the app user has full
permissions on it
`rls_table` has RLS on it which checks if the `team_id` (uuid type) column
matches a setting called `team.team_id` (which we check using
`current_setting`). When `team.team_id` is missing we expect the RLS to
throw an exception (because Postgres reads it as an empty string and an
empty string can't be converted to a UUID).

We then add a trigger to `no_rls_table` which checks for every row inserted
if the value already exists in `rls_table`. If it does then we raise an
exception, otherwise continue with the insert.

(1) As expected, we can correctly INSERT into `no_rls_table` if we use `SET
LOCAL team.team_id = '6a43cea8-4a5c-4989-bae2-ef5a77d92620'`
(2) And again, as expected the INSERT into `no_rls_table` will fail with an
exception if we don't set `team.team_id` to a value

(3) HOWEVER, if we then do a bunch (6+) of inserts into the `no_rls_table`
we observe different behaviours for the above.
(4) Specifically, we now try and do an INSERT without setting `team.team_id`
and we expect the behaviour to be a thrown exception as was the case above,
but now the INSERT completes with no exceptions.

It was our understanding that the RLS function is the first thing that is
called before accessing any table, so we would expect the trigger to ALWAYS
fail if there is no local variable set, but this is not the case if it
follows a chunk of inserts into the relevant table.

I would be very grateful if anyone was able to help provide some light on
this situation! We have tested this on v14.5, 16.6 and 17.2 and found
consistent behaviour

Full code to reproduce the scenario
```sql
-- Create less privileged user
CREATE USER app_user;

-- Create table which has no Row level security and give necessary
permissions to app user
CREATE TABLE no_rls_table
(
id BIGSERIAL PRIMARY KEY,
should_not_duplicate uuid
);
GRANT SELECT, INSERT, DELETE ON TABLE no_rls_table to app_user;
GRANT USAGE, SELECT ON SEQUENCE no_rls_table_id_seq to app_user;

-- Create table which has row level security based on local settings
(team.team_id)
CREATE TABLE rls_table
(
id BIGSERIAL PRIMARY KEY,
team_id uuid,
should_not_duplicate uuid
);
GRANT SELECT, INSERT, DELETE ON TABLE rls_table to app_user;
GRANT USAGE, SELECT ON SEQUENCE rls_table_id_seq to app_user;

CREATE POLICY rls_table_policy
ON rls_table
TO app_user
USING (
team_id = current_setting('team.team_id') :: uuid
)
WITH CHECK (
team_id = current_setting('team.team_id') :: uuid
);

ALTER TABLE rls_table ENABLE ROW LEVEL SECURITY;

-- Add a trigger on the no_rls_table which checks if the rls_table has a
particular value
CREATE OR REPLACE FUNCTION ensure_not_duplicated_on_rls_table() RETURNS
TRIGGER AS
$$
BEGIN
IF EXISTS (
SELECT 1
FROM rls_table
WHERE rls_table.should_not_duplicate =
NEW.should_not_duplicate
)
THEN
RAISE EXCEPTION 'This value should not be duplicated between the
two tables';
END IF;

RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_not_duplicated_on_rls_table
BEFORE INSERT OR UPDATE ON no_rls_table
FOR EACH ROW
EXECUTE PROCEDURE ensure_not_duplicated_on_rls_table();

-- (1) One INSERT to demonstrate it works correctly with a local team
BEGIN;
SET ROLE app_user;
SET LOCAL team.team_id = '6a43cea8-4a5c-4989-bae2-ef5a77d92620';
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
COMMIT;

-- (2) Try and insert into no_rls_table and it fails if it has no
team.team_id
BEGIN;
SET ROLE app_user;
-- This command will fail with an exception `[22P02] ERROR: invalid input
syntax for type uuid: ""`
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
COMMIT;

-- (3) Do a bunch of inserts into no_rls_table with team.team_id so these
work correctly
BEGIN;
SET ROLE app_user;
SET LOCAL team.team_id = '6a43cea8-4a5c-4989-bae2-ef5a77d92620';
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
COMMIT;

-- (4) Try and insert the no_rls_table without team.team_id again and this
time no exception is thrown. THIS IS THE UNEXPECTED BEHAVIOUR
BEGIN;
SET ROLE app_user;
INSERT INTO no_rls_table (should_not_duplicate) VALUES (
gen_random_uuid());
COMMIT;

-- Cleanup Phase
RESET ROLE;
DROP TABLE rls_table;
DROP TABLE no_rls_table;
```

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18782: Inconsistent behaviour with triggers and row level security - depends on prior number of inserts

PG Bug reporting form <noreply@postgresql.org> writes:

We have been seeing behaviour of the interaction between triggers and row
level security being inconsistent depending on the number of inserts that
have previously been made to the table which the trigger is attached to. I
will attach full a code reproduction at the end of the issue.

Thanks for the carefully-documented bug report! Often we have to
ask a lot of questions before we can get to the bottom of a problem.

Our expectation is that a trigger which calls RLS which throws an exception
should always throw an exception.

I think this expectation is faulty, and there's not really any bug
here. Poking at your example, I get

regression=# set role app_user;
SET
regression=> explain verbose
SELECT 1
FROM rls_table
WHERE rls_table.should_not_duplicate = gen_random_uuid();
ERROR: invalid input syntax for type uuid: ""
regression=> set team.team_id = '6a43cea8-4a5c-4989-bae2-ef5a77d92620';
SET
regression=> explain verbose
SELECT 1
FROM rls_table
WHERE rls_table.should_not_duplicate = gen_random_uuid();
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.rls_table (cost=0.00..40.00 rows=1 width=4)
Output: 1
Filter: ((rls_table.should_not_duplicate = gen_random_uuid()) AND (rls_table.team_id = (current_setting('team.team_id'::text))::uuid))
(3 rows)

Now first off, notice that I couldn't even do an EXPLAIN as app_user
until I set team.team_id. That indicates that the error is being
thrown during planning. The planner is trying to estimate the
selectivity of the "rls_table.team_id = something" clause, and
since current_setting is not volatile it sees no reason not to
evaluate it to get a constant to check against the statistics for
team_id. Then when it tries to evaluate the cast to uuid, kaboom.

Secondly, notice that the RLS-derived team_id clause is placed
after the one about should_not_duplicate. This is perhaps
surprising, but it's legitimate because uuid_eq is marked
leakproof, meaning we trust it not to leak any information
about the table contents. What that means is that at runtime,
the cast to uuid will blow up only if there's at least one match
in rls_table.should_not_duplicate to the proposed new value.
In your test case, rls_table remains empty so there will never
be a run-time failure, whether team.team_id is valid or not.

So the error is not coming from where you think. It's coming from
planning, and the reason it stops appearing after a few successful
trigger executions is that the plancache switches over to a generic
plan and stops invoking the planner.

(I did verify that the test on rls_table.team_id is still present
in the generic plan, so there's not a bug of that sort.)

In general, I would be wary of depending on errors to be thrown in
specific contexts and not other contexts. That's a hard property to
guarantee in the face of any optimization, and we don't really try.
In the situation at hand, that means that RLS conditions that could
throw errors are a bad idea. This one, far from enforcing that the
app_user can't learn about what is in rls_table, actually facilitates
doing so :-(. I think it'd be safer to form the RLS tests as

team_id::text = current_setting('team.team_id', true)

(where "true" prevents failure if team.team_id is unset).

regards, tom lane

#3Julian Wreford
julian.wreford@gearset.com
In reply to: Tom Lane (#2)
Re: BUG #18782: Inconsistent behaviour with triggers and row level security - depends on prior number of inserts

<tgl(at)sss(dot)pgh(dot)pa(dot)us> writes

Our expectation is that a trigger which calls RLS which throws an

exception

should always throw an exception.

I think this expectation is faulty, and there's not really any bug

here.

In general, I would be wary of depending on errors to be thrown in

specific contexts and not other contexts.

Hi Tom, thank you very much for looking into this and that explanation
makes sense, I'll take a look at changing our RLS policies to use
`ok_missing = true` as that sounds like a sensible route forwards.

Many thanks,
Julian Wreford

On Sat, 25 Jan 2025 at 19:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

PG Bug reporting form <noreply@postgresql.org> writes:

We have been seeing behaviour of the interaction between triggers and row
level security being inconsistent depending on the number of inserts that
have previously been made to the table which the trigger is attached to.

I

will attach full a code reproduction at the end of the issue.

Thanks for the carefully-documented bug report! Often we have to
ask a lot of questions before we can get to the bottom of a problem.

Our expectation is that a trigger which calls RLS which throws an

exception

should always throw an exception.

I think this expectation is faulty, and there's not really any bug
here. Poking at your example, I get

regression=# set role app_user;
SET
regression=> explain verbose
SELECT 1
FROM rls_table
WHERE rls_table.should_not_duplicate = gen_random_uuid();
ERROR: invalid input syntax for type uuid: ""
regression=> set team.team_id = '6a43cea8-4a5c-4989-bae2-ef5a77d92620';
SET
regression=> explain verbose
SELECT 1
FROM rls_table
WHERE rls_table.should_not_duplicate = gen_random_uuid();
QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.rls_table (cost=0.00..40.00 rows=1 width=4)
Output: 1
Filter: ((rls_table.should_not_duplicate = gen_random_uuid()) AND
(rls_table.team_id = (current_setting('team.team_id'::text))::uuid))
(3 rows)

Now first off, notice that I couldn't even do an EXPLAIN as app_user
until I set team.team_id. That indicates that the error is being
thrown during planning. The planner is trying to estimate the
selectivity of the "rls_table.team_id = something" clause, and
since current_setting is not volatile it sees no reason not to
evaluate it to get a constant to check against the statistics for
team_id. Then when it tries to evaluate the cast to uuid, kaboom.

Secondly, notice that the RLS-derived team_id clause is placed
after the one about should_not_duplicate. This is perhaps
surprising, but it's legitimate because uuid_eq is marked
leakproof, meaning we trust it not to leak any information
about the table contents. What that means is that at runtime,
the cast to uuid will blow up only if there's at least one match
in rls_table.should_not_duplicate to the proposed new value.
In your test case, rls_table remains empty so there will never
be a run-time failure, whether team.team_id is valid or not.

So the error is not coming from where you think. It's coming from
planning, and the reason it stops appearing after a few successful
trigger executions is that the plancache switches over to a generic
plan and stops invoking the planner.

(I did verify that the test on rls_table.team_id is still present
in the generic plan, so there's not a bug of that sort.)

In general, I would be wary of depending on errors to be thrown in
specific contexts and not other contexts. That's a hard property to
guarantee in the face of any optimization, and we don't really try.
In the situation at hand, that means that RLS conditions that could
throw errors are a bad idea. This one, far from enforcing that the
app_user can't learn about what is in rls_table, actually facilitates
doing so :-(. I think it'd be safer to form the RLS tests as

team_id::text = current_setting('team.team_id', true)

(where "true" prevents failure if team.team_id is unset).

regards, tom lane