Trigger problems/questions

Started by Jim Fultonalmost 9 years ago5 messagesgeneral
Jump to latest
#1Jim Fulton
jim@jimfulton.info

I have an object database that's mirrored to a table with data in a JSONB
column. Data are organized into "communities". Community ids aren't
stored directly in content but can be found by recursively following
__parent__ properties. I want to be able to index content records on their
community ids.

(I originally tried to index functions that got ids, but apparently lying
about immutability is a bad idea and I suffered the consequences. :-])

I tried creating a trigger to populate a community_zoid property with a
community id when a record is inserted or updated. The trigger calls a
recursive functions to get the community id.

Trigger: https://github.com/karlproject/karl/blob/master/
karl/scripts/pgevolve.py#L261

Trigger procedure: https://github.com/karlproject/karl/blob/master/
karl/scripts/pgevolve.py#L236

Function to find a community id: https://github.com/
karlproject/karl/blob/master/karl/scripts/pgevolve.py#L209

This scheme succeeds most of the time, but occasionally, it fails.

I can find records where it has failed with a query like:

select zoid
from newt
where find_community_zoid(zoid, class_name, state) is not null
and not state ? 'community_zoid';

If I update the records where it has failed:

update newt set class_name=class_name
where find_community_zoid(zoid, class_name, state) is not null
and not state ? 'community_zoid';

Then retry the query above, I get 0 rows back. This would seem to indicate
that the trigger is logically correct.

Questions:

- Am I doing it wrong? For example, is there some limitation on trigger
procedures that I'm violating?
- If a trigger procedure errors, is the transaction aborted? (I would
hope so.)
- Should I expect triggers to be reliable and rely on them for database
integrity?
- Any suggestions on how to debug this?

Jim

--
Jim Fulton
http://jimfulton.info

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jim Fulton (#1)
Re: Trigger problems/questions

On 06/15/2017 10:23 AM, Jim Fulton wrote:

I have an object database that's mirrored to a table with data in a
JSONB column. Data are organized into "communities". Community ids
aren't stored directly in content but can be found by recursively
following __parent__ properties. I want to be able to index content
records on their community ids.

(I originally tried to index functions that got ids, but apparently
lying about immutability is a bad idea and I suffered the consequences. :-])

I tried creating a trigger to populate a community_zoid property with a
community id when a record is inserted or updated. The trigger calls a
recursive functions to get the community id.

Trigger:
https://github.com/karlproject/karl/blob/master/karl/scripts/pgevolve.py#L261
<https://github.com/karlproject/karl/blob/master/karl/scripts/pgevolve.py#L261&gt;

Trigger procedure:
https://github.com/karlproject/karl/blob/master/karl/scripts/pgevolve.py#L236
<https://github.com/karlproject/karl/blob/master/karl/scripts/pgevolve.py#L236&gt;

Function to find a community id:
https://github.com/karlproject/karl/blob/master/karl/scripts/pgevolve.py#L209
<https://github.com/karlproject/karl/blob/master/karl/scripts/pgevolve.py#L209&gt;

This scheme succeeds most of the time, but occasionally, it fails.

I can find records where it has failed with a query like:

select zoid
from newt
where find_community_zoid(zoid, class_name, state) is not null
and not state ? 'community_zoid';

If I update the records where it has failed:

update newt set class_name=class_name
where find_community_zoid(zoid, class_name, state) is not null
and not state ? 'community_zoid';

Then retry the query above, I get 0 rows back. This would seem to
indicate that the trigger is logically correct.

Questions:

* Am I doing it wrong? For example, is there some limitation on
trigger procedures that I'm violating?
* If a trigger procedure errors, is the transaction aborted? (I would
hope so.)
* Should I expect triggers to be reliable and rely on them for
database integrity?
* Any suggestions on how to debug this?

I do not pretend to fully understand what the the triggers/functions are
really doing, but I did notice this:

create or replace function populate_community_zoid_triggerf()

...

new_zoid := NEW.state ->> 'community_zoid';
zoid := find_community_zoid(
NEW.zoid, NEW.class_name, NEW.state)::text;
if zoid is null then
if new_zoid is not null then
NEW.state := NEW.state - 'community_zoid';
end if;
else
if new_zoid is null or zoid != new_zoid then
NEW.state :=
NEW.state || ('{"community_zoid": ' || zoid || '}')::jsonb;
end if;
end if;

...

Not sure what happens if zoid is null and new_zoid is null also?
Or if it matters?

Jim

--
Jim Fulton
http://jimfulton.info

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Fulton (#1)
Re: Trigger problems/questions

Jim Fulton <jim@jimfulton.info> writes:

I have an object database that's mirrored to a table with data in a JSONB
column. Data are organized into "communities". Community ids aren't
stored directly in content but can be found by recursively following
__parent__ properties. I want to be able to index content records on their
community ids.

(I originally tried to index functions that got ids, but apparently lying
about immutability is a bad idea and I suffered the consequences. :-])

I tried creating a trigger to populate a community_zoid property with a
community id when a record is inserted or updated. The trigger calls a
recursive functions to get the community id.
...
This scheme succeeds most of the time, but occasionally, it fails.

Since your original idea failed, I suppose that the parent relationships
are changeable? What mechanism have you got in place to propagate a
relationship change back down to the child records?

Also, this looks to have a race condition: if you search for a record's
community id at about the same time that someone else is changing the
parent linkage, you may get the old answer, but by the time you commit the
record update that answer may be obsolete. This is a problem because even
if you had another trigger that was trying (in the someone else's session)
to propagate new community ids back to affected records, it wouldn't think
that the record you're working on needs a change, because it would also
see the old version of that record.

Solutions to the race problem usually involve either SELECT FOR UPDATE
to lock rows involved in identifying the target record's community ID,
or use of SERIALIZABLE to cause the whole transaction to fail if its
results might be inconsistent. Either one will add some complexity
to your application code.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Jim Fulton
jim@jimfulton.info
In reply to: Adrian Klaver (#2)
Re: Trigger problems/questions

On Thu, Jun 15, 2017 at 1:44 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 06/15/2017 10:23 AM, Jim Fulton wrote:

...

I do not pretend to fully understand what the the triggers/functions are
really doing,

but I did notice this:

create or replace function populate_community_zoid_triggerf()

...

new_zoid := NEW.state ->> 'community_zoid';
zoid := find_community_zoid(
NEW.zoid, NEW.class_name, NEW.state)::text;
if zoid is null then
if new_zoid is not null then
NEW.state := NEW.state - 'community_zoid';
end if;
else
if new_zoid is null or zoid != new_zoid then
NEW.state :=
NEW.state || ('{"community_zoid": ' || zoid || '}')::jsonb;
end if;
end if;

...

Not sure what happens if zoid is null and new_zoid is null also?

Then then no change is needed and nothing happens, which is correct.

Or if it matters?

Nope.

Thanks for looking.

Jim

--
Jim Fulton
http://jimfulton.info

#5Jim Fulton
jim@jimfulton.info
In reply to: Tom Lane (#3)
Re: Trigger problems/questions

On Thu, Jun 15, 2017 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jim Fulton <jim@jimfulton.info> writes:

I have an object database that's mirrored to a table with data in a JSONB
column. Data are organized into "communities". Community ids aren't
stored directly in content but can be found by recursively following
__parent__ properties. I want to be able to index content records on

their

community ids.

(I originally tried to index functions that got ids, but apparently lying
about immutability is a bad idea and I suffered the consequences. :-])

I tried creating a trigger to populate a community_zoid property with a
community id when a record is inserted or updated. The trigger calls a
recursive functions to get the community id.
...
This scheme succeeds most of the time, but occasionally, it fails.

Since your original idea failed, I suppose that the parent relationships
are changeable?

Good question.

A few kinds of objects can, rarely, move in the hierarchy, and, they never
move between communities, so their community id never changes.

IDK WTF my indexing attempt. I could build the index, then add an object
to the tree and it wouldn't be indexed. This was in a staging database
where there were no other changes.

What mechanism have you got in place to propagate a
relationship change back down to the child records?

This is a non-issue, at least WRT community ids. If I were, for example,
to index paths, it would be an issue for some objects, but I'm not at that
point yet.

Also, this looks to have a race condition: if you search for a record's
community id at about the same time that someone else is changing the
parent linkage, you may get the old answer, but by the time you commit the
record update that answer may be obsolete. This is a problem because even
if you had another trigger that was trying (in the someone else's session)
to propagate new community ids back to affected records, it wouldn't think
that the record you're working on needs a change, because it would also
see the old version of that record.

Solutions to the race problem usually involve either SELECT FOR UPDATE
to lock rows involved in identifying the target record's community ID,
or use of SERIALIZABLE to cause the whole transaction to fail if its
results might be inconsistent. Either one will add some complexity
to your application code.

There's a global lock around all of the updates to the table. (This isn't
as unreasonable as it sounds :), but anyway, that's outside the scope of
this discussion.)

Even if there was some kind of race, I'd still get a community id set, it
might be wrong, but it would be set.

regards, tom lane

Thanks.

Jim

--
Jim Fulton
http://jimfulton.info