Transition relations: correlating OLD TABLE and NEW TABLE
In a situation where we're using transition relations to capture changes
after UPDATE statements, i.e., using a trigger of the form
CREATE TRIGGER trigger_name AFTER UPDATE ON table_name
REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
FOR EACH STATEMENT EXECUTE PROCEDURE procedure();
there seems to be a challenge in how to join `old_table` and `new_table` so
that the old and new version of each row can be matched up with each other.
Of course if the table has a primary key, then this can be used, but I'm
wondering how to handle this in the general case where a primary key might
not exist.
According to this blog (
http://databasedoings.blogspot.com/2017/07/cool-stuff-in-postgresql-10-transition.html)
it is possible to use ROW_NUMBER() OVER () to create a key to join the old
and new tables, but this depends on an implementation detail (or at least,
isn't documented?) that the rows will be returned in the same order for the
two tables. Is it correct that this will work under the existing
implementation?
If there's not a clean way of matching up the old and new versions, could
the transition relation mechanism be extended in order to make this
possible? Here's a couple ideas:
1) A special system column could be added to the two transition relations,
OLD TABLE, and NEW TABLE, providing a common value that could be used to
join corresponding rows; it could be a sequential value (like what
ROW_NUMBER() would generate), or it could be some other unique identifier
for the row that is convenient for implementation. But there's some
awkwardness in the fact that this special column name could clash with the
columns in the table (unless an existing reserved name is used). Also,
exposing a unique row identifier might restrict potential future
implementations.
2) Maybe a cleaner way would be to add a third kind of transition table,
say, a "CHANGE TABLE", which could be used as an alternative to "OLD TABLE"
and "NEW TABLE". A change table could contain just two columns, say
'old_row' and 'new_row', each of which have the appropriate record type. In
this way, the old table and new table are essentially "pre-joined" in the
transition table.
Would this be workable? Or is there some other way of achieving this?
- Brent Kerby
On Friday, July 6, 2018, Brent Kerby <blkerby@gmail.com> wrote:
Of course if the table has a primary key, then this can be used, but I'm
wondering how to handle this in the general case where a primary key might
not exist.
Personally, I would consider the lack of a PK a rare and special case...I'd
handle the proposed situation by adding a big serial column to the table.
David J.
This is a possible workaround. But even if a table has a primary key, it
seems like there's some inefficiency in doing things this way: the old and
new row versions start out linked together (for instance this information
is available in a FOR EACH ROW trigger), but we're throwing away that
information by splitting them into two separate relations, forcing us to
have to join them back up again. Wouldn't it make sense to expose a
transition relation where the correspondence between old and new versions
is never discarded in the first place? Or is there something I'm missing?
Also, there are cases where it may not be desired to have a primary key, as
the index maintenance and constraint checking are not free and not always
necessary. And if one wants to implement a general change data capture
setup, it would nice to be able accomodate such tables without having to
alter them.
I'd be happy to try to work out an implementation of REFERENCING CHANGE
TABLE if there's support for the idea. Or is there some problem with this,
or some better way of achieving the goal?
- Brent
On Fri, Jul 6, 2018 at 9:36 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:
Show quoted text
On Friday, July 6, 2018, Brent Kerby <blkerby@gmail.com> wrote:
Of course if the table has a primary key, then this can be used, but I'm
wondering how to handle this in the general case where a primary key might
not exist.Personally, I would consider the lack of a PK a rare and special
case...I'd handle the proposed situation by adding a big serial column to
the table.David J.
On Saturday, July 7, 2018, Brent Kerby <blkerby@gmail.com> wrote:
Also, there are cases where it may not be desired to have a primary key,
as the index maintenance and constraint checking are not free and not
always necessary.
Btree uniqueness enforcement is worth the price.
I'd be happy to try to work out an implementation of REFERENCING CHANGE
TABLE if there's support for the idea. Or is there some problem with this,
or some better way of achieving the goal?
I do see the value in basically saying, "I have a unique index but I want
to write a generic function that can handle being installed on any table
and, without dynamic sql, can be presented with a full outer join relation
of all inserts, updates, and deletes." Natural join won't work. We can
limit the feature to just transition tables or create a new join type that
would require left and right to be the same relation and the syste, would
figure out and join on the PK columns (or planner error if there are none).
Keep in mind that part of this discussion involves deciding where we are OK
with being non-standard.
David J.