How to handle things that change over time?
I am working on a system that will support internal business work for a
company. Periodically, things will change in their "world". For example,
they periodically recalculate the "mark up" of various components of their
business, such as labor and purchased materials. Presently I am keeping
these constants in a table, with the constant and an effective start
and end date for each factor. In this scenario, the number of rows will
just grow over time, and by using the proper conditions in our select
statement, I can apply the correct factors as they change over time.
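(For concreteness, the table described above might look roughly like this;
all names here are illustrative, not my actual schema:)

create table markup_factor (
    component text not null,          -- e.g. 'labor', 'materials'
    factor numeric(6,4) not null,
    effective_start date not null,
    effective_end date not null
);

-- apply the factor in force on a given date
select factor
from markup_factor
where component = 'labor'
  and current_date between effective_start and effective_end;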
In another scenario, a column of the employee table is the pay rate.
Obviously this will change over time, also. It does not appear to me that
this lends itself to the same treatment, as most of the data associated
with a given employee is fairly static, and if I add an entire row just
because the pay rate changed, this looks overly complex.
This cannot be the first time this issue has been addressed. What have
others done in this scenario?
I'm a big fan of a table / table_history combo. Meaning, if you have a
person and that person can have different states that change frequently,
you can do something like this:
create table people (
    person_id bigint primary key,
    person_name text
);

create table person_state_history (
    person_id bigint not null references people,
    effective tsrange not null
        default tsrange(clock_timestamp()::timestamp without time zone,
                        'infinity'),
    person_state text,
    primary key (person_id, effective)
);
Use an exclusion constraint so you can't have overlapping effective ranges
(this needs the btree_gist extension so GiST can handle the equality on
person_id):

create extension if not exists btree_gist;

alter table person_state_history add
    constraint no_overlapping_person_state_history
    exclude using gist (
        person_id with =,
        effective with &&);
Then, to see a person's state at the current time, you can do something
like this:

select person_id, person_state
from person_state_history
where now()::timestamp without time zone <@ effective
(and add your joins in as well).
We've also created a trigger, so any new insert on the state history table
will automatically close out the previous open row.
create or replace function set_event_history_in_use()
returns trigger
as
$$
begin
    -- close out the currently open row for this person
    update person_state_history
    set effective = tsrange(lower(effective),
                            clock_timestamp()::timestamp without time zone)
    where clock_timestamp()::timestamp without time zone <@ effective
      and person_id = NEW.person_id;
    -- the new row is effective from now until further notice
    NEW.effective := tsrange(clock_timestamp()::timestamp without time zone,
                             'infinity');
    return NEW;
end;
$$
language plpgsql;
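(For completeness, the function still has to be attached to the table as a
trigger; something like the following, where the trigger name is mine. On
Postgres versions before 11, write "execute procedure" instead of "execute
function":)

create trigger close_previous_state
    before insert on person_state_history
    for each row
    execute function set_event_history_in_use();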
Then, depending on business requirements, you can use this table /
table_history pair to track history across time. If salaries are something
people would like to see over time, then you could do a
person_salary_history table, etc.
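(A sketch of what that salary table could look like, following the same
pattern; the table and column names here are just placeholders:)

create table person_salary_history (
    person_id bigint not null references people,
    effective tsrange not null
        default tsrange(clock_timestamp()::timestamp without time zone,
                        'infinity'),
    pay_rate numeric(10,2),
    primary key (person_id, effective)
);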
Hope this helps!
Best,
Rob Heinen
On Fri, Sep 13, 2019 at 11:57 AM stan <stanb@panix.com> wrote:
On 9/13/19 2:57 AM, stan wrote:
I am working on a system that will support internal business work for a
company. Periodically, things will change in their "world".
Oh this is my favorite topic. :-) It's a common problem, although
solutions don't seem to be well-known in the programming community.
Typically you'd use start/end times on your table, as you suggested.
Postgres is a great RDBMS for this since it has range types and
exclusion constraints. The SQL:2011 standard also has temporal primary
keys, foreign keys, SELECTs, and UPDATE/DELETEs, and we're working on
adding those too. But your use case sounds easy to build even without
those extra features. Here is a talk I gave this summer about temporal
features in Postgres:
https://github.com/pjungwir/postgres-temporal-talk
And here is an annotated bibliography to the main writings & tools out
there, including some projects that might help you implement what you
want in Postgres:
https://illuminatedcomputing.com/posts/2017/12/temporal-databases-bibliography/
In another scenario, a column of the employee table is the pay rate.
Obviously this will change over time, also. It does not appear to me
that this lends itself to the same treatment, as most of the data
associated with a given employee is fairly static, and if I add an
entire row just because the pay rate changed, this looks overly
complex.
Normally people would indeed just add another row. I wouldn't call it
complex (especially compared to the alternatives), but maybe a little
wasteful. The Date/Darwen/Lorentzos book gives a lot of attention to
avoiding the problem though. Essentially you'd adopt sixth normal form,
where each attribute (column) gets its own table.
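(A minimal sketch of that sixth-normal-form idea, with hypothetical
table and column names: the static data stays in one table, and each
time-varying attribute gets its own table keyed by employee and period:)

-- static employee data
create table employee (
    employee_id bigint primary key,
    employee_name text
);

-- one table per time-varying attribute (6NF style)
create table employee_pay_rate (
    employee_id bigint not null references employee,
    effective daterange not null,
    pay_rate numeric(10,2) not null,
    primary key (employee_id, effective)
);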
Good luck. I'm always happy to talk about temporal databases if you
like. :-)
Yours,
--
Paul ~{:-)
pj@illuminatedcomputing.com
On 13/09/2019 17:19, Paul Jungwirth wrote:
The SQL:2011 standard also has temporal primary keys, foreign keys,
SELECTs, and UPDATE/DELETEs, and we're working on adding those too.
And here is a full implementation of all that:
https://github.com/xocolatl/periods/
It is packaged in both the deb and rpm repositories for PGDG.