preserving data after updates

Started by Scott Frankel · about 21 years ago · 9 messages · general
#1 Scott Frankel
leknarf@pacbell.net

Is there a canonical form that db schema designers use
to save changes to the data in their databases?

For example, given a table with rows of data, if I UPDATE
a field in a row, the previous value is lost. If I wanted to
track the changes to my data over time, it occurs to me that
I could:

1) copy the whole row of data using the new value, thus
leaving the old row intact in the db for fishing expeditions,
posterity, &c.
-- awfully wasteful, especially with binary data

2) enter a new row that contains only the changed fields, requiring
a full record to be rebuilt through heavy lifting and multiple
queries across 'n' old rows
-- overly complex query design, probably leading to errors

3) create a new table that tracks changes
-- the table is either wide enough to mirror all columns in
the working table, or uses generic columns and API tricks to
parse token pair strings, ...

4) other?

Thanks
Scott

#2 Ian Harding
iharding@tpchd.org
In reply to: Scott Frankel (#1)
Re: preserving data after updates

I do option 3. A generic trigger writes the table name, field name, type
of change (insert/update/delete), and the old and new values for the columns
that were affected. It is kind of a hog, but it works very well. I
have a cron job delete old entries so it doesn't eat my whole disk.

I haven't tried to get it to give up the data in the same representation
as the source table, but it shouldn't be too hard with a set-returning
function: you tell it the table name and a timestamp and it
returns the records as they existed at that time. I usually just query it
directly to "see what happened".
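
A rough sketch of that kind of generic audit trigger, for a reasonably
recent PostgreSQL (table and column names here are just placeholders, and
for brevity this version logs whole rows as text rather than one entry per
changed column):

CREATE TABLE audit_log (
    logged_at   timestamptz NOT NULL DEFAULT now(),
    table_name  text        NOT NULL,
    operation   text        NOT NULL,      -- 'INSERT', 'UPDATE' or 'DELETE'
    old_row     text,                      -- OLD row rendered as text
    new_row     text                       -- NEW row rendered as text
);

CREATE OR REPLACE FUNCTION audit_trigger() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, OLD::text, NEW::text);
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO audit_log (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- attach it to whichever tables need tracking
CREATE TRIGGER my_table_audit
    AFTER INSERT OR UPDATE OR DELETE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE audit_trigger();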


#3 codeWarrior
gpatnude@hotmail.com
In reply to: Scott Frankel (#1)
Re: preserving data after updates

I use a modified form of option 3 with an ON UPDATE rule: the update rule
copies the row to an inherited history table...

CREATE TABLE dm_user (
    id              SERIAL NOT NULL PRIMARY KEY,

    lu_user_type    INTEGER NOT NULL REFERENCES lu_user_type(id),
    dm_user_address INTEGER NOT NULL DEFAULT 0,
    dm_user_email   INTEGER NOT NULL DEFAULT 0,

    f_name          VARCHAR(50) NOT NULL,
    m_name          VARCHAR(50) NOT NULL,
    l_name          VARCHAR(50) NOT NULL,

    uname           VARCHAR(20) NOT NULL,
    upwd            VARCHAR(20) NOT NULL,
    pwd_change_reqd BOOLEAN DEFAULT FALSE,
    login_allowed   BOOLEAN DEFAULT TRUE,
    lost_passwd     BOOLEAN DEFAULT FALSE,

    create_dt       TIMESTAMP NOT NULL DEFAULT NOW(),
    change_dt       TIMESTAMP NOT NULL DEFAULT NOW(),
    change_id       INTEGER NOT NULL DEFAULT 0,
    active_flag     BOOLEAN NOT NULL DEFAULT TRUE
) WITH OIDS;

CREATE TABLE dm_user_history (
    history_id      SERIAL NOT NULL PRIMARY KEY,
    hist_create_dt  TIMESTAMP NOT NULL DEFAULT NOW()
) INHERITS (dm_user);

CREATE RULE dm_user_upd_history AS ON UPDATE TO dm_user DO
    INSERT INTO dm_user_history SELECT * FROM dm_user WHERE id = old.id;

CREATE RULE dm_user_nodelete AS ON DELETE TO dm_user DO INSTEAD
    UPDATE dm_user SET active_flag = FALSE WHERE id = old.id;
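
A quick way to see the effect, assuming a dm_user row with id = 1 already
exists (the values below are just placeholders):

UPDATE dm_user SET l_name = 'Smith', change_dt = now() WHERE id = 1;

-- the pre-update version of the row has been copied into the history table
SELECT id, l_name, change_dt, hist_create_dt
FROM dm_user_history
WHERE id = 1
ORDER BY hist_create_dt;

-- note: because dm_user_history INHERITS (dm_user), any "FROM dm_user"
-- (including the one inside the update rule above) also scans the history
-- rows; "FROM ONLY dm_user" restricts a query to the current rows.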

"Scott Frankel" <leknarf@pacbell.net> wrote in message
news:bd02bff5561d8b271301ba10bafca105@pacbell.net...

Show quoted text

Is there a canonical form that db schema designers use
to save changes to the data in their databases?

For example, given a table with rows of data, if I UPDATE
a field in a row, the previous value is lost. If I wanted to
track the changes to my data over time, it occurs to me that
I could,

1) copy the whole row of data using the new value, thus
leaving the old row intact in the db for fishing expeditions,
posterity, &c.
-- awfully wasteful, especially with binary data

2) enter a new row that contains only new data fields, requiring
building a full set of data through heavy lifting and multiple
queries
through 'n' number of old rows
-- overly complex query design probably leading to errors

3) create a new table that tracks changes
-- the table is either wide enough to mirror all columns in
the working table, or uses generic columns and API tricks to
parse token pair strings, ...

4) other?

Thanks
Scott

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

#4 Berend Tober
btober@seaworthysys.com
In reply to: codeWarrior (#3)
Re: preserving data after updates

I use a modified form of option 3 with an ON UPDATE RULE.... the update rule
copies the row to an inherited table...

I just gotta say that THAT is one COOL use of PG inheritance! Do you find that
it works well and is robust and all the good stuff it seems like would be the
case?

-- Berend

#5 Scott Frankel
leknarf@pacbell.net
In reply to: codeWarrior (#3)
Re: preserving data after updates

Door number 3.

Thanks for the responses and terrific suggestions!
Scott


#6 codeWarrior
gpatnude@hotmail.com
In reply to: Berend Tober (#4)
Re: preserving data after updates


Thank you!

I find it VERY effective and completely transparent to both the
programmer and the end-user. I don't use it on ALL of the tables in a
given schema -- ONLY the tables where end-users can manipulate or change
data.

What it boils down to is that I can use it as a sort of virtual "rollback"
system: by querying the inherited table I can update the parent table with
an original value from the child. Of course, this results in another change
to the child, but that can also be undone.
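
A rough sketch of that kind of manual restore, reusing the dm_user /
dm_user_history pair from the earlier message (the history_id value 42 and
the choice to restore only l_name are placeholders for illustration):

UPDATE ONLY dm_user            -- ONLY: don't touch rows stored in the child
   SET l_name    = h.l_name,
       change_dt = now()
  FROM dm_user_history h
 WHERE h.history_id = 42       -- the prior version to restore
   AND dm_user.id   = h.id;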

#7 Tzahi Fadida
tzahi_ml@myrealbox.com
In reply to: Scott Frankel (#1)
Re: preserving data after updates

It's called a "temporal database".
Usually it's intended for medical or police databases where
you need hindsight, i.e. if today is 31/12/2005, what did we know on
20/12/2005 -- for example, for a doctor appearing in court and required
to testify to what he knew on 20/12/2005.
Very cool.
It would be nice if PostgreSQL had a switch that
could turn it into a temporal database.
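
The usual building block for those "what did we know on date X" queries is
a validity interval on every row; a minimal sketch (the table and column
names are invented for illustration):

CREATE TABLE patient_record (
    patient_id  integer     NOT NULL,
    diagnosis   text        NOT NULL,
    valid_from  timestamptz NOT NULL DEFAULT now(),
    valid_to    timestamptz NOT NULL DEFAULT 'infinity'
);

-- "what did we know on 20/12/2005?"
SELECT *
FROM patient_record
WHERE valid_from <= '2005-12-20'
  AND valid_to   >  '2005-12-20';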

Regards,
tzahi.


#8 Martijn van Oosterhout
kleptog@svana.org
In reply to: Tzahi Fadida (#7)
Re: preserving data after updates

On Sat, Mar 05, 2005 at 03:46:53AM +0200, Tzahi Fadida wrote:

It would be nice if PostgreSQL had a switch that
could turn it into a temporal database.

It used to be built in a long time ago. It's since been moved to the
contrib module "timetravel". It does historical queries and stuff...
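
If memory serves, contrib/timetravel works roughly like this (a sketch from
memory, so treat the details as unverified; the table needs two abstime
columns for the validity interval, and the trigger maintains them):

CREATE TABLE mytab (
    id       integer,
    info     text,
    date_on  abstime,
    date_off abstime
);

CREATE TRIGGER mytab_timetravel
    BEFORE INSERT OR DELETE OR UPDATE ON mytab
    FOR EACH ROW
    EXECUTE PROCEDURE timetravel(date_on, date_off);

-- current rows carry date_off = 'infinity'; an UPDATE closes out the old
-- version and inserts the new one, so an "as of" query looks like:
SELECT * FROM mytab
WHERE date_on <= '2005-03-01' AND date_off > '2005-03-01';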

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#9 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Tzahi Fadida (#7)
Re: preserving data after updates

It's called a "temporal database".
Usually it's intended for medical or police databases where
you need hindsight, i.e. if today is 31/12/2005, what did we know on
20/12/2005 -- for example, for a doctor appearing in court and required
to testify to what he knew on 20/12/2005.

What he "can have known", not "what he knew".

With GnuMed we are running a trigger-based auditing solution
which so far works nicely.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346