temporal support patch

Started by Miroslav Šimulčíkalmost 14 years ago58 messageshackers
Jump to latest
#1Miroslav Šimulčík
simulcik.miro@gmail.com

Hi all,

as a part of my master's thesis I have created temporal support patch for
PostgreSQL. It enables the creation of special temporal tables with entries
versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these tables
don't cause permanent changes to entries, but create new versions of them.
Thus user can easily get to the past states of the table.

Basic information on temporal databases can be found on
http://en.wikipedia.org/wiki/Temporal_database

In field of temporal databases, there are only proprietary solution
available. During the analysis I found these:
- IBM DB2 10 for z/OS
- Oracle 11g Workspace Manager
- Teradata Database 13.10

Primary goal of my work was the creation of opensource solution, that is
easy to use and is backward compatible with existing applications, so that
the change of the original tables to temporal ones, does not require
changes to applications that work with them. This patch is built on
standard SQL/Temporal with some minor modifications inspired by commercial
temporal database systems. Currently it only deals with transaction time
support.

Here is simple description on how it works:

1. user can create transaction time table using modified CREATE TABLE
command:

CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

This command automatically creates all objects required for transaction
time support:

List of relations
Schema | Name | Type | Owner
--------+----------------------+----------+----------
public | person | table | tester
public | person__entry_id_seq | sequence | tester
public | person_hist | table | postgres

Table "public.person"
Column | Type |
Modifiers

------------+-----------------------------+------------------------------------------------------------------------------
name | character varying(50) |
_entry_id | bigint | not null default
nextval('person__entry_id_seq'::regclass)
_sys_start | timestamp without time zone | not null default
clock_timestamp()
_sys_end | timestamp without time zone | not null default
'294276-12-31 23:59:59.999999'::timestamp without time zone
Indexes:
"person__entry_id_idx" btree (_entry_id)
"person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

Table "public.person_hist"
Column | Type | Modifiers
------------+-----------------------------+-----------
name | character varying(50) |
_entry_id | bigint | not null
_sys_start | timestamp without time zone | not null
_sys_end | timestamp without time zone | not null
Indexes:
"person_hist__entry_id_idx" btree (_entry_id)
"person_hist__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

Table person stores current versions of entries. 3 additional columns
are added:
_entry_id - id of entry. It groups together different versions of
entry.
_sys_start - beginning of the version validity period (version
creation timestamp).
_sys_end - end of the version validity period.

Table person_hist stores historical versions of entries. It has the
same structure and indexes as the person table, but without any constraints
and default values.

2. another way of creating transaction time table is adding transaction
time support to existing standard table using ALTER command.

CREATE TABLE person(name varchar(50));
ALTER TABLE person ADD TRANSACTIONTIME;

3. INSERT entry

INSERT INTO person VALUES('Jack');

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+------------------------------
Jack | 1 | 2012-05-16 22:11:39.856916 | 294276-12-31
23:59:59.999999

4. UPDATE entry

UPDATE person SET name = 'Tom';

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+------------------------------
Tom | 1 | 2012-05-16 22:11:44.736195 | 294276-12-31
23:59:59.999999

SELECT * FROM person_hist;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------
Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

5. DELETE entry

DELETE FROM person;

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end
------+-----------+------------+----------

SELECT * FROM person_hist;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------
Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
Tom | 1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

6. selecting entries

INSERT INTO person VALUES('Mike');
INSERT INTO person VALUES('Mike');

--standard SELECT - operates only with current versions of entries
SELECT * FROM person;

name
------
Mike
Mike

--special temporal SELECT which operates with all versions
NONSEQUENCED TRANSACTIONTIME SELECT *, _entry_id, _sys_start, _sys_end
FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+------------------------------
Mike | 3 | 2012-05-16 22:20:55.055671 | 294276-12-31
23:59:59.999999
Mike | 2 | 2012-05-16 22:20:51.619475 | 294276-12-31
23:59:59.999999
Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
Tom | 1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

--special temporal SELECT which operates with versions valid in
specified time
TRANSACTIONTIME AS OF '2012-05-16 22:11:39.856916' SELECT *, _entry_id,
_sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------
Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

--it is also possible to set timestamp globally for session. All
subsequent SELECTs without any temporal modifier will operate with versions
valid in this time,
SET history_timestamp TO '2012-05-16 22:11:39.856916';

SELECT * FROM person;

name
------
Jack

--to select only current versions when history_tiumestamp is set,
CURRENT TRANSACTIONTIME have to be used with SELECT
CURRENT TRANSACTIONTIME SELECT * FROM person;

name
------
Mike
Mike

This is only a illustration of main functionality. Later I can create a
document about the design and implementation details, but first I need to
know if such temporal features as described here, could be added to future
versions of PostgreSQL, after meeting all the requirements of a new patch.

Regards

Miroslav Simulcik

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Miroslav Šimulčík (#1)
Re: temporal support patch

Hello

what is conformance of your solution with temporal extension in ANSI SQL 2011

http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule

2012/5/16 Miroslav Šimulčík <simulcik.miro@gmail.com>:

Show quoted text

Hi all,

as a part of my master's thesis I have created temporal support patch for
PostgreSQL. It enables the creation of special temporal tables with entries
versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these tables
don't cause permanent changes to entries, but create new versions of them.
Thus user can easily get to the past states of the table.

Basic information on temporal databases can be found
on http://en.wikipedia.org/wiki/Temporal_database

In field of temporal databases, there are only proprietary solution
available. During the analysis I found these:
    - IBM DB2 10 for z/OS
    - Oracle 11g Workspace Manager
    - Teradata Database 13.10

Primary goal of my work was the creation of opensource solution, that is
easy to use and is backward compatible with existing applications, so that
the change of the original tables to temporal ones, does not require changes
to applications that work with them. This patch is built on standard
SQL/Temporal with some minor modifications inspired by commercial temporal
database systems. Currently it only deals with transaction time support.

Here is simple description on how it works:

1. user can create transaction time table using modified CREATE TABLE
command:

    CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

    This command automatically creates all objects required for transaction
time support:

                  List of relations
      Schema |         Name         |   Type   |  Owner
     --------+----------------------+----------+----------
      public | person               | table    | tester
      public | person__entry_id_seq | sequence | tester
      public | person_hist          | table    | postgres

                                                  Table "public.person"
        Column   |            Type             |
     Modifiers

   ------------+-----------------------------+------------------------------------------------------------------------------
      name       | character varying(50)       |
      _entry_id  | bigint                      | not null default
nextval('person__entry_id_seq'::regclass)
      _sys_start | timestamp without time zone | not null default
clock_timestamp()
      _sys_end   | timestamp without time zone | not null default
'294276-12-31 23:59:59.999999'::timestamp without time zone
     Indexes:
         "person__entry_id_idx" btree (_entry_id)
         "person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

              Table "public.person_hist"
        Column   |            Type             | Modifiers
     ------------+-----------------------------+-----------
      name       | character varying(50)       |
      _entry_id  | bigint                      | not null
      _sys_start | timestamp without time zone | not null
      _sys_end   | timestamp without time zone | not null
     Indexes:
         "person_hist__entry_id_idx" btree (_entry_id)
         "person_hist__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

    Table person stores current versions of entries. 3 additional columns
are added:
        _entry_id - id of entry. It groups together different versions of
entry.
        _sys_start - beginning of the version validity period (version
creation timestamp).
        _sys_end - end of the version validity period.

    Table person_hist stores historical versions of entries. It has the same
structure and indexes as the person table, but without any constraints and
default values.

2. another way of creating transaction time table is adding transaction time
support to existing standard table using ALTER command.

    CREATE TABLE person(name varchar(50));
    ALTER TABLE person ADD TRANSACTIONTIME;

3. INSERT entry

    INSERT INTO person VALUES('Jack');

    SELECT *, _entry_id, _sys_start, _sys_end FROM person;

     name | _entry_id |         _sys_start         |           _sys_end

------+-----------+----------------------------+------------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 294276-12-31
23:59:59.999999

4. UPDATE entry

    UPDATE person SET name = 'Tom';

    SELECT *, _entry_id, _sys_start, _sys_end FROM person;

     name | _entry_id |         _sys_start         |           _sys_end

------+-----------+----------------------------+------------------------------
     Tom  |         1 | 2012-05-16 22:11:44.736195 | 294276-12-31
23:59:59.999999

    SELECT * FROM person_hist;

     name | _entry_id |         _sys_start         |          _sys_end

------+-----------+----------------------------+----------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

5. DELETE entry

    DELETE FROM person;

    SELECT *, _entry_id, _sys_start, _sys_end FROM person;

     name | _entry_id | _sys_start | _sys_end
    ------+-----------+------------+----------

    SELECT * FROM person_hist;

     name | _entry_id |         _sys_start         |          _sys_end

------+-----------+----------------------------+----------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
     Tom  |         1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

6. selecting entries

    INSERT INTO person VALUES('Mike');
    INSERT INTO person VALUES('Mike');

    --standard SELECT - operates only with current versions of entries
    SELECT * FROM person;

     name
    ------
     Mike
     Mike

    --special temporal SELECT which operates with all versions
    NONSEQUENCED TRANSACTIONTIME SELECT *, _entry_id, _sys_start, _sys_end
FROM person;

     name | _entry_id |         _sys_start         |           _sys_end

------+-----------+----------------------------+------------------------------
     Mike |         3 | 2012-05-16 22:20:55.055671 | 294276-12-31
23:59:59.999999
     Mike |         2 | 2012-05-16 22:20:51.619475 | 294276-12-31
23:59:59.999999
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
     Tom  |         1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

    --special temporal SELECT which operates with versions valid in
specified time
    TRANSACTIONTIME AS OF '2012-05-16 22:11:39.856916' SELECT *, _entry_id,
_sys_start, _sys_end FROM person;

     name | _entry_id |         _sys_start         |          _sys_end

------+-----------+----------------------------+----------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

    --it is also possible to set timestamp globally for session. All
subsequent SELECTs without any temporal modifier will operate with versions
valid in this time,
    SET history_timestamp TO '2012-05-16 22:11:39.856916';

    SELECT * FROM person;

     name
    ------
     Jack

    --to select only current versions when history_tiumestamp is set,
CURRENT TRANSACTIONTIME have to be used with SELECT
    CURRENT TRANSACTIONTIME SELECT * FROM person;

     name
    ------
     Mike
     Mike

This is only a illustration of main functionality. Later I can create a
document about the design and implementation details, but first I need to
know if such temporal features as described here, could be added to future
versions of PostgreSQL, after meeting all the requirements of a new patch.

Regards

Miroslav Simulcik

#3Miroslav Šimulčík
simulcik.miro@gmail.com
In reply to: Miroslav Šimulčík (#1)
Re: temporal support patch

Hello.

Basically my implementation serve for the same thing as table_log extension -
entry versioning. It also uses history table and triggers to store old
versions of entries. After quick review of table_log extension, here is
comparison with my solution:

tale_log advantages compared to my solution:
- no columns added to the original table
- more control over created objects

advantages of my solution compared to table_log
- built in syntax
- more user friendly - everything is done automatically
- no data redundancy - in my extension current versions of entries are
stored only once in original table (in table_log - entries are inserted to
both original and log table)
- no speed degradation for INSERT statement - no triggers called on INSERT
- faster triggers - triggers in my implementation are based on referential
integrity triggers and use prepared statements
- original table alterations are automatically done also on history table
- easier selection of data valid in specific time - each row contains start
and end time
- easier restoration to table - TRANSACTIONTIME AS OF expression SELECT *
INTO restoration_table FROM original_table;

Regards
Miroslav Simulcik

2012/5/17 A.M. <agentm@themactionfaction.com>

Show quoted text

On May 16, 2012, at 5:14 PM, Miroslav Šimulčík wrote:

Hi all,

as a part of my master's thesis I have created temporal support patch

for PostgreSQL. It enables the creation of special temporal tables with
entries versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on
these tables don't cause permanent changes to entries, but create new
versions of them. Thus user can easily get to the past states of the table.

Basic information on temporal databases can be found on

http://en.wikipedia.org/wiki/Temporal_database

Hello!

I currently use the table_log extension:
http://pgfoundry.org/projects/tablelog/

Other than the built-in syntax, how does your implementation differ
feature-wise?

Cheers,
M

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Miroslav Šimulčík (#1)
Re: temporal support patch

Miroslav Šimulcík wrote:

as a part of my master's thesis I have created temporal support patch for PostgreSQL. It enables the
creation of special temporal tables with entries versioning. Modifying operations (UPDATE, DELETE,
TRUNCATE) on these tables don't cause permanent changes to entries, but create new versions of them.
Thus user can easily get to the past states of the table.

Most of that is something you could handle with triggers (not TRUNCATE of course).

What does your proposed feature do that a set of triggers wouldn't?

Yours,
Laurenz Albe

#5Miroslav Šimulčík
simulcik.miro@gmail.com
In reply to: Pavel Stehule (#2)
Re: temporal support patch

Hello.

SQL 2011 standard wasn't available in time I started this project so I
built my implementation on older standards TSQL2 and SQL/Temporal, that
were only available. None of these were accepted by ANSI/ISO commissions
however.

There is different syntax in SQL 2011 and it looks like one that IBM DB2
had been using even before this standard were published.

So my implementation differs in syntax, but features are same as stated in
"system versioned tables" part of slideshow.

Regards
Miroslav Simulcik

2012/5/17 Pavel Stehule <pavel.stehule@gmail.com>

Show quoted text

Hello

what is conformance of your solution with temporal extension in ANSI SQL
2011

http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule

2012/5/16 Miroslav Šimulčík <simulcik.miro@gmail.com>:

Hi all,

as a part of my master's thesis I have created temporal support patch for
PostgreSQL. It enables the creation of special temporal tables with

entries

versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these

tables

don't cause permanent changes to entries, but create new versions of

them.

Thus user can easily get to the past states of the table.

Basic information on temporal databases can be found
on http://en.wikipedia.org/wiki/Temporal_database

In field of temporal databases, there are only proprietary solution
available. During the analysis I found these:
- IBM DB2 10 for z/OS
- Oracle 11g Workspace Manager
- Teradata Database 13.10

Primary goal of my work was the creation of opensource solution, that is
easy to use and is backward compatible with existing applications,

so that

the change of the original tables to temporal ones, does not require

changes

to applications that work with them. This patch is built on standard
SQL/Temporal with some minor modifications inspired by commercial

temporal

database systems. Currently it only deals with transaction time support.

Here is simple description on how it works:

1. user can create transaction time table using modified CREATE TABLE
command:

CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

This command automatically creates all objects required for

transaction

time support:

List of relations
Schema | Name | Type | Owner
--------+----------------------+----------+----------
public | person | table | tester
public | person__entry_id_seq | sequence | tester
public | person_hist | table | postgres

Table "public.person"
Column | Type |
Modifiers

------------+-----------------------------+------------------------------------------------------------------------------

name | character varying(50) |
_entry_id | bigint | not null default
nextval('person__entry_id_seq'::regclass)
_sys_start | timestamp without time zone | not null default
clock_timestamp()
_sys_end | timestamp without time zone | not null default
'294276-12-31 23:59:59.999999'::timestamp without time zone
Indexes:
"person__entry_id_idx" btree (_entry_id)
"person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

Table "public.person_hist"
Column | Type | Modifiers
------------+-----------------------------+-----------
name | character varying(50) |
_entry_id | bigint | not null
_sys_start | timestamp without time zone | not null
_sys_end | timestamp without time zone | not null
Indexes:
"person_hist__entry_id_idx" btree (_entry_id)
"person_hist__sys_start__sys_end_idx" btree (_sys_start,

_sys_end)

Table person stores current versions of entries. 3 additional columns
are added:
_entry_id - id of entry. It groups together different versions of
entry.
_sys_start - beginning of the version validity period (version
creation timestamp).
_sys_end - end of the version validity period.

Table person_hist stores historical versions of entries. It has the

same

structure and indexes as the person table, but without any constraints

and

default values.

2. another way of creating transaction time table is adding transaction

time

support to existing standard table using ALTER command.

CREATE TABLE person(name varchar(50));
ALTER TABLE person ADD TRANSACTIONTIME;

3. INSERT entry

INSERT INTO person VALUES('Jack');

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+------------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 294276-12-31
23:59:59.999999

4. UPDATE entry

UPDATE person SET name = 'Tom';

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+------------------------------

Tom | 1 | 2012-05-16 22:11:44.736195 | 294276-12-31
23:59:59.999999

SELECT * FROM person_hist;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

5. DELETE entry

DELETE FROM person;

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end
------+-----------+------------+----------

SELECT * FROM person_hist;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
Tom | 1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

6. selecting entries

INSERT INTO person VALUES('Mike');
INSERT INTO person VALUES('Mike');

--standard SELECT - operates only with current versions of entries
SELECT * FROM person;

name
------
Mike
Mike

--special temporal SELECT which operates with all versions
NONSEQUENCED TRANSACTIONTIME SELECT *, _entry_id, _sys_start,

_sys_end

FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+------------------------------

Mike | 3 | 2012-05-16 22:20:55.055671 | 294276-12-31
23:59:59.999999
Mike | 2 | 2012-05-16 22:20:51.619475 | 294276-12-31
23:59:59.999999
Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
Tom | 1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

--special temporal SELECT which operates with versions valid in
specified time
TRANSACTIONTIME AS OF '2012-05-16 22:11:39.856916' SELECT *,

_entry_id,

_sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

--it is also possible to set timestamp globally for session. All
subsequent SELECTs without any temporal modifier will operate

with versions

valid in this time,
SET history_timestamp TO '2012-05-16 22:11:39.856916';

SELECT * FROM person;

name
------
Jack

--to select only current versions when history_tiumestamp is set,
CURRENT TRANSACTIONTIME have to be used with SELECT
CURRENT TRANSACTIONTIME SELECT * FROM person;

name
------
Mike
Mike

This is only a illustration of main functionality. Later I can create a
document about the design and implementation details, but first I need to
know if such temporal features as described here, could be added to

future

versions of PostgreSQL, after meeting all the requirements of a new

patch.

Regards

Miroslav Simulcik

#6Miroslav Šimulčík
simulcik.miro@gmail.com
In reply to: Laurenz Albe (#4)
Re: temporal support patch

Hi,

yes you are right, storing old versions of entry to history table can be
handled with triggers and my solution also uses triggers to do that.
Advantage of my implementation is that user doesn't need to create all
necessary objects (triggers, history table, begin/end columns ...)
manually. He just need to use AS TRANSACTIONTIME clause in CREATE TABLE
command and everything else is done automatically by backend.

I focused on backward compatibility with existing applications, so that if
versioning is added to original table, application can run correctly
without any changes. For example columns for beginning and end timestamp
are added to original table. This requires modifying of star symbol '*'
expansion in SELECT statement processing, because newly added (internal)
column have to be implicitly hidden. The same for INSERT command without
column list.

Simply said, my implementation makes work with temporal tables easier a
minimizes limitations for existing applications. User can work with
temporal table, just like with any other table and don't need to bother
with triggers, columns, history tables etc. For example if you want add
column to table with versioning, you can use simple ALTER TABLE ADD COLLUMN
and column is automatically added to history table too.

Regards
Miroslav Simulcik

2012/5/18 Albe Laurenz <laurenz.albe@wien.gv.at>

Show quoted text

Miroslav Šimulcík wrote:

as a part of my master's thesis I have created temporal support patch

for PostgreSQL. It enables the

creation of special temporal tables with entries versioning. Modifying

operations (UPDATE, DELETE,

TRUNCATE) on these tables don't cause permanent changes to entries, but

create new versions of them.

Thus user can easily get to the past states of the table.

Most of that is something you could handle with triggers (not TRUNCATE of
course).

What does your proposed feature do that a set of triggers wouldn't?

Yours,
Laurenz Albe

#7Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Miroslav Šimulčík (#1)
Re: temporal support patch

On Wed, May 16, 2012 at 4:14 PM, Miroslav Šimulčík
<simulcik.miro@gmail.com> wrote:

This is only a illustration of main functionality. Later I can create a
document about the design and implementation details, but first I need to
know if such temporal features as described here, could be added to future
versions of PostgreSQL, after meeting all the requirements of a new patch.

how this affect to users *not* using this feature? i mean, it affects
performance? probably not but...

if you already have the patch (i understand it is from what you said)
add it to the first commitfest
(https://commitfest.postgresql.org/action/commitfest_view?id=14) that
will start on june 15. is difficult to know if something we haven't
seen is acceptable or not (please add some description about the way
you did it)

if the patch is still not ready, then start explaining the design you
are going to use...

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Miroslav Šimulčík (#5)
Re: temporal support patch

Hello

2012/5/18 Miroslav Šimulčík <simulcik.miro@gmail.com>:

Hello.

SQL 2011 standard wasn't available in time I started this project so I built
my implementation on older standards TSQL2 and SQL/Temporal, that were only
available. None of these were accepted by ANSI/ISO commissions however.

There is different syntax in SQL 2011 and it looks like one that IBM DB2 had
been using even before this standard were published.

So my implementation differs in syntax, but features are same as stated in
"system versioned tables" part of slideshow.

I would to see temporal functionality in pg, but only in SQL 2011
syntax. Using syntax from deprecated proposals has no sense. I am not
sure so history table concept is best from performance view - it is
simpler for implementation, but you duplicate all indexes - there will
be lot of redundant fields in history table. A important query is
difference in cost for some non trivial query for actual data and same
query for historic data.

Regards

Pavel Stehule

Show quoted text

Regards
Miroslav Simulcik

2012/5/17 Pavel Stehule <pavel.stehule@gmail.com>

Hello

what is conformance of your solution with temporal extension in ANSI SQL
2011

http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule

2012/5/16 Miroslav Šimulčík <simulcik.miro@gmail.com>:

Hi all,

as a part of my master's thesis I have created temporal support patch
for
PostgreSQL. It enables the creation of special temporal tables with
entries
versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these
tables
don't cause permanent changes to entries, but create new versions of
them.
Thus user can easily get to the past states of the table.

Basic information on temporal databases can be found
on http://en.wikipedia.org/wiki/Temporal_database

In field of temporal databases, there are only proprietary solution
available. During the analysis I found these:
    - IBM DB2 10 for z/OS
    - Oracle 11g Workspace Manager
    - Teradata Database 13.10

Primary goal of my work was the creation of opensource solution, that is
easy to use and is backward compatible with existing applications,
so that
the change of the original tables to temporal ones, does not require
changes
to applications that work with them. This patch is built on standard
SQL/Temporal with some minor modifications inspired by commercial
temporal
database systems. Currently it only deals with transaction time support.

Here is simple description on how it works:

1. user can create transaction time table using modified CREATE TABLE
command:

    CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

    This command automatically creates all objects required for
transaction
time support:

                  List of relations
      Schema |         Name         |   Type   |  Owner
     --------+----------------------+----------+----------
      public | person               | table    | tester
      public | person__entry_id_seq | sequence | tester
      public | person_hist          | table    | postgres

                                                  Table "public.person"
        Column   |            Type             |
     Modifiers

   ------------+-----------------------------+------------------------------------------------------------------------------
      name       | character varying(50)       |
      _entry_id  | bigint                      | not null default
nextval('person__entry_id_seq'::regclass)
      _sys_start | timestamp without time zone | not null default
clock_timestamp()
      _sys_end   | timestamp without time zone | not null default
'294276-12-31 23:59:59.999999'::timestamp without time zone
     Indexes:
         "person__entry_id_idx" btree (_entry_id)
         "person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

              Table "public.person_hist"
        Column   |            Type             | Modifiers
     ------------+-----------------------------+-----------
      name       | character varying(50)       |
      _entry_id  | bigint                      | not null
      _sys_start | timestamp without time zone | not null
      _sys_end   | timestamp without time zone | not null
     Indexes:
         "person_hist__entry_id_idx" btree (_entry_id)
         "person_hist__sys_start__sys_end_idx" btree (_sys_start,
_sys_end)

    Table person stores current versions of entries. 3 additional
columns
are added:
        _entry_id - id of entry. It groups together different versions
of
entry.
        _sys_start - beginning of the version validity period (version
creation timestamp).
        _sys_end - end of the version validity period.

    Table person_hist stores historical versions of entries. It has the
same
structure and indexes as the person table, but without any constraints
and
default values.

2. another way of creating transaction time table is adding transaction
time
support to existing standard table using ALTER command.

    CREATE TABLE person(name varchar(50));
    ALTER TABLE person ADD TRANSACTIONTIME;

3. INSERT entry

    INSERT INTO person VALUES('Jack');

    SELECT *, _entry_id, _sys_start, _sys_end FROM person;

     name | _entry_id |         _sys_start         |           _sys_end

------+-----------+----------------------------+------------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 294276-12-31
23:59:59.999999

4. UPDATE entry

    UPDATE person SET name = 'Tom';

    SELECT *, _entry_id, _sys_start, _sys_end FROM person;

     name | _entry_id |         _sys_start         |           _sys_end

------+-----------+----------------------------+------------------------------
     Tom  |         1 | 2012-05-16 22:11:44.736195 | 294276-12-31
23:59:59.999999

    SELECT * FROM person_hist;

     name | _entry_id |         _sys_start         |          _sys_end

------+-----------+----------------------------+----------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

5. DELETE entry

    DELETE FROM person;

    SELECT *, _entry_id, _sys_start, _sys_end FROM person;

     name | _entry_id | _sys_start | _sys_end
    ------+-----------+------------+----------

    SELECT * FROM person_hist;

     name | _entry_id |         _sys_start         |          _sys_end

------+-----------+----------------------------+----------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
     Tom  |         1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

6. selecting entries

    INSERT INTO person VALUES('Mike');
    INSERT INTO person VALUES('Mike');

    --standard SELECT - operates only with current versions of entries
    SELECT * FROM person;

     name
    ------
     Mike
     Mike

    --special temporal SELECT which operates with all versions
    NONSEQUENCED TRANSACTIONTIME SELECT *, _entry_id, _sys_start,
_sys_end
FROM person;

     name | _entry_id |         _sys_start         |           _sys_end

------+-----------+----------------------------+------------------------------
     Mike |         3 | 2012-05-16 22:20:55.055671 | 294276-12-31
23:59:59.999999
     Mike |         2 | 2012-05-16 22:20:51.619475 | 294276-12-31
23:59:59.999999
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
     Tom  |         1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

    --special temporal SELECT which operates with versions valid in
specified time
    TRANSACTIONTIME AS OF '2012-05-16 22:11:39.856916' SELECT *,
_entry_id,
_sys_start, _sys_end FROM person;

     name | _entry_id |         _sys_start         |          _sys_end

------+-----------+----------------------------+----------------------------
     Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

    --it is also possible to set timestamp globally for session. All
subsequent SELECTs without any temporal modifier will operate
with versions
valid in this time,
    SET history_timestamp TO '2012-05-16 22:11:39.856916';

    SELECT * FROM person;

     name
    ------
     Jack

    --to select only current versions when history_tiumestamp is set,
CURRENT TRANSACTIONTIME have to be used with SELECT
    CURRENT TRANSACTIONTIME SELECT * FROM person;

     name
    ------
     Mike
     Mike

This is only a illustration of main functionality. Later I can create a
document about the design and implementation details, but first I need
to
know if such temporal features as described here, could be added to
future
versions of PostgreSQL, after meeting all the requirements of a new
patch.

Regards

Miroslav Simulcik

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Miroslav Šimulčík (#3)
Re: temporal support patch

On 5/18/12 2:06 AM, Miroslav Šimulčík wrote:

- no data redundancy - in my extension current versions of entries are stored only once in original table (in table_log - entries are inserted to both original and log table)

That's not necessarily a benefit... it makes querying for both history *and* current data a lot more complex. Table inheritance might be an elegant solution to that, but I doubt you could just bolt that on top of what you've created.

The timestamp fields need to have timezone info. If you change the timezone for a connection you will get inconsistent results without it.

_sys_end should either be NULLable or if it's going to have a magic value that magic value should be "Infinity":

decibel@workbook.local=# create table t(t timestamptz);
CREATE TABLE
decibel@workbook.local=# insert into t values('infinity');
INSERT 0 1
decibel@workbook.local=# select * from t;
t
----------
infinity
(1 row)

decibel@workbook.local=#
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#10Jeff Davis
pgsql@j-davis.com
In reply to: Miroslav Šimulčík (#1)
Re: temporal support patch

On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:

Hi all,

as a part of my master's thesis I have created temporal support patch
for PostgreSQL. It enables the creation of special temporal tables
with entries versioning. Modifying operations (UPDATE, DELETE,
TRUNCATE) on these tables don't cause permanent changes to entries,
but create new versions of them. Thus user can easily get to the past
states of the table.

I would be very interested to see this, thank you for working on it.

There are quite a few aspects to a temporal database system, and you are
working on a system-maintained transaction-time historical table, right?
Or are there other aspects to your proposal?

Some general comments:

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.

* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.

* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like "show me any time a salary was
changed over the last month" (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a "columns changed" bitmap or something).

* In general, I'm not fond of adorning queries with TRANSACTION TIME AS
OF... kinds of things. Those constructs are redundant with a WHERE
clause (on a range type, you'd use the "contains" operator). If a
standard requires that, maybe it would be OK to allow such things as
syntactic sugar.

* I do like having special DDL that creates the appropriate objects.
That helps to guide users so they don't have to invent their own
solution with triggers, etc.

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).

* It might make sense to hammer out as many of the details as we can
with an extension. For instance, exactly what options will be available,
what data types will be used, what objects will be created, the trigger
code, etc. Then, it will be more obvious exactly what we need to add
extra core support for (e.g. if we are going to use some inheritance
like mechanism), and what we need to add syntax sugar for.

I recommend that you start posting more detailed designs on
http://wiki.postgresql.org

If you already have code, feel free to submit it for the next commitfest
( http://commitfest.postgresql.org ), but this is a relatively large
project, so it will most likely take several commitfest cycles.

Regards,
Jeff Davis

#11Miroslav Šimulčík
simulcik.miro@gmail.com
In reply to: Pavel Stehule (#8)
Re: temporal support patch

There would be no problem to make my solution compatible with SQL 2011, but
the standard is not freely available. Can anybody provide me with this
standard?

2012/5/20 Pavel Stehule <pavel.stehule@gmail.com>

Show quoted text

Hello

2012/5/18 Miroslav Šimulčík <simulcik.miro@gmail.com>:

Hello.

SQL 2011 standard wasn't available in time I started this project so I

built

my implementation on older standards TSQL2 and SQL/Temporal, that were

only

available. None of these were accepted by ANSI/ISO commissions however.

There is different syntax in SQL 2011 and it looks like one that IBM DB2

had

been using even before this standard were published.

So my implementation differs in syntax, but features are same as stated

in

"system versioned tables" part of slideshow.

I would to see temporal functionality in pg, but only in SQL 2011
syntax. Using syntax from deprecated proposals has no sense. I am not
sure so history table concept is best from performance view - it is
simpler for implementation, but you duplicate all indexes - there will
be lot of redundant fields in history table. A important query is
difference in cost for some non trivial query for actual data and same
query for historic data.

Regards

Pavel Stehule

Regards
Miroslav Simulcik

2012/5/17 Pavel Stehule <pavel.stehule@gmail.com>

Hello

what is conformance of your solution with temporal extension in ANSI SQL
2011

http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule

2012/5/16 Miroslav Šimulčík <simulcik.miro@gmail.com>:

Hi all,

as a part of my master's thesis I have created temporal support patch
for
PostgreSQL. It enables the creation of special temporal tables with
entries
versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these
tables
don't cause permanent changes to entries, but create new versions of
them.
Thus user can easily get to the past states of the table.

Basic information on temporal databases can be found
on http://en.wikipedia.org/wiki/Temporal_database

In field of temporal databases, there are only proprietary solution
available. During the analysis I found these:
- IBM DB2 10 for z/OS
- Oracle 11g Workspace Manager
- Teradata Database 13.10

Primary goal of my work was the creation of opensource solution, that

is

easy to use and is backward compatible with existing applications,
so that
the change of the original tables to temporal ones, does not require
changes
to applications that work with them. This patch is built on standard
SQL/Temporal with some minor modifications inspired by commercial
temporal
database systems. Currently it only deals with transaction time

support.

Here is simple description on how it works:

1. user can create transaction time table using modified CREATE TABLE
command:

CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

This command automatically creates all objects required for
transaction
time support:

List of relations
Schema | Name | Type | Owner
--------+----------------------+----------+----------
public | person | table | tester
public | person__entry_id_seq | sequence | tester
public | person_hist | table | postgres

Table

"public.person"

Column | Type |
Modifiers

------------+-----------------------------+------------------------------------------------------------------------------

name | character varying(50) |
_entry_id | bigint | not null default
nextval('person__entry_id_seq'::regclass)
_sys_start | timestamp without time zone | not null default
clock_timestamp()
_sys_end | timestamp without time zone | not null default
'294276-12-31 23:59:59.999999'::timestamp without time zone
Indexes:
"person__entry_id_idx" btree (_entry_id)
"person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

Table "public.person_hist"
Column | Type | Modifiers
------------+-----------------------------+-----------
name | character varying(50) |
_entry_id | bigint | not null
_sys_start | timestamp without time zone | not null
_sys_end | timestamp without time zone | not null
Indexes:
"person_hist__entry_id_idx" btree (_entry_id)
"person_hist__sys_start__sys_end_idx" btree (_sys_start,
_sys_end)

Table person stores current versions of entries. 3 additional
columns
are added:
_entry_id - id of entry. It groups together different versions
of
entry.
_sys_start - beginning of the version validity period (version
creation timestamp).
_sys_end - end of the version validity period.

Table person_hist stores historical versions of entries. It has

the

same
structure and indexes as the person table, but without any constraints
and
default values.

2. another way of creating transaction time table is adding

transaction

time
support to existing standard table using ALTER command.

CREATE TABLE person(name varchar(50));
ALTER TABLE person ADD TRANSACTIONTIME;

3. INSERT entry

INSERT INTO person VALUES('Jack');

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start |

_sys_end

------+-----------+----------------------------+------------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 294276-12-31
23:59:59.999999

4. UPDATE entry

UPDATE person SET name = 'Tom';

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start |

_sys_end

------+-----------+----------------------------+------------------------------

Tom | 1 | 2012-05-16 22:11:44.736195 | 294276-12-31
23:59:59.999999

SELECT * FROM person_hist;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

5. DELETE entry

DELETE FROM person;

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end
------+-----------+------------+----------

SELECT * FROM person_hist;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
Tom | 1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

6. selecting entries

INSERT INTO person VALUES('Mike');
INSERT INTO person VALUES('Mike');

--standard SELECT - operates only with current versions of entries
SELECT * FROM person;

name
------
Mike
Mike

--special temporal SELECT which operates with all versions
NONSEQUENCED TRANSACTIONTIME SELECT *, _entry_id, _sys_start,
_sys_end
FROM person;

name | _entry_id | _sys_start |

_sys_end

------+-----------+----------------------------+------------------------------

Mike | 3 | 2012-05-16 22:20:55.055671 | 294276-12-31
23:59:59.999999
Mike | 2 | 2012-05-16 22:20:51.619475 | 294276-12-31
23:59:59.999999
Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194
Tom | 1 | 2012-05-16 22:11:44.736195 | 2012-05-16
22:14:33.875869

--special temporal SELECT which operates with versions valid in
specified time
TRANSACTIONTIME AS OF '2012-05-16 22:11:39.856916' SELECT *,
_entry_id,
_sys_start, _sys_end FROM person;

name | _entry_id | _sys_start | _sys_end

------+-----------+----------------------------+----------------------------

Jack | 1 | 2012-05-16 22:11:39.856916 | 2012-05-16
22:11:44.736194

--it is also possible to set timestamp globally for session. All
subsequent SELECTs without any temporal modifier will operate
with versions
valid in this time,
SET history_timestamp TO '2012-05-16 22:11:39.856916';

SELECT * FROM person;

name
------
Jack

--to select only current versions when history_tiumestamp is set,
CURRENT TRANSACTIONTIME have to be used with SELECT
CURRENT TRANSACTIONTIME SELECT * FROM person;

name
------
Mike
Mike

This is only a illustration of main functionality. Later I can create

a

document about the design and implementation details, but first I need
to
know if such temporal features as described here, could be added to
future
versions of PostgreSQL, after meeting all the requirements of a new
patch.

Regards

Miroslav Simulcik

#12Miroslav Šimulčík
simulcik.miro@gmail.com
In reply to: Jim Nasby (#9)
Re: temporal support patch

2012/5/30 Jim Nasby <jim@nasby.net>

On 5/18/12 2:06 AM, Miroslav Šimulčík wrote:

- no data redundancy - in my extension current versions of entries are
stored only once in original table (in table_log - entries are inserted to
both original and log table)

That's not necessarily a benefit... it makes querying for both history
*and* current data a lot more complex. Table inheritance might be an
elegant solution to that, but I doubt you could just bolt that on top of
what you've created.

Yes, querying for history data is more complex, but i focused on preserving
the performance of current queries. That's the reason why I use separate
table for old versions.

Table inheritance is very good idea and it will not require so much effort
to use it in my solution. Currently, when user queries whole history of
entries, table reference in FROM clause is replaced with subselect, which
access data in both tables. For example when user executes command:

NONSEQUENCED TRANSACTIONTIME SELECT * FROM person;

The actually executed command is:

SELECT * FROM (SELECT * FROM person UNION ALL SELECT * FROM person_hist) as
person

Use of table inheritance can make things simpler and more elegant, but I'm
not sure about how it affect performance. Will it cause gain in performance?

The timestamp fields need to have timezone info. If you change the timezone

for a connection you will get inconsistent results without it.

_sys_end should either be NULLable or if it's going to have a magic value
that magic value should be "Infinity":

Good point. I will use timestamp with timezone and value "Infinity"
instead of max timestamp value

#13Miroslav Šimulčík
simulcik.miro@gmail.com
In reply to: Jeff Davis (#10)
Re: temporal support patch

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.

I agree, new range types will be ideal for this

* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.

Yes I considered addition of user ID and transaction ID columns, because it
can be useful in some cases (for example to find all changes made by
transaction). However it wasn't necessary, so i omitted it. It can be
easily added.

* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like "show me any time a salary was
changed over the last month" (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a "columns changed" bitmap or something).

Another useful feature. I can take a look on it

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).

See reply to Jim's post.

* It might make sense to hammer out as many of the details as we can
with an extension. For instance, exactly what options will be available,
what data types will be used, what objects will be created, the trigger
code, etc. Then, it will be more obvious exactly what we need to add
extra core support for (e.g. if we are going to use some inheritance
like mechanism), and what we need to add syntax sugar for.

I recommend that you start posting more detailed designs on
http://wiki.postgresql.org

In which section of wiki can I post detailed design of my solution?

If you already have code, feel free to submit it for the next commitfest
( http://commitfest.postgresql.org ), but this is a relatively large
project, so it will most likely take several commitfest cycles.

I have working patch for postgresql version 9.0.4, but it needs refactoring
before i can submit it, because some parts don't
meet formatting requirements yet. And yes, changes are large, so it will be
better to discuss design first and then deal with code. Do you insist on
compatibility with standard SQL 2011 as Pavel wrote?

#14Merlin Moncure
mmoncure@gmail.com
In reply to: Miroslav Šimulčík (#13)
Re: temporal support patch

On Wed, Jun 13, 2012 at 4:10 PM, Miroslav Šimulčík
<simulcik.miro@gmail.com> wrote:

I have working patch for postgresql version 9.0.4, but it needs refactoring
before i can submit it, because some parts don't
meet formatting requirements yet. And yes, changes are large, so it will be
better to discuss design first and then deal with code. Do you insist on
compatibility with standard SQL 2011 as Pavel wrote?

Standards compliance is always going to make things easier in terms of
gaining community acceptance if you're targeting in core adoption. At
the very least it will remove one barrier although you might be in for
a slog for other reasons. You may not have known this, but postgres
had a time travel feature waaay back in the day (see:
http://www.postgresql.org/docs/6.3/static/c0503.htm). It was removed
for performance reasons and the first thing I'm wondering is how your
stuff performs in various scenarios and various other interesting
things. Also, +1 on use of range types Anyways, thanks for submitting
and good luck!

merlin

#15Jeff Davis
pgsql@j-davis.com
In reply to: Miroslav Šimulčík (#13)
Re: temporal support patch

On Wed, 2012-06-13 at 23:10 +0200, Miroslav Šimulčík wrote:

I have working patch for postgresql version 9.0.4, but it needs
refactoring before i can submit it, because some parts don't
meet formatting requirements yet. And yes, changes are large, so it
will be better to discuss design first and then deal with code. Do you
insist on compatibility with standard SQL 2011 as Pavel wrote?

Try to work on solving the problem and identify the use cases. I don't
think the standard will cause a major problem, we should be able to make
the relevant parts of your patch match the standard.

That's one reason to work on it as an extension first: we can get a
better sense of the problem space and various use cases without worrying
about violating any standard. Then, as you need specific backend support
(e.g. special syntax), we can take the standards more seriously.

Regards,
Jeff Davis

#16Vlad Arkhipov
arhipov@dc.baikal.ru
In reply to: Jeff Davis (#15)
Re: temporal support patch

On 06/15/2012 03:59 PM, Jeff Davis wrote:

On Wed, 2012-06-13 at 23:10 +0200, Miroslav Šimulčík wrote:

I have working patch for postgresql version 9.0.4, but it needs
refactoring before i can submit it, because some parts don't
meet formatting requirements yet. And yes, changes are large, so it
will be better to discuss design first and then deal with code. Do you
insist on compatibility with standard SQL 2011 as Pavel wrote?

Try to work on solving the problem and identify the use cases. I don't
think the standard will cause a major problem, we should be able to make
the relevant parts of your patch match the standard.

That's one reason to work on it as an extension first: we can get a
better sense of the problem space and various use cases without worrying
about violating any standard. Then, as you need specific backend support
(e.g. special syntax), we can take the standards more seriously.

Regards,
Jeff Davis

What's wrong with SPI/timetravel extension for system versioning?
http://www.postgresql.org/docs/9.1/static/contrib-spi.html

We are heavily using system-versioned and application-time period tables
in our enterprise products (most of them are bi-temporal). However our
implementation is based on triggers and views and therefore is not very
convenient to use. There are also some locking issues with foreign keys
to application-time period tables. It will be great if the new temporal
SQL features will be included in the Postgresql core with SQL 2011
syntax support. It is especially important for bi-temporal tables
because of complex internal logic of UPDATE/DELETE and huge SELECT
queries for such tables.

#17Jeff Davis
pgsql@j-davis.com
In reply to: Vlad Arkhipov (#16)
Re: temporal support patch

On Mon, 2012-06-18 at 19:34 +0900, Vlad Arkhipov wrote:

What's wrong with SPI/timetravel extension for system versioning?
http://www.postgresql.org/docs/9.1/static/contrib-spi.html

We are heavily using system-versioned and application-time period
tables in our enterprise products (most of them are bi-temporal).
However our implementation is based on triggers and views and
therefore is not very convenient to use. There are also some locking
issues with foreign keys to application-time period tables. It will be
great if the new temporal SQL features will be included in the
Postgresql core with SQL 2011 syntax support. It is especially
important for bi-temporal tables because of complex internal logic of
UPDATE/DELETE and huge SELECT queries for such tables.

I've already pointed out some missing features in this thread, but the
big ones in my mind are:

1. It doesn't use 9.2 Range Types, which would help in a lot of ways
(like making the SELECT queries a lot simpler and faster).

2. It's missing a lot of options, like storing the user that modified a
row or the changed columns.

Regards,
Jeff Davis

#18Vlad Arkhipov
arhipov@dc.baikal.ru
In reply to: Jeff Davis (#10)
Re: temporal support patch

On 05/31/2012 11:52 AM, Jeff Davis wrote:

On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:

Hi all,

as a part of my master's thesis I have created temporal support patch
for PostgreSQL. It enables the creation of special temporal tables
with entries versioning. Modifying operations (UPDATE, DELETE,
TRUNCATE) on these tables don't cause permanent changes to entries,
but create new versions of them. Thus user can easily get to the past
states of the table.

I would be very interested to see this, thank you for working on it.

There are quite a few aspects to a temporal database system, and you are
working on a system-maintained transaction-time historical table, right?
Or are there other aspects to your proposal?

Some general comments:

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.

It's not sufficient to store only a period of validity for a row. If two
transactions started in the same time change the same record, you have a
problem with TSTZRANGE type because it's normalized to empty interval.
The other issue is how to handle multiple changes of the same record
within the transaction. Should they be stored or not?
Also it's necessary to store some kind of operation type that was
applied to the record (insert/update/delete). For example, there is a
table with one record with validity period [0, ) and value 'A'.

First way
1. Delete this record in time 1, now there is [0, 1), A in the history
table.
2. Insert a new record in time 1, now there is [0, 1), A in the history
table and [1, ), B record in the current data table.

Second way
1. Update this record in time 1, now there is [0, 1), A in the history
table and [1, ), B record in the current data table.

So you have the same data in the tables but the actions that led to this
configuration were different and the history has been lost partly.

* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.

I'm not sure that the database user is the proper thing to be stored in
the history table. Many applications usually connect to a database using
some virtual user and have their own users/roles tables to handle with
privileges. There should be some way to substitute the stored user in
the history table with the application's one. It's also helpful to store
transaction id that inserted/updated/deleted the record.

* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like "show me any time a salary was
changed over the last month" (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a "columns changed" bitmap or something).

It's a great proposal but seems to be impossible to implement with
triggers only solution, isn't it? Is there any kind of hooks on ALTER
TABLE ... in PostgreSQL to update changed columns bitmaps when table
structure changes?

* In general, I'm not fond of adorning queries with TRANSACTION TIME AS
OF... kinds of things. Those constructs are redundant with a WHERE
clause (on a range type, you'd use the "contains" operator). If a
standard requires that, maybe it would be OK to allow such things as
syntactic sugar.

In SQL2011 there is only one table with the all data, historical and
current. So it's not very convenient to specifiy WHERE condition on
system time everywhere and for all tables in the query. By default only
the current data is selected with a query like SELECT * FROM table.

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).

We have a success experience with inheritance with our trigger-based
solution. It's completely transparent for the existing applications and
does not have any impact on performance.

#19Jeff Davis
pgsql@j-davis.com
In reply to: Vlad Arkhipov (#18)
Re: temporal support patch

On Mon, 2012-06-25 at 17:46 +0900, Vlad Arkhipov wrote:

It's not sufficient to store only a period of validity for a row. If two
transactions started in the same time change the same record, you have a
problem with TSTZRANGE type because it's normalized to empty interval.

That's an interesting point.

Let's say you tried setting it to [T1, T2) where T1 is the time of the
last transaction to update it and T2 is the time of the current
transaction. If T2 <= T1, then TSTZRANGE will throw an error, not store
the empty interval.

And we don't want to store the empty interval, because it would be a
lie. There could have been some transaction T3 that happened during T2
that saw the value from T1, so saying that there were no times where
that was visible to the system is false. Throwing an error allows you to
retry T2, which should allow a microsecond or so to pass, and the
problem should resolve itself (assuming your clock didn't move
backwards, which is a different issue).

We could also argue about the start versus end times of transactions,
and snapshot acquisition times, because that could cause confusion if
there are long-running transactions. It might be a good reason to store
the modifying transaction ID as well, but then you get into transaction
wraparound problems.

The other issue is how to handle multiple changes of the same record
within the transaction. Should they be stored or not?

In a typical audit log, I don't see any reason to. The internals of a
transaction should be implementation details; invisible to the outside,
right?

Also it's necessary to store some kind of operation type that was
applied to the record (insert/update/delete). For example, there is a
table with one record with validity period [0, ) and value 'A'.

First way
1. Delete this record in time 1, now there is [0, 1), A in the history
table.
2. Insert a new record in time 1, now there is [0, 1), A in the history
table and [1, ), B record in the current data table.

Second way
1. Update this record in time 1, now there is [0, 1), A in the history
table and [1, ), B record in the current data table.

So you have the same data in the tables but the actions that led to this
configuration were different and the history has been lost partly.

Right. Those are yet more possible options that people might want for an
audit log.

* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.

I'm not sure that the database user is the proper thing to be stored in
the history table. Many applications usually connect to a database using
some virtual user and have their own users/roles tables to handle with
privileges. There should be some way to substitute the stored user in
the history table with the application's one. It's also helpful to store
transaction id that inserted/updated/deleted the record.

If the system is recording it for audit purposes, then it better be sure
that it's true. You can't allow the application to pick and choose what
gets stored there.

While it may be true that many applications just all use the same DB
user, if you want an audit log that includes user information you have
to let the DB do some authentication.

It's a great proposal but seems to be impossible to implement with
triggers only solution, isn't it? Is there any kind of hooks on ALTER
TABLE ... in PostgreSQL to update changed columns bitmaps when table
structure changes?

Column numbers are never reused, so I think it would be stable. But if
you do need to be notified of schema changes, the new event triggers
mechanism may be able to do that.

In SQL2011 there is only one table with the all data, historical and
current. So it's not very convenient to specifiy WHERE condition on
system time everywhere and for all tables in the query. By default only
the current data is selected with a query like SELECT * FROM table.

If there is some syntax that offers a convenient shorthand for WHERE,
that's fine with me. Or using two tables, one called foo and one called
foo_history, is also fine. But I don't want the DML syntax to introduce
new mechanisms that aren't available without the fancy syntax (though
new DDL arrangements might be fine).

Regards,
Jeff Davis

#20Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jeff Davis (#19)
Re: temporal support patch

If there is some syntax that offers a convenient shorthand for WHERE,
that's fine with me. Or using two tables, one called foo and one called
foo_history, is also fine. But I don't want the DML syntax to introduce
new mechanisms that aren't available without the fancy syntax (though
new DDL arrangements might be fine).

convention foo_history is used often - and it should be significant
break for migration

Regards

Pavel Stehule

Show quoted text

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#19)
#22David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#21)
#23Josh Berkus
josh@agliodbs.com
In reply to: David G. Johnston (#22)
#24Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Josh Berkus (#23)
#25Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#21)
#26Jeff Davis
pgsql@j-davis.com
In reply to: David G. Johnston (#22)
#27Jeff Davis
pgsql@j-davis.com
In reply to: Josh Berkus (#23)
#28Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#24)
#29Craig Ringer
craig@2ndquadrant.com
In reply to: Jeff Davis (#27)
#30Craig Ringer
craig@2ndquadrant.com
In reply to: Jeff Davis (#27)
#31Vlad Arkhipov
arhipov@dc.baikal.ru
In reply to: Jeff Davis (#27)
#32Anssi Kääriäinen
anssi.kaariainen@thl.fi
In reply to: Vlad Arkhipov (#31)
#33Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#28)
#34David Fetter
david@fetter.org
In reply to: Jeff Davis (#26)
#35Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Kevin Grittner (#33)
#36Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gavin Flower (#35)
#37Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Gavin Flower (#35)
#38Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#36)
#39Josh Berkus
josh@agliodbs.com
In reply to: Jeff Davis (#27)
#40Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Josh Berkus (#39)
#41Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Kevin Grittner (#40)
#42Vlad Arkhipov
arhipov@dc.baikal.ru
In reply to: Gavin Flower (#41)
#43Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vlad Arkhipov (#42)
#44Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#40)
#45Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#40)
#46Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#37)
#47Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#46)
#48Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#22)
#49David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#48)
#50Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#49)
#51Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#50)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#51)
#53Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#43)
#54Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#53)
#55Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#50)
#56Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#54)
#57Chris Browne
cbbrowne@acm.org
In reply to: Jim Nasby (#56)
#58Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#56)