WIP: System Versioned Temporal Table
Hi all ,
Temporal table is one of the main new features added in sql standard 2011.
From that I will like to implement system versioned temporal table which
allows to keep past and present data so old data can be queried. Am propose
to implement it like below
CREATE
In create table only one table is create and both historical and current
data will be store in it. In order to make history and current data
co-exist row end time column will be added implicitly to primary key.
Regarding performance one can partition the table by row end time column
order to make history data didn't slowed performance.
INSERT
In insert row start time column and row end time column behave like a kind
of generated stored column except they store current transaction time and
highest value supported by the data type which is +infinity respectively.
DELETE and UPDATE
The old data is inserted with row end time column seated to current
transaction time
SELECT
If the query didn’t contain a filter condition that include system time
column, a filter condition will be added in early optimization that filter
history data.
Attached is WIP patch that implemented just the above and done on top of
commit b8e19b932a99a7eb5a. Temporal clause didn’t implemented yet so one
can use regular filter condition for the time being
NOTE: I implement sql standard syntax except it is PERIOD FOR SYSTEM TIME
rather than PERIOD FOR SYSTEM_TIME in CREATE TABLE statement and system
time is not selected unless explicitly asked
Any enlightenment?
regards
Surafel
Attachments:
WIP_system_version_temp_table.patchtext/x-patch; charset=US-ASCII; name=WIP_system_version_temp_table.patchDownload+568-20
On 23/10/2019 17:56, Surafel Temesgen wrote:
Hi all ,
Temporal table is one of the main new features added in sql standard
2011. From that I will like to implement system versioned temporal
table which allows to keep past and present data so old data can be
queried.
Excellent! I've been wanting this feature for a long time now. We're
the last major database to not have it.
I tried my hand at doing it in core, but ended up having better success
at an extension: https://github.com/xocolatl/periods/
Am propose to implement it like below
CREATE
In create table only one table is create and both historical and
current data will be store in it. In order to make history and current
data co-exist row end time column will be added implicitly to primary
key. Regarding performance one can partition the table by row end time
column order to make history data didn't slowed performance.
If we're going to be implicitly adding stuff to the PK, we also need to
add that stuff to the other unique constraints, no? And I think it
would be better to add both the start and the end column to these keys.
Most of the temporal queries will be accessing both.
INSERT
In insert row start time column and row end time column behave like a
kind of generated stored column except they store current transaction
time and highest value supported by the data type which is +infinity
respectively.
You're forcing these columns to be timestamp without time zone. If
you're going to force a datatype here, it should absolutely be timestamp
with time zone. However, I would like to see it handle both kinds of
timestamps as well as a simple date.
DELETE and UPDATE
The old data is inserted with row end time column seated to current
transaction time
I don't see any error handling for transaction anomalies. In READ
COMMITTED, you can easily end up with a case where the end time comes
before the start time. I don't even see anything constraining start
time to be strictly inferior to the end time. Such a constraint will be
necessary for application-time periods (which your patch doesn't address
at all but that's okay).
SELECT
If the query didn’t contain a filter condition that include system
time column, a filter condition will be added in early optimization
that filter history data.Attached is WIP patch that implemented just the above and done on top
of commit b8e19b932a99a7eb5a. Temporal clause didn’t implemented yet
so one can use regular filter condition for the time beingNOTE: I implement sql standard syntax except it is PERIOD FOR SYSTEM
TIME rather than PERIOD FOR SYSTEM_TIME in CREATE TABLE statement and
system time is not selected unless explicitly asked
Why aren't you following the standard syntax here?
Any enlightenment?
There are quite a lot of typos and other things that aren't written "the
Postgres way". But before I comment on any of that, I'd like to see the
features be implemented correctly according to the SQL standard.
hi Vik,
On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
If we're going to be implicitly adding stuff to the PK, we also need to
add that stuff to the other unique constraints, no? And I think it
would be better to add both the start and the end column to these keys.
Most of the temporal queries will be accessing both.
yes it have to be added to other constraint too but adding both system time
to PK will violate constraint because it allow multiple data in current
data
Why aren't you following the standard syntax here?
because we do have TIME and SYSTEM_P as a key word and am not sure of
whether
its a right thing to add other keyword that contain those two word
concatenated
Any enlightenment?
There are quite a lot of typos and other things that aren't written "the
Postgres way". But before I comment on any of that, I'd like to see the
features be implemented correctly according to the SQL standard.
it is almost in sql standard syntax except the above small difference. i
can correct it
and post more complete patch soon.
regards
Surafel
On 24/10/2019 16:54, Surafel Temesgen wrote:
hi Vik,
On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:
If we're going to be implicitly adding stuff to the PK, we also
need to
add that stuff to the other unique constraints, no? And I think it
would be better to add both the start and the end column to these
keys.
Most of the temporal queries will be accessing both.
yes it have to be added to other constraint too but adding both system
time
to PK will violate constraint because it allow multiple data in
current data
I don't understand what you mean by this.
Why aren't you following the standard syntax here?
because we do have TIME and SYSTEM_P as a key word and am not sure of
whether
its a right thing to add other keyword that contain those two word
concatenated
Yes, we have to do that.
Any enlightenment?
There are quite a lot of typos and other things that aren't
written "the
Postgres way". But before I comment on any of that, I'd like to
see the
features be implemented correctly according to the SQL standard.it is almost in sql standard syntax except the above small difference.
i can correct it
and post more complete patch soon.
I don't mean just the SQL syntax, I also mean the behavior.
On Thu, Oct 24, 2019 at 6:49 PM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
On 24/10/2019 16:54, Surafel Temesgen wrote:
hi Vik,
On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>>wrote:
If we're going to be implicitly adding stuff to the PK, we also
need to
add that stuff to the other unique constraints, no? And I think it
would be better to add both the start and the end column to these
keys.
Most of the temporal queries will be accessing both.yes it have to be added to other constraint too but adding both system
time
to PK will violate constraint because it allow multiple data in
current dataI don't understand what you mean by this.
The primary purpose of adding row end time to primary key is to allow
duplicate value to be inserted into a table with keeping constraint in
current data but it can be duplicated in history data. Adding row start
time column to primary key will eliminate this uniqueness for current data
which is not correct
regards
Surafel
On 25/10/2019 11:56, Surafel Temesgen wrote:
On Thu, Oct 24, 2019 at 6:49 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:On 24/10/2019 16:54, Surafel Temesgen wrote:
hi Vik,
On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing
<vik.fearing@2ndquadrant.com<mailto:vik.fearing@2ndquadrant.com>
<mailto:vik.fearing@2ndquadrant.com
<mailto:vik.fearing@2ndquadrant.com>>> wrote:
If we're going to be implicitly adding stuff to the PK, we also
need to
add that stuff to the other unique constraints, no? And Ithink it
would be better to add both the start and the end column to
these
keys.
Most of the temporal queries will be accessing both.
yes it have to be added to other constraint too but adding bothsystem
time
to PK will violate constraint because it allow multiple data in
current dataI don't understand what you mean by this.
The primary purpose of adding row end time to primary key is to allow
duplicate value to be inserted into a table with keeping constraint in
current data but it can be duplicated in history data. Adding row
start time column to primary key will eliminate this uniqueness for
current data which is not correct
How? The primary/unique keys must always be unique at every point in time.
On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
I don't understand what you mean by this.
The primary purpose of adding row end time to primary key is to allow
duplicate value to be inserted into a table with keeping constraint in
current data but it can be duplicated in history data. Adding row
start time column to primary key will eliminate this uniqueness for
current data which is not correctHow? The primary/unique keys must always be unique at every point in time.
From user prospect it is acceptable to delete and reinsert a record with
the same key value multiple time which means there will be multiple record
with the same key value in a history data but there is only one values in
current data as a table without system versioning do .I add row end time
column to primary key to allow user supplied primary key values to be
duplicated in history data which is acceptable
regards
Surafel
On 28/10/2019 13:48, Surafel Temesgen wrote:
On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:I don't understand what you mean by this.
The primary purpose of adding row end time to primary key is to
allow
duplicate value to be inserted into a table with keeping
constraint in
current data but it can be duplicated in history data. Adding row
start time column to primary key will eliminate this uniqueness for
current data which is not correctHow? The primary/unique keys must always be unique at every point
in time.From user prospect it is acceptable to delete and reinsert a record
with the same key value multiple time which means there will be
multiple record with the same key value in a history data but there is
only one values in current data as a table without system versioning
do .I add row end time column to primary key to allow user supplied
primary key values to be duplicated in history data which is acceptable
Yes, I understand that. I'm saying you should also add the row start
time column.
Hi,
Attached is a complete patch and also contain a fix for your comments
regards
Surafel
Attachments:
0001-system-versioned-temporal-table.patchtext/x-patch; charset=US-ASCII; name=0001-system-versioned-temporal-table.patchDownload+1553-67
On 01/01/2020 11:50, Surafel Temesgen wrote:
Hi,
Attached is a complete patch and also contain a fix for your comments
This does not compile against current head (0ce38730ac).
gram.y: error: shift/reduce conflicts: 6 found, 0 expected
--
Vik Fearing
On Thu, Jan 2, 2020 at 12:12 AM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
This does not compile against current head (0ce38730ac).
gram.y: error: shift/reduce conflicts: 6 found, 0 expected
Rebased and conflict resolved i hope it build clean this time
regards
Surafel
Attachments:
0002-system-versioned-temporal-table.patchtext/x-patch; charset=US-ASCII; name=0002-system-versioned-temporal-table.patchDownload+1583-69
On 03/01/2020 11:57, Surafel Temesgen wrote:
On Thu, Jan 2, 2020 at 12:12 AM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:This does not compile against current head (0ce38730ac).
gram.y: error: shift/reduce conflicts: 6 found, 0 expected
Rebased and conflict resolved i hope it build clean this time
It does but you haven't included your tests file so `make check` fails.
It seems clear to me that you haven't tested it at all anyway. The
temporal conditions do not return the correct results, and the syntax is
wrong, too. Also, none of my previous comments have been addressed
except for "system versioning" instead of "system_versioning". Why?
--
Vik Fearing
On Fri, Jan 3, 2020 at 4:22 PM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
Rebased and conflict resolved i hope it build clean this time
It does but you haven't included your tests file so `make check` fails.
what tests file? i add system_versioned_table.sql and
system_versioned_table.out
test files and it tested and pass on appveyor[1]. https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.73247 only failed on travis
because of warning. i will add more test
It seems clear to me that you haven't tested it at all anyway. The
temporal conditions do not return the correct results, and the syntax is
wrong, too. Also, none of my previous comments have been addressed
except for "system versioning" instead of "system_versioning". Why?
I also correct typo and add row end column time to unique
key that make it unique for current data. As you mentioned
other comment is concerning about application-time periods
which the patch not addressing . i refer sql 2011 standard for
syntax can you tell me which syntax you find it wrong?
[1]: . https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.73247
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.73247
regards
Surafel
On Mon, Oct 28, 2019 at 6:36 PM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
On 28/10/2019 13:48, Surafel Temesgen wrote:
On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>>wrote:
I don't understand what you mean by this.
The primary purpose of adding row end time to primary key is to
allow
duplicate value to be inserted into a table with keeping
constraint in
current data but it can be duplicated in history data. Adding row
start time column to primary key will eliminate this uniqueness for
current data which is not correctHow? The primary/unique keys must always be unique at every point
in time.From user prospect it is acceptable to delete and reinsert a record
with the same key value multiple time which means there will be
multiple record with the same key value in a history data but there is
only one values in current data as a table without system versioning
do .I add row end time column to primary key to allow user supplied
primary key values to be duplicated in history data which is acceptableYes, I understand that. I'm saying you should also add the row start
time column.
that allow the same primary key value row to be insert as long
as insertion time is different
regards
Surafel
On 05/01/2020 11:16, Surafel Temesgen wrote:
On Fri, Jan 3, 2020 at 4:22 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:Rebased and conflict resolved i hope it build clean this time
It does but you haven't included your tests file so `make check`
fails.what tests file?
Exactly.
i add system_versioned_table.sql and system_versioned_table.out
test files
Those are not included in the patch.
<checks again>
Okay, that was user error on my side. I apologize.
It seems clear to me that you haven't tested it at all anyway. The
temporal conditions do not return the correct results, and the
syntax is
wrong, too. Also, none of my previous comments have been addressed
except for "system versioning" instead of "system_versioning". Why?I also correct typo and add row end column time to unique
key that make it unique for current data. As you mentioned
other comment is concerning about application-time periods
which the patch not addressing .
- For performance, you must put the start column in the indexes also.
- You only handle timestamp when you should also handle timestamptz and
date.
- You don't throw 2201H for anomalies
i refer sql 2011 standard for
syntax can you tell me which syntax you find it wrong?
Okay, now that I see your tests, I understand why everything is broken.
You only test FROM-TO and with a really wide interval. There are no
tests for AS OF and no tests for BETWEEN-AND.
As for the syntax, you have:
select a from for stest0 system_time from '2000-01-01 00:00:00.00000' to
'infinity' ORDER BY a;
when you should have:
select a from stest0 for system_time from '2000-01-01 00:00:00.00000' to
'infinity' ORDER BY a;
That is, the FOR should be on the other side of the table name.
In addition, there are many rules in the standard that are not respected
here. For example, this query works and should not:
CREATE TABLE t (system_time integer) WITH SYSTEM VERSIONING;
This syntax is not supported:
ALTER TABLE t
ADD PERIOD FOR SYSTEM_TIME (s, e)
ADD COLUMN s timestamp
ADD COLUMN e timestamp;
psql's \d does not show that the table is system versioned, and doesn't
show the columns of the system_time period.
I can drop columns used in the period.
Please don't hesitate to take inspiration from my extension that does
this. The extension is under the PostgreSQL license for that reason.
Take from it whatever you need.
https://github.com/xocolatl/periods/
--
Vik Fearing
Vik Fearing-4 wrote
On 05/01/2020 11:16, Surafel Temesgen wrote:
On Fri, Jan 3, 2020 at 4:22 PM Vik Fearing
<
vik.fearing@
<mailto:
vik.fearing@
>> wrote:
[...]
You only test FROM-TO and with a really wide interval. There are no
tests for AS OF and no tests for BETWEEN-AND.As for the syntax, you have:
select a from for stest0 system_time from '2000-01-01 00:00:00.00000' to
'infinity' ORDER BY a;when you should have:
select a from stest0 for system_time from '2000-01-01 00:00:00.00000' to
'infinity' ORDER BY a;That is, the FOR should be on the other side of the table name.
[...]
Vik Fearing
Hello,
I though that standard syntax was "AS OF SYSTEM TIME"
as discussed here
/messages/by-id/A254CDC3-D308-4822-8928-8CC584E0CC71@elusive.cx
, also explaining how to parse such a syntax .
Regards
PAscal
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 05/01/2020 16:01, legrand legrand wrote:
As for the syntax, you have:
select a from for stest0 system_time from '2000-01-01 00:00:00.00000' to
'infinity' ORDER BY a;when you should have:
select a from stest0 for system_time from '2000-01-01 00:00:00.00000' to
'infinity' ORDER BY a;That is, the FOR should be on the other side of the table name.
[...]
Vik Fearing
Hello,
I though that standard syntax was "AS OF SYSTEM TIME"
as discussed here
/messages/by-id/A254CDC3-D308-4822-8928-8CC584E0CC71@elusive.cx
, also explaining how to parse such a syntax .
No, that is incorrect. The standard syntax is:
FROM tablename FOR SYSTEM_TIME AS OF '...'
FROM tablename FOR SYSTEM_TIME BETWEEN '...' AND '...'
FROM tablename FOR SYSTEM_TIME FROM '...' TO '...'
--
Vik Fearing
Vik Fearing-4 wrote
On 05/01/2020 16:01, legrand legrand wrote:
No, that is incorrect. The standard syntax is:
FROM tablename FOR SYSTEM_TIME AS OF '...'
FROM tablename FOR SYSTEM_TIME BETWEEN '...' AND '...'
FROM tablename FOR SYSTEM_TIME FROM '...' TO '...'
--
Vik Fearing
oups, I re-read links and docs and I'm wrong.
Sorry for the noise
Regards
PAscal
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hello.
Isn't this patch somehow broken?
At Mon, 28 Oct 2019 16:36:09 +0100, Vik Fearing <vik.fearing@2ndquadrant.com> wrote in
On 28/10/2019 13:48, Surafel Temesgen wrote:
On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing
<vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:I don't understand what you mean by this.
The primary purpose of adding row end time to primary key is to
allow
duplicate value to be inserted into a table with keeping
constraint in
current data but it can be duplicated in history data. Adding row
start time column to primary key will eliminate this uniqueness for
current data which is not correctHow? The primary/unique keys must always be unique at every point
in time.From user prospect it is acceptable to delete and reinsert a record
with the same key value multiple time which means there will be
multiple record with the same key value in a history data but there is
only one values in current data as a table without system versioning
do .I add row end time column to primary key to allow user supplied
primary key values to be duplicated in history data which is acceptableYes, I understand that. I'm saying you should also add the row start
time column.
I think that the start and end timestamps represent the period where
that version of the row was active. So UPDATE should modify the start
timestamp of the new version to the same value with the end timestamp
of the old version to the updated time. Thus, I don't think adding
start timestamp to PK doesn't work as expected. That hinders us from
rejecting rows with the same user-defined unique key because start
timestams is different each time of inserts. I think what Surafel is
doing in the current patch is correct. Only end_timestamp = +infinity
rejects another non-historical (= live) version with the same
user-defined unique key.
I'm not sure why the patch starts from "0002, but anyway it applied
on e369f37086. Then I ran make distclean, ./configure then make all,
make install, initdb and started server after all of them.
First, I tried to create a temporal table.
When I used timestamptz as the type of versioning columns, ALTER,
CREATE commands ended with server crash.
"CREATE TABLE t (a int, s timestamptz GENERATED ALWAYS AS ROW START, e timestamptz GENERATED ALWAYS AS ROW END);"
(CREATE TABLE t (a int);)
"ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START"
"ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamptz GENERATED ALWAYS AS ROW END"
If I added the start/end timestamp columns to an existing table, it
returns uncertain error.
ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START;
ERROR: column "s" contains null values
ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamp(6) GENERATED ALWAYS AS ROW END;
ERROR: column "s" contains null values
When I defined only start column, SELECT on the table crashed.
"CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START);"
"SELECT * from t;"
(crashed)
The following command ended with ERROR which I cannot understand the
cause, but I expected the command to be accepted.
ALTER TABLE t ADD COLUMN start timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN end timestamp(6) GENERATED ALWAYS AS ROW END;
ERROR: syntax error at or near "end"
I didin't examined further but the syntax part doesn't seem designed
well, and the body part seems vulnerable to unexpected input.
I ran a few queries:
SELECT * shows the timestamp columns, don't we need to hide the period
timestamp columns from this query?
I think UPDATE needs to update the start timestamp, but it doesn't. As
the result the timestamps doesn't represent the correct lifetime of
the row version and we wouldn't be able to pick up correct versions of
a row that exprerienced updates. (I didn't confirmed that because I
couldn't do "FOR SYSTEM_TIME AS OF" query due to syntax error..)
(Sorry in advance for possible pointless comments due to my lack of
access to the SQL11 standard.) If we have the period-timestamp
columns before the last two columns, INSERT in a common way on the
table fails, which doesn't seem to me to be expected behavior:
CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START, e timestamp(6) GENERATED ALWAYS AS ROW END, a int) WITH SYSTEM VERSIONING;
INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
ERROR: column "s" is of type timestamp without time zone but expression is of type integer
Some queries using SYSTEM_TIME which I think should be accepted ends
with error. Is the grammar part missing something?
SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';
ERROR: syntax error at or near "system_time"
LINE 1: SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';
SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00' AND '2020-01-08 0:00:00';
ERROR: syntax error at or near "system_time"
LINE 1: SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00'...
Other random comments (sorry for it not being comprehensive):
The patch at worst loops over all columns at every parse time. It is
quite ineffecient if there are many columns. We can store the column
indexes in relcache.
If I'm not missing anything, alter table doesn't properly modify
existing data in the target table. AddSystemVersioning should fill in
start/end_timestamp with proper values and DropSystemVersioning shuold
remove rows no longer useful.
+makeAndExpr(Node *lexpr, Node *rexpr, int location)
I believe that planner flattenes nested AND/ORs in
eval_const_expressions(). Shouldn't we leave the work to the planner?
+makeConstraint(ConstrType type)
We didn't use such a function to make that kind of nodes. Maybe we
should use makeNode directly, or we need to change similar coding into
that using the function. Addition to that, setting .location to 1 is
wrong. "Unknown" location is -1.
Separately from that, temporal clauses is not restriction of a
table. So it seems wrong to me to use constraint mechamism for this
purpose.
+makeSystemColumnDef(char *name)
"system column (or attribute)" is a column specially treated outside
of tuple descriptor. The temporal-period columns are not system
columns in that sense.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 05/01/2020 13:50, Vik Fearing wrote:
Okay, now that I see your tests, I understand why everything is broken.
You only test FROM-TO and with a really wide interval. There are no
tests for AS OF and no tests for BETWEEN-AND.
I have started working on some better test cases for you. The attached
.sql and .out tests should pass, and they are some of the tests that
I'll be putting your next version through. There are many more tests
that need to be added.
Once all the desired functionality is there, I'll start reviewing the
code itself.
Keep up the good work, and let me know if I can do anything to help you.
--
Vik Fearing