[PROPOSAL] Temporal query processing with range types

Started by Anton Dignösover 9 years ago67 messageshackers
Jump to latest
#1Anton Dignös
dignoes@inf.unibz.it

Hi hackers,

we are a group of researches that work on temporal databases. Our main focus is
the processing of data with time intervals, such as the range types in
PostgreSQL.

For this type of processing the range types that are stored with the tuples are
considered to represent the tuples' valid time, such as for instance the time
period an employee works for a project. Intuitively, the processing of temporal
operators corresponds to the use of "at each time point" in natural language.
For example, a temporal count on such a relation (temporal relation) shall
determine the count at each time point, while an anti-join shall compute an
anti-join at each time point.

We would like to contribute to PostgreSQL a solution that supports the query
processing of "at each time point". The basic idea is to offer two new
operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
ranges of tuples so that subsequent queries can use the usual grouping and
equality conditions to get the intended results.

Query processing works as follows:
<temporal operator> := <NORMALIZE | ALIGN> + <traditional operator>

Attached is a proposal patch file that implements this extension and on which
the example queries below can be run. The support includes distinct,
aggregation, set operations, Cartesian product, Join, Left Join, Right Join,
Full Join, and Anti-join.

In our prototype the syntax for the two new operators is:

... FROM (table_ref NORMALIZE table_ref
USING(att_list)
WITH (columnref, columnref)
) alias_clause ...

and

... FROM (table_ref ALIGN table_ref
ON a_expr
WITH (columnref, columnref)
) alias_clause ...

where NORMALIZE is used for distinct, aggregation and set operations and ALIGN
is used for Cartesian product, Join, Left Join, Right Join, Full Join, and
Anti-join.

-------------------------------------------------------------------------------
EXAMPLE FOR AGGREGATION
-------------------------------------------------------------------------------

Relation 'projects' records when employees are assigned to projects.

CREATE TABLE projects (empl VARCHAR, proj VARCHAR, pay FLOAT, t DATERANGE);
INSERT INTO projects VALUES
('Ann', 'P1', 60, DATERANGE('2016-01-01', '2016-09-01')),
('Sam', 'P1', 40, DATERANGE('2016-06-01', '2016-12-01')),
('Joe', 'P2', 80, DATERANGE('2016-03-01', '2016-06-01'));

TABLE projects;

empl | proj | pay | t
------+------+-----+-------------------------
Ann | P1 | 60 | [2016-01-01,2016-09-01)
Sam | P1 | 40 | [2016-06-01,2016-12-01)
Joe | P2 | 80 | [2016-03-01,2016-06-01)
(3 rows)

QUERY1: At each time point, what is the number of employees assigned
to projects?

First, we use NORMALIZE to adjust the ranges so that they can be
aggregated as usual by using grouping on the adjusted timestamp t:

SELECT *
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted;

empl | proj | pay | t
------+------+-----+-------------------------
Ann | P1 | 60 | [2016-01-01,2016-03-01)
Ann | P1 | 60 | [2016-03-01,2016-06-01)
Ann | P1 | 60 | [2016-06-01,2016-09-01)
Sam | P1 | 40 | [2016-06-01,2016-09-01)
Sam | P1 | 40 | [2016-09-01,2016-12-01)
Joe | P2 | 80 | [2016-03-01,2016-06-01)
(6 rows)

Observe that the time ranges have been adjusted, and it is now trivial
to compute the count of employees at each time point by grouping on t:

SELECT COUNT(*), t
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted
GROUP BY t;

count | t
-------+-------------------------
1 | [2016-01-01,2016-03-01)
2 | [2016-03-01,2016-06-01)
2 | [2016-06-01,2016-09-01)
1 | [2016-09-01,2016-12-01)
(4 rows)

QUERY2: At each time point, what is the number of employees assigned
to EACH project?

SELECT *
FROM (projects p1 NORMALIZE projects p2 USING(proj) WITH(t,t)) p_adjusted;

empl | proj | pay | t
------+------+-----+-------------------------
Ann | P1 | 60 | [2016-01-01,2016-06-01)
Ann | P1 | 60 | [2016-06-01,2016-09-01)
Sam | P1 | 40 | [2016-06-01,2016-09-01)
Sam | P1 | 40 | [2016-09-01,2016-12-01)
Joe | P2 | 80 | [2016-03-01,2016-06-01)
(5 rows)

and

SELECT COUNT(*), proj, t
FROM (projects p1 NORMALIZE projects p2 USING(proj) WITH(t,t)) p_adjusted
GROUP BY proj, t;

count | proj | t
-------+------+-------------------------
1 | P2 | [2016-03-01,2016-06-01)
1 | P1 | [2016-01-01,2016-06-01)
2 | P1 | [2016-06-01,2016-09-01)
1 | P1 | [2016-09-01,2016-12-01)
(4 rows)

-------------------------------------------------------------------------------
EXAMPLE FOR ANTI-JOIN
-------------------------------------------------------------------------------

QUERY3: At each time point, deterime the employees with the highest
salary.

Without time this query is answered with a NOT EXISTS subquery. With
time periods everything remains, but the time ranges must be adjusted
first:

SELECT *
FROM (projects p1 ALIGN projects p2 ON p1.pay < p2.pay WITH (t,t)) p1_adjusted
WHERE NOT EXISTS (
SELECT *
FROM (projects p2 ALIGN projects p1 ON p1.pay < p2.pay WITH (t,t)) p2_adjusted
WHERE p1_adjusted.pay < p2_adjusted.pay
AND p1_adjusted.t = p2_adjusted.t );

empl | proj | pay | t
------+------+-----+-------------------------
Ann | P1 | 60 | [2016-01-01,2016-03-01)
Ann | P1 | 60 | [2016-06-01,2016-09-01)
Sam | P1 | 40 | [2016-09-01,2016-12-01)
Joe | P2 | 80 | [2016-03-01,2016-06-01)
(4 rows)

-------------------------------------------------------------------------------
EXAMPLE FOR JOINS
-------------------------------------------------------------------------------

CREATE TABLE managers (mgr VARCHAR, proj VARCHAR, t DATERANGE);
INSERT INTO managers VALUES
('Tom', 'P1', DATERANGE('2016-09-01', '2016-12-01')),
('Frank', 'P2', DATERANGE('2016-01-01', '2016-12-01'));

TABLE managers;

mgr | proj | t
-------+------+-------------------------
Tom | P1 | [2016-09-01,2016-12-01)
Frank | P2 | [2016-01-01,2016-12-01)
(2 rows)

QUERY4: At each time point, determine employees and their managers
(including the periods with no employee or no manager).

WITH p_ext AS (SELECT *, t u FROM projects),
m_ext AS (SELECT *, t v FROM managers)
SELECT proj, empl, mgr, t
FROM (p_ext ALIGN m_ext ON m_ext.proj = p_ext.proj WITH (t,t)) p_adjusted
FULL OUTER JOIN
(m_ext ALIGN p_ext ON m_ext.proj = p_ext.proj WITH (t,t)) m_adjusted
USING (proj, t)
WHERE t = u * v OR v IS NULL OR u IS NULL;

proj | empl | mgr | t
------+------+-------+-------------------------
P1 | Ann | | [2016-01-01,2016-09-01)
P1 | Sam | | [2016-06-01,2016-09-01)
P1 | Sam | Tom | [2016-09-01,2016-12-01)
P2 | | Frank | [2016-01-01,2016-03-01)
P2 | Joe | Frank | [2016-03-01,2016-06-01)
P2 | | Frank | [2016-06-01,2016-12-01)
(6 rows)

-------------------------------------------------------------------------------
IMPLEMENTATION
-------------------------------------------------------------------------------

Our implementation approach is to use two operators that take care of
adjusting the ranges in such a way that afterwards the corresponding
nontemporal operator can be used to compute the result:
input --> {NORMALIZE/ALIGN} --> nontemporal operator --> result.

The two operators (NORMALIZE and ALIGN) in this prototype are rewritten into
subqueries, which then serve as input for a new executor function
ExecTemporalAdjustment. The executor function takes care of the splitting of
the range types.

For NORMALIZE the tuples' ranges need to be split into all sub-ranges
according to all matching ranges of the second relation. For this we
create a subquery that first joins one relation with the range
boundaries of the other and then sorts the result. The executor
function splits the ranges in a sweep-line based manner.

For ALIGN the tuples' ranges must be split into all intersections and
differences with the other relation according to the join condition.
For this we create a subquery that first joins the two relations and
then sorts the result. The executor function splits the ranges
accordingly in a sweep-line based manner.

After the splitting it is possible to only use equality (GROUP BY,
DISTINCT, JOIN CONDITION, ...) on the range types to get the intended
result.

The subquery is 'visible' with the EXPLAIN command.

Range types need to be of half-open, i.e., [lower, upper).

Unbound ranges (Infinity), NaN, and NULL values (of ranges and range
bounds) are not yet supported.

-------------------------------------------------------------------------------

We are grateful for any feedback.

Best regards,

Anton, Johann, Michael, Peter

Attachments:

tpg_primitives_out_v1.patchapplication/octet-stream; name=tpg_primitives_out_v1.patchDownload+2963-11
#2David Fetter
david@fetter.org
In reply to: Anton Dignös (#1)
Re: [PROPOSAL] Temporal query processing with range types

On Fri, Jul 22, 2016 at 01:15:17PM +0200, Anton Dign�s wrote:

Hi hackers,

we are a group of researches that work on temporal databases. Our
main focus is the processing of data with time intervals, such as
the range types in PostgreSQL.

Thanks for your hard work so far!

[Explanation and examples elided]

To what extent, if any, are you attempting to follow the SQL:2011
standard?

http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Anton Dignös
dignoes@inf.unibz.it
In reply to: David Fetter (#2)
Re: [PROPOSAL] Temporal query processing with range types

On Sat, Jul 23, 2016 at 12:01 AM, David Fetter <david@fetter.org> wrote:

On Fri, Jul 22, 2016 at 01:15:17PM +0200, Anton Dignös wrote:

Hi hackers,

we are a group of researches that work on temporal databases. Our
main focus is the processing of data with time intervals, such as
the range types in PostgreSQL.

Thanks for your hard work so far!

[Explanation and examples elided]

To what extent, if any, are you attempting to follow the SQL:2011
standard?

http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

The querying in the SQL:2011 standard is based on simple SQL range restrictions
and period predicates (OVERLAP, PRECEDES, FOR SYSTEM_TIME AS OF, etc) that
functionality-wise in PostgreSQL are already covered by the operators and
functions on range types.

Operations such as aggregation, outer joins, set-operations on ranges
(mentioned in
Section 2.5 "Future directions" in the above paper) are not yet part of the
standard. These are the operations that require the adjustment (or splitting) of
ranges.

Best,

Anton

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Robert Haas
robertmhaas@gmail.com
In reply to: Anton Dignös (#1)
Re: [PROPOSAL] Temporal query processing with range types

On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös <dignoes@inf.unibz.it> wrote:

We would like to contribute to PostgreSQL a solution that supports the query
processing of "at each time point". The basic idea is to offer two new
operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
ranges of tuples so that subsequent queries can use the usual grouping and
equality conditions to get the intended results.

I think that it is great that you want to contribute your work to
PostgreSQL. I don't know whether there will be a consensus that this
is generally useful functionality that we should accept, but I commend
the effort anyhow. Assuming there is, getting this into a state that
we consider committable will probably take quite a bit of additional
work on your part; no one will do it for you. If you're still
interested in proceeding given those caveats, please add your patch
here so that it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Peter Moser
pitiz29a@gmail.com
In reply to: Robert Haas (#4)
Re: [PROPOSAL] Temporal query processing with range types

On 27.07.2016 at 16:09 Robert Haas wrote:

On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös <dignoes@inf.unibz.it> wrote:

We would like to contribute to PostgreSQL a solution that supports the query
processing of "at each time point". The basic idea is to offer two new
operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
ranges of tuples so that subsequent queries can use the usual grouping and
equality conditions to get the intended results.

I think that it is great that you want to contribute your work to
PostgreSQL. I don't know whether there will be a consensus that this
is generally useful functionality that we should accept, but I commend
the effort anyhow. Assuming there is, getting this into a state that
we consider committable will probably take quite a bit of additional
work on your part; no one will do it for you.

Hi hackers,

thank you for your feedback.

We are aware that contributing to PostgreSQL is a long way with a lot
of work. We are committed to go all the way and do the work as
discussed in the community.

We had some internal discussions about the project, looking also at
some other patches to better understand whether the patch is
work-in-progress or ready for commitfest.

If you're still
interested in proceeding given those caveats, please add your patch
here so that it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

We decided to follow your recommendation and add the patch to the
commitfest.

Looking forward for your feedback,
Anton, Johann, Michael, Peter

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Peter Moser (#5)
Re: [PROPOSAL] Temporal query processing with range types

On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com> wrote:

We decided to follow your recommendation and add the patch to the
commitfest.

Path is not applying properly to HEAD.
Moved to next CF with "waiting on author" status.

Regards,
Hari Babu
Fujitsu Australia

#7Peter Moser
pitiz29a@gmail.com
In reply to: Haribabu Kommi (#6)
Re: [PROPOSAL] Temporal query processing with range types

Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:

On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com
<mailto:pitiz29a@gmail.com>> wrote:

We decided to follow your recommendation and add the patch to the
commitfest.

Path is not applying properly to HEAD.
Moved to next CF with "waiting on author" status.

We updated our patch. We tested it with the latest
commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

Regards,
Hari Babu
Fujitsu Australia

Best regards,
Anton, Johann, Michael, Peter

Attachments:

tpg_primitives_out_v2.patchtext/x-patch; name=tpg_primitives_out_v2.patchDownload+3050-11
#8David Fetter
david@fetter.org
In reply to: Peter Moser (#7)
Re: [PROPOSAL] Temporal query processing with range types

On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:

Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:

On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com
<mailto:pitiz29a@gmail.com>> wrote:

We decided to follow your recommendation and add the patch to the
commitfest.

Path is not applying properly to HEAD.
Moved to next CF with "waiting on author" status.

We updated our patch. We tested it with the latest
commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

This looks neat, but it no longer applies to master. Is a rebase in
the offing?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Peter Moser
pitiz29a@gmail.com
In reply to: David Fetter (#8)
Re: [PROPOSAL] Temporal query processing with range types

Am 16.12.2016 um 07:17 schrieb David Fetter:

On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:

Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:

On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <pitiz29a@gmail.com
<mailto:pitiz29a@gmail.com>> wrote:

We decided to follow your recommendation and add the patch to the
commitfest.

Path is not applying properly to HEAD.
Moved to next CF with "waiting on author" status.

We updated our patch. We tested it with the latest
commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

This looks neat, but it no longer applies to master. Is a rebase in
the offing?

We rebased our patch on top of HEAD, that is, commit
93513d1b6559b2d0805f0b02d312ee550e3d010b.

Best regards,
Anton, Johann, Michael, Peter

Attachments:

tpg_primitives_out_v3.patchtext/x-patch; name=tpg_primitives_out_v3.patchDownload+3069-11
#10Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Moser (#9)
Re: [PROPOSAL] Temporal query processing with range types

So I'm looking at this patch in the commit fest. I have only a general
understanding of temporal query processing.

What this patch does is to add two new clauses for FROM-list items,
NORMALIZE and ALIGN, which reshuffle a set of ranges into a new list
that can then be aggregated more easily. From the original message:

For NORMALIZE the tuples' ranges need to be split into all sub-ranges
according to all matching ranges of the second relation. For this we
create a subquery that first joins one relation with the range
boundaries of the other and then sorts the result. The executor
function splits the ranges in a sweep-line based manner.

For ALIGN the tuples' ranges must be split into all intersections and
differences with the other relation according to the join condition.
For this we create a subquery that first joins the two relations and
then sorts the result. The executor function splits the ranges
accordingly in a sweep-line based manner.

So there isn't really temporal query processing as such here, only some
helpers that can make it easier.

I can see how those operations can be useful, but it would help if there
were a more formal definition to be able to check that further.

What I'm missing here is some references: existing implementations,
standards, documentation, research papers, alternative ideas, rejected
alternatives, etc.

Also, the submission is missing documentation and test cases. There are
technical terms used in the code that I don't understand.

I think there are probably many interesting applications for normalizing
or otherwise adjusting ranges. I'd like to see an overview and
consideration of other applications.

Ideally, I'd like to see these things implemented as some kind of
user-space construct, like an operator or function. I think we'd need a
clearer definition of what it is they do before we can evaluate that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Moser
pitiz29a@gmail.com
In reply to: Peter Eisentraut (#10)
Re: [PROPOSAL] Temporal query processing with range types

What this patch does is to add two new clauses for FROM-list items,
NORMALIZE and ALIGN, which reshuffle a set of ranges into a new list
that can then be aggregated more easily. From the original message:

For NORMALIZE the tuples' ranges need to be split into all sub-ranges
according to all matching ranges of the second relation. For this we
create a subquery that first joins one relation with the range
boundaries of the other and then sorts the result. The executor
function splits the ranges in a sweep-line based manner.

For ALIGN the tuples' ranges must be split into all intersections and
differences with the other relation according to the join condition.
For this we create a subquery that first joins the two relations and
then sorts the result. The executor function splits the ranges
accordingly in a sweep-line based manner.

So there isn't really temporal query processing as such here, only some
helpers that can make it easier.

The goal of temporal aligners and normalizers is to split ranges to allow a
reduction from temporal queries to their non-temporal counterparts.
Splitting
ranges is necessary for temporal query processing. Temporal aligners and
normalizer may then be used as building-blocks for any temporal query
construct.

I can see how those operations can be useful, but it would help if there
were a more formal definition to be able to check that further.

We have published two papers, that contain formal definitions and related
work
for the temporal aligner and normalizer. Please see [1]Anton Dignös, Michael H. Böhlen, Johann Gamper: Temporal alignment. SIGMOD Conference 2012: 433-444 http://doi.acm.org/10.1145/2213836.2213886 and [2]Anton Dignös, Michael H. Böhlen, Johann Gamper, Christian S. Jensen: Extending the Kernel of a Relational DBMS with Comprehensive Support for Sequenced Temporal Queries. ACM Trans. Database Syst. 41(4): 26:1-26:46 (2016) http://doi.acm.org/10.1145/2967608.

What I'm missing here is some references: existing implementations,
standards, documentation, research papers, alternative ideas, rejected
alternatives, etc.

A good overview of existing implementations in DBMSs, SQL standard, and
history
is given in [3]https://www2.cs.arizona.edu/people/rts/sql3.html and https://www2.cs.arizona.edu/people/rts/tsql2.html.

Also, the submission is missing documentation and test cases. There are
technical terms used in the code that I don't understand.

We added a second patch with test cases and expected results. We are now
writing the documentation in sgml-format.

I think there are probably many interesting applications for normalizing
or otherwise adjusting ranges. I'd like to see an overview and
consideration of other applications.

Please see the attached file adjustment.sql for some interesting
applications.

Ideally, I'd like to see these things implemented as some kind of
user-space construct, like an operator or function. I think we'd need a
clearer definition of what it is they do before we can evaluate that.

Can you please explain what you mean by "user-space construct" in this case.

Best regards,
Anton, Johann, Michael, Peter

----
[1]: Anton Dignös, Michael H. Böhlen, Johann Gamper: Temporal alignment. SIGMOD Conference 2012: 433-444 http://doi.acm.org/10.1145/2213836.2213886
Temporal alignment. SIGMOD Conference 2012: 433-444
http://doi.acm.org/10.1145/2213836.2213886
[2]: Anton Dignös, Michael H. Böhlen, Johann Gamper, Christian S. Jensen: Extending the Kernel of a Relational DBMS with Comprehensive Support for Sequenced Temporal Queries. ACM Trans. Database Syst. 41(4): 26:1-26:46 (2016) http://doi.acm.org/10.1145/2967608
Extending the Kernel of a Relational DBMS with Comprehensive Support for
Sequenced Temporal Queries. ACM Trans. Database Syst. 41(4): 26:1-26:46
(2016)
http://doi.acm.org/10.1145/2967608
[3]: https://www2.cs.arizona.edu/people/rts/sql3.html and https://www2.cs.arizona.edu/people/rts/tsql2.html
https://www2.cs.arizona.edu/people/rts/tsql2.html

Attachments:

adjustment.sqlapplication/sql; name=adjustment.sqlDownload+0-110
tpg_primitives_out_tests_v1.patchtext/x-patch; charset=US-ASCII; name=tpg_primitives_out_tests_v1.patchDownload+1298-1
#12Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Moser (#11)
Re: [PROPOSAL] Temporal query processing with range types

On 1/13/17 9:22 AM, Peter Moser wrote:

The goal of temporal aligners and normalizers is to split ranges to allow a
reduction from temporal queries to their non-temporal counterparts.
Splitting
ranges is necessary for temporal query processing. Temporal aligners and
normalizer may then be used as building-blocks for any temporal query
construct.

I would need to see the exact definitions of these constructs. Please
send some documentation.

We have published two papers, that contain formal definitions and
related work
for the temporal aligner and normalizer. Please see [1] and [2].

I don't have access to those.

I think there are probably many interesting applications for normalizing
or otherwise adjusting ranges. I'd like to see an overview and
consideration of other applications.

Please see the attached file adjustment.sql for some interesting
applications.

That's surely interesting, but without knowing what these operations are
supposed to do, I can only reverse engineer and guess.

Ideally, I'd like to see these things implemented as some kind of
user-space construct, like an operator or function. I think we'd need a
clearer definition of what it is they do before we can evaluate that.

Can you please explain what you mean by "user-space construct" in this case.

Implement them using the extensibility features, such as a user-defined
operator. I don't know if it's possible, but it's something to consider.

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Peter Moser
pitiz29a@gmail.com
In reply to: Peter Eisentraut (#12)
Re: [PROPOSAL] Temporal query processing with range types

2017-01-18 3:57 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:

On 1/13/17 9:22 AM, Peter Moser wrote:

The goal of temporal aligners and normalizers is to split ranges to allow a
reduction from temporal queries to their non-temporal counterparts.
Splitting
ranges is necessary for temporal query processing. Temporal aligners and
normalizer may then be used as building-blocks for any temporal query
construct.

I would need to see the exact definitions of these constructs. Please
send some documentation.

We have published two papers, that contain formal definitions and
related work
for the temporal aligner and normalizer. Please see [1] and [2].

I don't have access to those.

The papers can be freely downloaded from
http://www.inf.unibz.it/~dignoes/publications.html using the "Author-ize link".

I think there are probably many interesting applications for normalizing
or otherwise adjusting ranges. I'd like to see an overview and
consideration of other applications.

Please see the attached file adjustment.sql for some interesting
applications.

That's surely interesting, but without knowing what these operations are
supposed to do, I can only reverse engineer and guess.

Intuitively what they do is as follows:

NORMALIZE: splits all the ranges of one relation according to all the range
boundaries of another (but possibly the same) relation whenever some equality
condition over some given attributes is satisfied.

When the two relations are the same, all ranges with the given equal attributes
are either equal or disjoint. After this, the traditional GROUP BY or DISTINCT
can be applied. The attributes given to NORMALIZE for a (temporal) GROUP BY are
the grouping attributes and for a (temporal) DISTINCT the target list
attributes.

When the two relations are different, but they each contain disjoint ranges
for the same attributes (as the current limitation for the set operations is)
we perform a symmetric NORMALIZE on each of them. Then we have a similar
situation as before, i.e., in both relations ranges with the same attributes
are either equal or disjoint and a traditional set operation
(EXCEPT/INTERSECT/UNION) can be applied. The attributes given to NORMALIZE for
a (temporal) EXCEPT/INTERSECT/UNION are the target list attributes.

ALIGN: splits all the ranges of one relation according to all the range
intersections of another relation, i.e., it produces all intersections and
non-overlapping parts, whenever some condition is satisfied.

We perform a symmetric ALIGN on each relation, after which a traditional inner
or outer join can be applied using equality on the ranges to calculate the
overlap. The condition given to a (temporal) inner or outer join is the
join condition without overlap.

Ideally, I'd like to see these things implemented as some kind of
user-space construct, like an operator or function. I think we'd need a
clearer definition of what it is they do before we can evaluate that.

Can you please explain what you mean by "user-space construct" in this case.

Implement them using the extensibility features, such as a user-defined
operator. I don't know if it's possible, but it's something to consider.

We experimented with user-defined operators and set-returning functions. The
main issue with these is that ALIGN and NORMALIZE rely on comparison and
processing of one tuple with many tuples at a time that is not possible with
operators, and for set-returning functions there is no clean way of passing
tables and/or conditions.

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
options? We are also thankful for any suggestion or comments about the syntax.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Michael Paquier
michael@paquier.xyz
In reply to: Peter Moser (#13)
Re: [PROPOSAL] Temporal query processing with range types

On Tue, Jan 24, 2017 at 6:32 PM, Peter Moser <pitiz29a@gmail.com> wrote:

[reviews and discussions]

The patch proposed has rotten. Please provide a rebase. By the way, I
am having a hard time applying your patches with patch or any other
methods... I am moving it to CF 2017-03 because of the lack of
reviews.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Peter Moser
pitiz29a@gmail.com
In reply to: Michael Paquier (#14)
Re: [PROPOSAL] Temporal query processing with range types

2017-02-01 6:19 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>:

The patch proposed has rotten. Please provide a rebase. By the way, I
am having a hard time applying your patches with patch or any other
methods... I am moving it to CF 2017-03 because of the lack of
reviews.

We have rebased our patches on top of commit
53dd2da257fb5904b087b97dd9c2867390d309c1
from "Thu Feb 2 14:12:35 2017 +0200".

Hereby, we used the following commands to create both patches:
git diff --no-prefix origin/master -- src/ ':!src/test/*' >
tpg_primitives_out_v4.patch

git diff --no-prefix origin/master -- src/test/ >
tpg_primitives_out_tests_v2.patch

We have also tested our patches on the current HEAD with the command:
patch -p0 < patch-file

Both worked without problems or warnings on our Linux machine.
Could you please explain, which problems occurred while you tried to
apply our patches?

Best regards,
Anton, Johann, Michael, Peter

Attachments:

tpg_primitives_out_v4.patchtext/x-patch; charset=US-ASCII; name=tpg_primitives_out_v4.patchDownload+3078-21
tpg_primitives_out_tests_v2.patchtext/x-patch; charset=US-ASCII; name=tpg_primitives_out_tests_v2.patchDownload+1298-1
#16Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Moser (#15)
Re: [PROPOSAL] Temporal query processing with range types

On 2/2/17 12:43 PM, Peter Moser wrote:

Hereby, we used the following commands to create both patches:
git diff --no-prefix origin/master -- src/ ':!src/test/*' >
tpg_primitives_out_v4.patch

git diff --no-prefix origin/master -- src/test/ >
tpg_primitives_out_tests_v2.patch

We have also tested our patches on the current HEAD with the command:
patch -p0 < patch-file

Both worked without problems or warnings on our Linux machine.
Could you please explain, which problems occurred while you tried to
apply our patches?

Your patches apply OK for me.

In the future, just use git diff without any options or git
format-patch, and put both the code and the tests in one patch.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Robert Haas
robertmhaas@gmail.com
In reply to: Peter Moser (#13)
Re: [PROPOSAL] Temporal query processing with range types

On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a@gmail.com> wrote:

NORMALIZE: splits all the ranges of one relation according to all the range
boundaries of another (but possibly the same) relation whenever some equality
condition over some given attributes is satisfied.

When the two relations are the same, all ranges with the given equal attributes
are either equal or disjoint. After this, the traditional GROUP BY or DISTINCT
can be applied. The attributes given to NORMALIZE for a (temporal) GROUP BY are
the grouping attributes and for a (temporal) DISTINCT the target list
attributes.

When the two relations are different, but they each contain disjoint ranges
for the same attributes (as the current limitation for the set operations is)
we perform a symmetric NORMALIZE on each of them. Then we have a similar
situation as before, i.e., in both relations ranges with the same attributes
are either equal or disjoint and a traditional set operation
(EXCEPT/INTERSECT/UNION) can be applied. The attributes given to NORMALIZE for
a (temporal) EXCEPT/INTERSECT/UNION are the target list attributes.

ALIGN: splits all the ranges of one relation according to all the range
intersections of another relation, i.e., it produces all intersections and
non-overlapping parts, whenever some condition is satisfied.

We perform a symmetric ALIGN on each relation, after which a traditional inner
or outer join can be applied using equality on the ranges to calculate the
overlap. The condition given to a (temporal) inner or outer join is the
join condition without overlap.

I don't quite understand the difference between NORMALIZE and ALIGN.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Peter Moser (#13)
Re: [PROPOSAL] Temporal query processing with range types

On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a@gmail.com> wrote:

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
options? We are also thankful for any suggestion or comments about the syntax.

So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
except apparently there's no join type and the optimizer can never
reorder these operations with each other or with other joins. Is that
right? The optimizer changes in this patch seem fairly minimal, so
I'm guessing it can't be doing anything very complex here.

What happens if you perform the ALIGN or NORMALIZE operation using
something other than an equality operator, like, say, less-than? Or
an arbitrary user-defined operator.

There's no documentation in this patch. I'm not sure you want to go
to the trouble of writing SGML documentation until this has been
reviewed enough that it has a real chance of getting committed, but on
the other hand we're obviously all struggling to understand what it
does, so I think if not SGML documentation it at least needs a real
clear explanation of what the syntax is and does in a README or
something, even just for initial review.

We don't have anything quite like this in PostgreSQL today. An
ordinary join just matches up things in relation A and relation B and
outputs the matching rows, and something like a SRF takes a set of
input rows and returns a set of output rows. This is a hybrid - it
takes in two sets of rows, one from each relation being joined, and
produces a derived set of output rows that takes into account both
inputs.

+ * INPUT:
+ *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
+ *             where q can be any join qualifier, and r.ts, r.te, s.ts, and s.t
e
+ *             can be any column name.
+ *
+ * OUTPUT:
+ *             (
+ *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
+ *      FROM
+ *      (
+ *             SELECT *, row_id() OVER () rn FROM r
+ *      ) r
+ *      LEFT OUTER JOIN
+ *      s
+ *      ON q AND r.ts < s.te AND r.te > s.ts
+ *      ORDER BY rn, P1, P2
+ *      ) c

It's hard to see what's going on here. What's ts? What's te? If you
used longer names for these things, it might be a bit more
self-documenting.

If we are going to transform an ALIGN operator in to a left outer
join, why do we also have an executor node for it?

+               fcLowerLarg = makeFuncCall(SystemFuncName("lower"),
+
list_make1(crLargTs),
+
UNKNOWN_LOCATION);
+               fcLowerRarg = makeFuncCall(SystemFuncName("lower"),
+
list_make1(crRargTs),
+
UNKNOWN_LOCATION);
+               fcUpperLarg = makeFuncCall(SystemFuncName("upper"),
+
list_make1(crLargTs),
+
UNKNOWN_LOCATION);
+               fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
+
list_make1(crRargTs),
+
UNKNOWN_LOCATION);

Why is a temporal operator calling functions that upper-case and
lower-case strings? In one sense this whole function (and much of the
nearby code) is very straightforward code and you can see exactly why
it's doing it. In another sense it's totally inscrutable: WHY is it
doing any of that stuff?

-       char       *strategy;           /* partitioning strategy
('list' or 'range') */
-       List       *partParams;         /* List of PartitionElems */
-       int                     location;               /* token
location, or -1 if unknown */
+       char       *strategy;   /* partitioning strategy ('list' or 'range') */
+       List       *partParams; /* List of PartitionElems */
+       int                     location;       /* token location, or
-1 if unknown */

I think this is some kind of mistake on your end while generating the
patch. It looks like you patched one version of the source code, and
diffed against another.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#18)
Re: [PROPOSAL] Temporal query processing with range types

On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a@gmail.com> wrote:

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be

better

options? We are also thankful for any suggestion or comments about the

syntax.

So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
except apparently there's no join type and the optimizer can never
reorder these operations with each other or with other joins. Is that
right? The optimizer changes in this patch seem fairly minimal, so
I'm guessing it can't be doing anything very complex here.

+ * INPUT:
+ *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
+ *             where q can be any join qualifier, and r.ts, r.te, s.ts,
and s.t
e
+ *             can be any column name.
+ *
+ * OUTPUT:
+ *             (
+ *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
+ *      FROM
+ *      (
+ *             SELECT *, row_id() OVER () rn FROM r
+ *      ) r
+ *      LEFT OUTER JOIN
+ *      s
+ *      ON q AND r.ts < s.te AND r.te > s.ts
+ *      ORDER BY rn, P1, P2
+ *      ) c

It's hard to see what's going on here. What's ts? What's te? If you
used longer names for these things, it might be a bit more
self-documenting.

Just reasoning out loud here...​

ISTM ts and te are "temporal [range] start" and "temporal [range] end"​ (or
probably just the common "timestamp start/end")

​From what I can see it is affecting an intersection of the two ranges and,
furthermore, splitting the LEFT range into sub-ranges that match up with
the sub-ranges found on the right side. From the example above this seems
like it should be acting on self-normalized ranges - but I may be missing
something by evaluating this out of context.

r1 [1, 6] [ts, te] [time period start, time period end]
s1 [2, 3]
s2 [3, 4]
s3 [5, 7]

r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)

r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]
r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]
r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]

Thus the intersection is [2,6] but since s1 has three ranges that begin
between 2 and 6 (i.e., 2, 3, and 5) there are three output records that
correspond to those sub-ranges.

The description in the OP basically distinguishes between NORMALIZE and
ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
ranges - discarding the non-overlapping data - while NORMALIZE performs the
alignment while also retaining the non-overlapping data.

The rest of the syntax seems to deal with selecting subsets of range
records based upon attribute data.

David J.

#20Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#19)
Re: [PROPOSAL] Temporal query processing with range types

On Wed, Feb 15, 2017 at 3:33 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

The description in the OP basically distinguishes between NORMALIZE and
ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
ranges - discarding the non-overlapping data - while NORMALIZE performs the
alignment while also retaining the non-overlapping data.

Hmm. So is ALIGN like an inner join and NORMALIZE like a full outer
join? Couldn't you have left and right outer joins, too, like you
null-extend the parts of the lefthand range that don't match but throw
away parts of the righthand range that don't match?

Also, it sounds like all of this is intended to work with ranges that
are stored in different columns rather than with PostgreSQL's built-in
range types.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Peter Moser
pitiz29a@gmail.com
In reply to: David G. Johnston (#19)
#22Peter Moser
pitiz29a@gmail.com
In reply to: Robert Haas (#20)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#20)
#24Peter Moser
pitiz29a@gmail.com
In reply to: Peter Eisentraut (#23)
#25Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Moser (#24)
#26Peter Moser
pitiz29a@gmail.com
In reply to: Jim Nasby (#25)
#27Peter Moser
pitiz29a@gmail.com
In reply to: Peter Moser (#26)
#28Peter Moser
pitiz29a@gmail.com
In reply to: Peter Moser (#27)
#29Peter Moser
pitiz29a@gmail.com
In reply to: Peter Moser (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Peter Moser (#29)
#31Peter Moser
pitiz29a@gmail.com
In reply to: Andres Freund (#30)
#32Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Moser (#31)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Moser (#29)
#34Peter Moser
pitiz29a@gmail.com
In reply to: Thomas Munro (#32)
#35Peter Moser
pitiz29a@gmail.com
In reply to: Simon Riggs (#33)
#36Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Moser (#35)
#37Peter Moser
pitiz29a@gmail.com
In reply to: Pavel Stehule (#36)
#38Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Moser (#37)
#39Peter Moser
pitiz29a@gmail.com
In reply to: Pavel Stehule (#38)
#40Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Anton Dignös (#1)
#41Mike Rylander
mrylander@gmail.com
In reply to: Paul Jungwirth (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Mike Rylander (#41)
#43Peter Moser
peter.moser@unibz.it
In reply to: Paul Jungwirth (#40)
#44Peter Moser
peter.moser@unibz.it
In reply to: Robert Haas (#42)
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Moser (#44)
#46Peter Moser
peter.moser@unibz.it
In reply to: Tom Lane (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Peter Moser (#46)
#48Peter Moser
pitiz29a@gmail.com
In reply to: Robert Haas (#47)
#49Michael Paquier
michael@paquier.xyz
In reply to: Peter Moser (#48)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Peter Moser (#48)
#51Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#50)
#52Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Moser (#29)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#51)
#54Peter Moser
pitiz29a@gmail.com
In reply to: Robert Haas (#50)
#55Peter Moser
pitiz29a@gmail.com
In reply to: Simon Riggs (#51)
#56Peter Moser
pitiz29a@gmail.com
In reply to: Simon Riggs (#52)
#57Peter Moser
pitiz29a@gmail.com
In reply to: Robert Haas (#53)
#58Peter Moser
pitiz29a@gmail.com
In reply to: Peter Moser (#54)
#59Peter Moser
pitiz29a@gmail.com
In reply to: Peter Moser (#58)
#60David Steele
david@pgmasters.net
In reply to: Peter Moser (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#60)
#62Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Robert Haas (#61)
#63Thomas Munro
thomas.munro@gmail.com
In reply to: Ibrar Ahmed (#62)
#64Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Thomas Munro (#63)
#65Peter Moser
pitiz29a@gmail.com
In reply to: Ibrar Ahmed (#64)
#66Michael Paquier
michael@paquier.xyz
In reply to: Peter Moser (#65)
#67Peter Moser
pitiz29a@gmail.com
In reply to: Michael Paquier (#66)