Row pattern recognition
Attached is a PoC patch to implement "Row pattern recognition" (RPR)
in SQL:2016 (I know SQL:2023 is already out, but I don't have access
to it). Actually SQL:2016 defines RPR in two places[1]https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql:
Feature R010, “Row pattern recognition: FROM clause”
Feature R020, “Row pattern recognition: WINDOW clause”
The patch includes partial support for R020 part.
- What is RPR?
RPR provides a way to search series of data using regular expression
patterns. Suppose you have a stock database.
CREATE TABLE stock (
company TEXT,
tdate DATE,
price BIGINT);
You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
more days, then goes up for 1 or more days. If we try to implement the
query in PostgreSQL, it could be quite complex and inefficient.
RPR provides convenient way to implement the query.
SELECT company, tdate, price, rpr(price) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
"PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
"row pattern variables" namely START, UP and DOWN. They are associated
with logical expressions namely "TRUE", "price > PREV(price)", and
"price < PREV(price)". Note that "PREV" function returns price column
in the previous row. So, UP is true if price is higher than previous
day. On the other hand, DOWN is true if price is lower than previous
day. PATTERN uses the row pattern variables to create a necessary
pattern. In this case, the first row is always match because START is
always true, and second or more rows match with "UP" ('+' is a regular
expression representing one or more), and subsequent rows match with
"DOWN".
Here is the sample output.
company | tdate | price | rpr
----------+------------+-------+------
company1 | 2023-07-01 | 100 |
company1 | 2023-07-02 | 200 | 200 -- 200->150->140->150
company1 | 2023-07-03 | 150 | 150 -- 150->140->150
company1 | 2023-07-04 | 140 |
company1 | 2023-07-05 | 150 | 150 -- 150->90->110->130
company1 | 2023-07-06 | 90 |
company1 | 2023-07-07 | 110 |
company1 | 2023-07-08 | 130 |
company1 | 2023-07-09 | 120 |
company1 | 2023-07-10 | 130 |
rpr shows the first row if all the patterns are satisfied. In the
example above 200, 150, 150 are the cases. Other rows are shown as
NULL. For example, on 2023-07-02 price starts with 200, then goes down
to 150 then 140 but goes up 150 next day.
As far as I know, only Oracle implements RPR (only R010. R020 is not
implemented) among OSS/commercial RDBMSs. There are a few DWH software
having RPR. According to [2]https://link.springer.com/article/10.1007/s13222-022-00404-3 they are Snowflake and MS Stream
Analytics. It seems Trino is another one[3]https://trino.io/docs/current/sql/pattern-recognition-in-window.html -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp.
- Note about the patch
The current implementation is not only a subset of the standard, but
is different from it in some places. The example above is written as
follows according to the standard:
SELECT company, tdate, startprice OVER w FROM stock
WINDOW w AS (
PARTITION BY company
MEASURES
START.price AS startprice
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS UP.price > PREV(UP.price),
DOWN AS DOWN.price < PREV(DOWN.price)
);
Notice that rpr(price) is written as START.price and startprice in the
standard. MEASURES defines variable names used in the target list used
with "OVER window". As OVER only allows functions in PostgreSQL, I had
to make up a window function "rpr" which performs the row pattern
recognition task. I was not able to find a way to implement
expressions like START.price (START is not a table alias). Any
suggestion is greatly appreciated.
The differences from the standard include:
o MEASURES is not supported
o SUBSET is not supported
o FIRST, LAST, CLASSIFIER are not supported
o PREV/NEXT in the standard accept more complex arguments
o Regular expressions other than "+" are not supported
o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.
o INITIAL or SEEK are not supported ((behaves as if INITIAL is specified)
o Aggregate functions associated with window clause using RPR do not respect RPR
It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.
Comments and suggestions are welcome.
[1]: https://sqlperformance.com/2019/04/t-sql-queries/row-pattern-recognition-in-sql
[2]: https://link.springer.com/article/10.1007/s13222-022-00404-3
[3]: https://trino.io/docs/current/sql/pattern-recognition-in-window.html -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachments:
v1-0001-Row-pattern-recognition-patch-for-raw-parser.patchtext/x-patch; charset=us-asciiDownload+266-16
v1-0002-Row-pattern-recognition-patch-parse-analysis.patchtext/x-patch; charset=us-asciiDownload+209-2
v1-0003-Row-pattern-recognition-patch-planner.patchtext/x-patch; charset=us-asciiDownload+19-4
v1-0004-Row-pattern-recognition-patch-executor.patchtext/x-patch; charset=us-asciiDownload+517-11
v1-0005-Row-pattern-recognition-patch-docs.patchtext/x-patch; charset=us-asciiDownload+136-3
v1-0006-Row-pattern-recognition-patch-tests.patchtext/x-patch; charset=us-asciiDownload+424-2
On 6/25/23 14:05, Tatsuo Ishii wrote:
Attached is a PoC patch to implement "Row pattern recognition" (RPR)
in SQL:2016 (I know SQL:2023 is already out, but I don't have access
to it). Actually SQL:2016 defines RPR in two places[1]:Feature R010, “Row pattern recognition: FROM clause”
Feature R020, “Row pattern recognition: WINDOW clause”The patch includes partial support for R020 part.
I have been dreaming of and lobbying for someone to pick up this
feature. I will be sure to review it from a standards perspective and
will try my best to help with the technical aspect, but I am not sure to
have the qualifications for that.
THANK YOU!
(I know SQL:2023 is already out, but I don't have access to it)
If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
of just its technical specification.
https://www.iso.org/standard/78936.html
- What is RPR?
RPR provides a way to search series of data using regular expression
patterns. Suppose you have a stock database.CREATE TABLE stock (
company TEXT,
tdate DATE,
price BIGINT);You want to find a "V-shaped" pattern: i.e. price goes down for 1 or
more days, then goes up for 1 or more days. If we try to implement the
query in PostgreSQL, it could be quite complex and inefficient.RPR provides convenient way to implement the query.
SELECT company, tdate, price, rpr(price) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);"PATTERN" and "DEFINE" are the key clauses of RPR. DEFINE defines 3
"row pattern variables" namely START, UP and DOWN. They are associated
with logical expressions namely "TRUE", "price > PREV(price)", and
"price < PREV(price)". Note that "PREV" function returns price column
in the previous row. So, UP is true if price is higher than previous
day. On the other hand, DOWN is true if price is lower than previous
day. PATTERN uses the row pattern variables to create a necessary
pattern. In this case, the first row is always match because START is
always true, and second or more rows match with "UP" ('+' is a regular
expression representing one or more), and subsequent rows match with
"DOWN".Here is the sample output.
company | tdate | price | rpr
----------+------------+-------+------
company1 | 2023-07-01 | 100 |
company1 | 2023-07-02 | 200 | 200 -- 200->150->140->150
company1 | 2023-07-03 | 150 | 150 -- 150->140->150
company1 | 2023-07-04 | 140 |
company1 | 2023-07-05 | 150 | 150 -- 150->90->110->130
company1 | 2023-07-06 | 90 |
company1 | 2023-07-07 | 110 |
company1 | 2023-07-08 | 130 |
company1 | 2023-07-09 | 120 |
company1 | 2023-07-10 | 130 |rpr shows the first row if all the patterns are satisfied. In the
example above 200, 150, 150 are the cases. Other rows are shown as
NULL. For example, on 2023-07-02 price starts with 200, then goes down
to 150 then 140 but goes up 150 next day.
I don't understand this. RPR in a window specification limits the
window to the matched rows, so this looks like your rpr() function is
just the regular first_value() window function that we already have?
As far as I know, only Oracle implements RPR (only R010. R020 is not
implemented) among OSS/commercial RDBMSs. There are a few DWH software
having RPR. According to [2] they are Snowflake and MS Stream
Analytics. It seems Trino is another one[3].
I thought DuckDB had it already, but it looks like I was wrong.
- Note about the patch
The current implementation is not only a subset of the standard, but
is different from it in some places. The example above is written as
follows according to the standard:SELECT company, tdate, startprice OVER w FROM stock
WINDOW w AS (
PARTITION BY company
MEASURES
START.price AS startprice
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS UP.price > PREV(UP.price),
DOWN AS DOWN.price < PREV(DOWN.price)
);Notice that rpr(price) is written as START.price and startprice in the
standard. MEASURES defines variable names used in the target list used
with "OVER window". As OVER only allows functions in PostgreSQL, I had
to make up a window function "rpr" which performs the row pattern
recognition task. I was not able to find a way to implement
expressions like START.price (START is not a table alias). Any
suggestion is greatly appreciated.
As in your example, you cannot have START.price outside of the window
specification; it can only go in the MEASURES clause. Only startprice
is allowed outside and it gets its qualification from the OVER. Using
w.startprice might have been better but that would require window names
to be in the same namespace as range tables.
This currently works in Postgres:
SELECT RANK() OVER w
FROM (VALUES (1)) AS w (x)
WINDOW w AS (ORDER BY w.x);
The differences from the standard include:
o MEASURES is not supported
o FIRST, LAST, CLASSIFIER are not supported
o PREV/NEXT in the standard accept more complex arguments
o INITIAL or SEEK are not supported ((behaves as if INITIAL is specified)
Okay, for now.
o SUBSET is not supported
Is this because you haven't done it yet, or because you ran into
problems trying to do it?
o Regular expressions other than "+" are not supported
This is what I had a hard time imagining how to do while thinking about
it. The grammar is so different here and we allow many more operators
(like "?" which is also the standard parameter symbol). People more
expert than me will have to help here.
o Only AFTER MATCH SKIP TO NEXT ROW is supported (if AFTER MATCH is
not specified, AFTER MATCH SKIP TO NEXT ROW is assumed. In the
standard AFTER MATCH SKIP PAST LAST ROW is assumed in this case). I
have a plan to implement AFTER MATCH SKIP PAST LAST ROW though.
In this case, we should require the user to specify AFTER MATCH SKIP TO
NEXT ROW so that behavior doesn't change when we implement the standard
default. (Your patch might do this already.)
o Aggregate functions associated with window clause using RPR do not respect RPR
I do not understand what this means.
It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.
I have no problem with that as long as we don't paint ourselves into a
corner.
Comments and suggestions are welcome.
I have not looked at the patch yet, but is the reason for doing R020
before R010 because you haven't done the MEASURES clause yet?
In any case, I will be watching this with a close eye, and I am eager to
help in any way I can.
--
Vik Fearing
I have been dreaming of and lobbying for someone to pick up this
feature. I will be sure to review it from a standards perspective and
will try my best to help with the technical aspect, but I am not sure
to have the qualifications for that.THANK YOU!
Thank you for looking into my proposal.
(I know SQL:2023 is already out, but I don't have access to it)
If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
of just its technical specification.
Thanks for the info.
I don't understand this. RPR in a window specification limits the
window to the matched rows, so this looks like your rpr() function is
just the regular first_value() window function that we already have?
No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.
company | tdate | price | rpr | first_value
----------+------------+-------+------+-------------
company1 | 2023-07-01 | 100 | | 100
company1 | 2023-07-02 | 200 | 200 | 200
company1 | 2023-07-03 | 150 | 150 | 150
company1 | 2023-07-04 | 140 | | 140
company1 | 2023-07-05 | 150 | 150 | 150
company1 | 2023-07-06 | 90 | | 90
company1 | 2023-07-07 | 110 | | 110
company1 | 2023-07-08 | 130 | | 130
company1 | 2023-07-09 | 120 | | 120
company1 | 2023-07-10 | 130 | | 130
For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.
As in your example, you cannot have START.price outside of the window
specification; it can only go in the MEASURES clause. Only startprice
is allowed outside and it gets its qualification from the OVER. Using
w.startprice might have been better but that would require window
names to be in the same namespace as range tables.This currently works in Postgres:
SELECT RANK() OVER w
FROM (VALUES (1)) AS w (x)
WINDOW w AS (ORDER BY w.x);
Interesting.
o SUBSET is not supported
Is this because you haven't done it yet, or because you ran into
problems trying to do it?
Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?
o Regular expressions other than "+" are not supported
This is what I had a hard time imagining how to do while thinking
about it. The grammar is so different here and we allow many more
operators (like "?" which is also the standard parameter symbol).
People more expert than me will have to help here.
Yes, that is a problem.
In this case, we should require the user to specify AFTER MATCH SKIP
TO NEXT ROW so that behavior doesn't change when we implement the
standard default. (Your patch might do this already.)
Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.
o Aggregate functions associated with window clause using RPR do not
respect RPRI do not understand what this means.
Ok, let me explain. See example below. In my understanding "count"
should retun the number of rows in a frame restriced by the match
condition. For example at the first line (2023-07-01 | 100) count
returns 10. I think this should be 0 because the "restriced" frame
starting at the line contains no matched row. On the other hand the
(restricted) frame starting at second line (2023-07-02 | 200) contains
4 rows, thus count should return 4, instead of 9.
SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
company | tdate | price | rpr | count
----------+------------+-------+------+-------
company1 | 2023-07-01 | 100 | | 10
company1 | 2023-07-02 | 200 | 200 | 9
company1 | 2023-07-03 | 150 | 150 | 8
company1 | 2023-07-04 | 140 | | 7
company1 | 2023-07-05 | 150 | 150 | 6
company1 | 2023-07-06 | 90 | | 5
company1 | 2023-07-07 | 110 | | 4
company1 | 2023-07-08 | 130 | | 3
company1 | 2023-07-09 | 120 | | 2
company1 | 2023-07-10 | 130 | | 1
It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.I have no problem with that as long as we don't paint ourselves into a
corner.
Totally agreed.
Comments and suggestions are welcome.
I have not looked at the patch yet, but is the reason for doing R020
before R010 because you haven't done the MEASURES clause yet?
One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.
In any case, I will be watching this with a close eye, and I am eager
to help in any way I can.
Thank you! I am looking forward to comments on my patch. Also any
idea how to implement MEASURES clause is welcome.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
In this case, we should require the user to specify AFTER MATCH SKIP
TO NEXT ROW so that behavior doesn't change when we implement the
standard default. (Your patch might do this already.)Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.
Attached is the v2 patch to add support for AFTER MATCH SKIP PAST LAST
ROW and AFTER MATCH SKIP PAST LAST ROW. The default is AFTER MATCH
SKIP PAST LAST ROW as the standard default. Here are some examples to
demonstrate how those clauses affect the query result.
SELECT i, rpr(i) OVER w
FROM (VALUES (1), (2), (3), (4)) AS v (i)
WINDOW w AS (
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A B)
DEFINE
A AS i <= 2,
B AS i <= 3
);
i | rpr
---+-----
1 | 1
2 |
3 |
4 |
(4 rows)
In this example rpr starts from i = 1 and find that row i = 1
satisfies A, and row i = 2 satisfies B. Then rpr moves to row i = 3
and find that it does not satisfy A, thus the result is NULL. Same
thing can be said to row i = 4.
SELECT i, rpr(i) OVER w
FROM (VALUES (1), (2), (3), (4)) AS v (i)
WINDOW w AS (
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP TO NEXT ROW
PATTERN (A B)
DEFINE
A AS i <= 2,
B AS i <= 3
);
i | rpr
---+-----
1 | 1
2 | 2
3 |
4 |
(4 rows)
In this example rpr starts from i = 1 and find that row i = 1
satisfies A, and row i = 2 satisfies B (same as above). Then rpr moves
to row i = 2, rather than 3 because AFTER MATCH SKIP TO NEXT ROW is
specified.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachments:
v2-0001-Row-pattern-recognition-patch-for-raw-parser.patchtext/x-patch; charset=us-asciiDownload+266-16
v2-0002-Row-pattern-recognition-patch-parse-analysis.patchtext/x-patch; charset=us-asciiDownload+205-2
v2-0003-Row-pattern-recognition-patch-planner.patchtext/x-patch; charset=us-asciiDownload+26-6
v2-0004-Row-pattern-recognition-patch-executor.patchtext/x-patch; charset=us-asciiDownload+548-11
v2-0005-Row-pattern-recognition-patch-docs.patchtext/x-patch; charset=us-asciiDownload+148-3
v2-0006-Row-pattern-recognition-patch-tests.patchtext/x-patch; charset=us-asciiDownload+450-2
On 6/26/23 03:05, Tatsuo Ishii wrote:
I don't understand this. RPR in a window specification limits the
window to the matched rows, so this looks like your rpr() function is
just the regular first_value() window function that we already have?No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.company | tdate | price | rpr | first_value
----------+------------+-------+------+-------------
company1 | 2023-07-01 | 100 | | 100
company1 | 2023-07-02 | 200 | 200 | 200
company1 | 2023-07-03 | 150 | 150 | 150
company1 | 2023-07-04 | 140 | | 140
company1 | 2023-07-05 | 150 | 150 | 150
company1 | 2023-07-06 | 90 | | 90
company1 | 2023-07-07 | 110 | | 110
company1 | 2023-07-08 | 130 | | 130
company1 | 2023-07-09 | 120 | | 120
company1 | 2023-07-10 | 130 | | 130For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.
Okay, I see the problem now, and why you need the rpr() function.
You are doing this as something that happens over a window frame, but it
is actually something that *reduces* the window frame. The pattern
matching needs to be done when the frame is calculated and not when any
particular function is applied over it.
This query (with all the defaults made explicit):
SELECT s.company, s.tdate, s.price,
FIRST_VALUE(s.tdate) OVER w,
LAST_VALUE(s.tdate) OVER w,
lowest OVER w
FROM stock AS s
WINDOW w AS (
PARTITION BY s.company
ORDER BY s.tdate
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
EXCLUDE NO OTHERS
MEASURES
LAST(DOWN) AS lowest
AFTER MATCH SKIP PAST LAST ROW
INITIAL PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
Should produce this result:
company | tdate | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
company1 | 07-01-2023 | 100 | | |
company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 | 140
company1 | 07-03-2023 | 150 | | |
company1 | 07-04-2023 | 140 | | |
company1 | 07-05-2023 | 150 | | |
company1 | 07-06-2023 | 90 | | |
company1 | 07-07-2023 | 110 | | |
company1 | 07-08-2023 | 130 | 07-05-2023 | 07-05-2023 | 120
company1 | 07-09-2023 | 120 | | |
company1 | 07-10-2023 | 130 | | |
(10 rows)
Or if we switch to AFTER MATCH SKIP TO NEXT ROW, then we get:
company | tdate | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
company1 | 07-01-2023 | 100 | | |
company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 | 140
company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 | 140
company1 | 07-04-2023 | 140 | | |
company1 | 07-05-2023 | 150 | 07-05-2023 | 07-08-2023 | 90
company1 | 07-06-2023 | 90 | | |
company1 | 07-07-2023 | 110 | | |
company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 | 120
company1 | 07-09-2023 | 120 | | |
company1 | 07-10-2023 | 130 | | |
(10 rows)
And then if we change INITIAL to SEEK:
company | tdate | price | first_value | last_value | lowest
----------+------------+-------+-------------+------------+--------
company1 | 07-01-2023 | 100 | 07-02-2023 | 07-05-2023 | 140
company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 | 140
company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 | 140
company1 | 07-04-2023 | 140 | 07-05-2023 | 07-08-2023 | 90
company1 | 07-05-2023 | 150 | 07-05-2023 | 07-08-2023 | 90
company1 | 07-06-2023 | 90 | 07-08-2023 | 07-10-2023 | 120
company1 | 07-07-2023 | 110 | 07-08-2023 | 07-10-2023 | 120
company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 | 120
company1 | 07-09-2023 | 120 | | |
company1 | 07-10-2023 | 130 | | |
(10 rows)
Since the pattern recognition is part of the frame, the window
aggregates should Just Work.
o SUBSET is not supported
Is this because you haven't done it yet, or because you ran into
problems trying to do it?Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?
SUBSET elements can be used in DEFINE clauses, but I do not think this
is important compared to other features.
Comments and suggestions are welcome.
I have not looked at the patch yet, but is the reason for doing R020
before R010 because you haven't done the MEASURES clause yet?One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.
I think we could do this with a single tuplesort if we use backtracking
(which might be really slow for some patterns). I have not looked into
it in any detail.
We would need to be able to remove tuples from the end (even if only
logically), and be able to update tuples inside the store. Both of
those needs come from backtracking and possibly changing the classifier.
Without backtracking, I don't see how we could do it without have a
separate tuplestore for every current possible match.
In any case, I will be watching this with a close eye, and I am eager
to help in any way I can.Thank you! I am looking forward to comments on my patch. Also any
idea how to implement MEASURES clause is welcome.
I looked at your v2 patches a little bit and the only comment that I
currently have on the code is you spelled PERMUTE as PREMUTE.
Everything else is hopefully explained above.
--
Vik Fearing
Okay, I see the problem now, and why you need the rpr() function.
You are doing this as something that happens over a window frame, but
it is actually something that *reduces* the window frame. The pattern
matching needs to be done when the frame is calculated and not when
any particular function is applied over it.
Yes. (I think the standard calls the window frame as "full window
frame" in context of RPR to make a contrast with the subset of the
frame rows restricted by RPR. The paper I refered to as [2] claims
that the latter window frame is called "reduced window frame" in the
standard but I wasn't able to find the term in the standard.)
I wanted to demonstate that pattern matching logic is basically
correct in the PoC patch. Now what I need to do is, move the row
pattern matching logic to somewhere inside nodeWindowAgg so that
"restricted window frame" can be applied to all window functions and
window aggregates. Currently I am looking into update_frameheadpos()
and update_frametailpos() which calculate the frame head and tail
against current row. What do you think?
This query (with all the defaults made explicit):
SELECT s.company, s.tdate, s.price,
FIRST_VALUE(s.tdate) OVER w,
LAST_VALUE(s.tdate) OVER w,
lowest OVER w
FROM stock AS s
WINDOW w AS (
PARTITION BY s.company
ORDER BY s.tdate
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
EXCLUDE NO OTHERS
MEASURES
LAST(DOWN) AS lowest
AFTER MATCH SKIP PAST LAST ROW
INITIAL PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);Should produce this result:
[snip]
Thanks for the examples. I agree with the expected query results.
o SUBSET is not supported
Is this because you haven't done it yet, or because you ran into
problems trying to do it?Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?SUBSET elements can be used in DEFINE clauses, but I do not think this
is important compared to other features.
Ok.
I have not looked at the patch yet, but is the reason for doing R020
before R010 because you haven't done the MEASURES clause yet?One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.I think we could do this with a single tuplesort if we use
backtracking (which might be really slow for some patterns). I have
not looked into it in any detail.We would need to be able to remove tuples from the end (even if only
logically), and be able to update tuples inside the store. Both of
those needs come from backtracking and possibly changing the
classifier.Without backtracking, I don't see how we could do it without have a
separate tuplestore for every current possible match.
Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
into Window clause with RPR?
I looked at your v2 patches a little bit and the only comment that I
currently have on the code is you spelled PERMUTE as
PREMUTE. Everything else is hopefully explained above.
Thanks. Will fix.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Small question.
This query (with all the defaults made explicit):
SELECT s.company, s.tdate, s.price,
FIRST_VALUE(s.tdate) OVER w,
LAST_VALUE(s.tdate) OVER w,
lowest OVER w
FROM stock AS s
WINDOW w AS (
PARTITION BY s.company
ORDER BY s.tdate
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
EXCLUDE NO OTHERS
MEASURES
LAST(DOWN) AS lowest
AFTER MATCH SKIP PAST LAST ROW
INITIAL PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
LAST(DOWN) AS lowest
should be "LAST(DOWN.price) AS lowest"?
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On 6/28/23 14:17, Tatsuo Ishii wrote:
Small question.
This query (with all the defaults made explicit):
SELECT s.company, s.tdate, s.price,
FIRST_VALUE(s.tdate) OVER w,
LAST_VALUE(s.tdate) OVER w,
lowest OVER w
FROM stock AS s
WINDOW w AS (
PARTITION BY s.company
ORDER BY s.tdate
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
EXCLUDE NO OTHERS
MEASURES
LAST(DOWN) AS lowest
AFTER MATCH SKIP PAST LAST ROW
INITIAL PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);LAST(DOWN) AS lowest
should be "LAST(DOWN.price) AS lowest"?
Yes, it should be. And the tdate='07-08-2023' row in the first
resultset should have '07-08-2023' and '07-10-2023' as its 4th and 5th
columns.
Since my brain is doing the processing instead of postgres, I made some
human errors. :-)
--
Vik Fearing
Hello,
Thanks for working on this! We're interested in RPR as well, and I've
been trying to get up to speed with the specs, to maybe make myself
useful.
On 6/27/23 17:58, Tatsuo Ishii wrote:
Yes. (I think the standard calls the window frame as "full window
frame" in context of RPR to make a contrast with the subset of the
frame rows restricted by RPR. The paper I refered to as [2] claims
that the latter window frame is called "reduced window frame" in the
standard but I wasn't able to find the term in the standard.)
19075-5 discusses that, at least; not sure about other parts of the spec.
Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
into Window clause with RPR?
Are we guaranteed to always have an equivalent window clause? There seem
to be many differences between the two, especially when it comes to ONE
ROW/ALL ROWS PER MATCH.
--
To add onto what Vik said above:
It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.I have no problem with that as long as we don't paint ourselves into a
corner.
To me, PATTERN looks like an area where we may want to support a broader
set of operations in the first version. The spec has a bunch of
discussion around cases like empty matches, match order of alternation
and permutation, etc., which are not possible to express or test with
only the + quantifier. Those might be harder to get right in a v2, if we
don't at least keep them in mind for v1?
+static List * +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef) +{ + List *patterns;
My compiler complains about the `patterns` variable here, which is
returned without ever being initialized. (The caller doesn't seem to use
it.)
+-- basic test using PREV +SELECT company, tdate, price, rpr(price) OVER w FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +);
nitpick: IMO the tests should be making use of ORDER BY in the window
clauses.
This is a very big feature. I agree with you that MEASURES seems like a
very important "next piece" to add. Are there other areas where you
would like reviewers to focus on right now (or avoid)?
Thanks!
--Jacob
Hello,
Thanks for working on this! We're interested in RPR as well, and I've
been trying to get up to speed with the specs, to maybe make myself
useful.
Thank you for being interested in this.
19075-5 discusses that, at least; not sure about other parts of the spec.
Thanks for the info. Unfortunately I don't have 19075-5 though.
Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
into Window clause with RPR?Are we guaranteed to always have an equivalent window clause? There seem
to be many differences between the two, especially when it comes to ONE
ROW/ALL ROWS PER MATCH.
You are right. I am not 100% sure if the rewriting is possible at this
point.
To add onto what Vik said above:
It seems RPR in the standard is quite complex. I think we can start
with a small subset of RPR then we could gradually enhance the
implementation.I have no problem with that as long as we don't paint ourselves into a
corner.To me, PATTERN looks like an area where we may want to support a broader
set of operations in the first version.
Me too but...
The spec has a bunch of
discussion around cases like empty matches, match order of alternation
and permutation, etc., which are not possible to express or test with
only the + quantifier. Those might be harder to get right in a v2, if we
don't at least keep them in mind for v1?
Currently my patch has a limitation for the sake of simple
implementation: a pattern like "A+" is parsed and analyzed in the raw
parser. This makes subsequent process much easier because the pattern
element variable (in this case "A") and the quantifier (in this case
"+") is already identified by the raw parser. However there are much
more cases are allowed in the standard as you already pointed out. For
those cases probably we should give up to parse PATTERN items in the
raw parser, and instead the raw parser just accepts the elements as
Sconst?
+static List * +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef) +{ + List *patterns;My compiler complains about the `patterns` variable here, which is
returned without ever being initialized. (The caller doesn't seem to use
it.)
Will fix.
+-- basic test using PREV +SELECT company, tdate, price, rpr(price) OVER w FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +);nitpick: IMO the tests should be making use of ORDER BY in the window
clauses.
Right. Will fix.
This is a very big feature. I agree with you that MEASURES seems like a
very important "next piece" to add. Are there other areas where you
would like reviewers to focus on right now (or avoid)?
Any comments, especially on the PREV/NEXT implementation part is
welcome. Currently the DEFINE expression like "price > PREV(price)" is
prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR. Then
evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
hack and should be gotten ride of before v1 is committed. Better idea?
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Hi Ishii-san,
On 7/19/23 22:15, Tatsuo Ishii wrote:
Currently my patch has a limitation for the sake of simple
implementation: a pattern like "A+" is parsed and analyzed in the raw
parser. This makes subsequent process much easier because the pattern
element variable (in this case "A") and the quantifier (in this case
"+") is already identified by the raw parser. However there are much
more cases are allowed in the standard as you already pointed out. For
those cases probably we should give up to parse PATTERN items in the
raw parser, and instead the raw parser just accepts the elements as
Sconst?
Is there a concern that the PATTERN grammar can't be represented in
Bison? I thought it was all context-free... Or is the concern that the
parse tree of the pattern is hard to feed into a regex engine?
Any comments, especially on the PREV/NEXT implementation part is
welcome. Currently the DEFINE expression like "price > PREV(price)" is
prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR. Then
evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
hack and should be gotten ride of before v1 is committed. Better idea?
I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, like
DEFINE B AS B.price > PREV(FIRST(A.price), 3)
so there won't be just one pair of "previous and next" tuples. Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.
--
Taking a closer look at the regex engine:
It looks like the + qualifier has trouble when it meets the end of the
frame. For instance, try removing the last row of the 'stock' table in
rpr.sql; some of the final matches will disappear unexpectedly. Or try a
pattern like
PATTERN ( a+ )
DEFINE a AS TRUE
which doesn't seem to match anything in my testing.
There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread. The pattern
PATTERN ( a+ b+ )
DEFINE a AS col = 2,
b AS col = 2
doesn't match a sequence of values (2 2 ...) with the current
implementation, even with a dummy row at the end to avoid the
end-of-frame bug.
(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I _think_ should be the correct results.)
I'm not quite understanding the match loop in evaluate_pattern(). It
looks like we're building up a string to pass to the regex engine, but
by the we call regexp_instr, don't we already know whether or not the
pattern will match based on the expression evaluation we've done?
Thanks,
--Jacob
Attachments:
tests.diff.txttext/plain; charset=UTF-8; name=tests.diff.txtDownload+100-0
On 7/21/23 01:36, Jacob Champion wrote:
There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread.
We definitely need some kind of backtracking or other reclassification
method.
(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I_think_ should be the correct results.)
Almost. You are matching 07-01-2023 but the condition is "price > 100".
--
Vik Fearing
Hi,
Hi Ishii-san,
On 7/19/23 22:15, Tatsuo Ishii wrote:
Currently my patch has a limitation for the sake of simple
implementation: a pattern like "A+" is parsed and analyzed in the raw
parser. This makes subsequent process much easier because the pattern
element variable (in this case "A") and the quantifier (in this case
"+") is already identified by the raw parser. However there are much
more cases are allowed in the standard as you already pointed out. For
those cases probably we should give up to parse PATTERN items in the
raw parser, and instead the raw parser just accepts the elements as
Sconst?Is there a concern that the PATTERN grammar can't be represented in
Bison? I thought it was all context-free...
I don't know at this point. I think context-free is not enough to be
repsented in Bison. The grammer also needs to be LALR(1). Moreover,
adding the grammer to existing parser may generate shift/reduce
errors.
Or is the concern that the
parse tree of the pattern is hard to feed into a regex engine?
One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,
PATTERN UP+
Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.
Any comments, especially on the PREV/NEXT implementation part is
welcome. Currently the DEFINE expression like "price > PREV(price)" is
prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR. Then
evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
hack and should be gotten ride of before v1 is committed. Better idea?I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, likeDEFINE B AS B.price > PREV(FIRST(A.price), 3)
so there won't be just one pair of "previous and next" tuples.
Yes, I know.
Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.
Unfortunately an window function can not call other window functions.
Taking a closer look at the regex engine:
It looks like the + qualifier has trouble when it meets the end of the
frame. For instance, try removing the last row of the 'stock' table in
rpr.sql; some of the final matches will disappear unexpectedly. Or try a
pattern likePATTERN ( a+ )
DEFINE a AS TRUEwhich doesn't seem to match anything in my testing.
There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread. The patternPATTERN ( a+ b+ )
DEFINE a AS col = 2,
b AS col = 2doesn't match a sequence of values (2 2 ...) with the current
implementation, even with a dummy row at the end to avoid the
end-of-frame bug.(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I _think_ should be the correct results.)
Thanks. I will look into this.
I'm not quite understanding the match loop in evaluate_pattern(). It
looks like we're building up a string to pass to the regex engine, but
by the we call regexp_instr, don't we already know whether or not the
pattern will match based on the expression evaluation we've done?
For "+" yes. But for more complex regular expression like '{n}', we
need to call our regexp engine to check if the pattern matches.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On 7/20/23 17:07, Vik Fearing wrote:
On 7/21/23 01:36, Jacob Champion wrote:
(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I_think_ should be the correct results.)Almost. You are matching 07-01-2023 but the condition is "price > 100".
D'oh. Correction attached. I think :)
Thanks,
--Jacob
Attachments:
tests.diff.txttext/plain; charset=UTF-8; name=tests.diff.txtDownload+100-0
On 7/20/23 23:16, Tatsuo Ishii wrote:
I don't know at this point. I think context-free is not enough to be
repsented in Bison. The grammer also needs to be LALR(1). Moreover,
adding the grammer to existing parser may generate shift/reduce
errors.
Ah. It's been too long since my compilers classes; I will pipe down.
One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,PATTERN UP+
Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.
Is the eventual plan to generate multiple alternatives, and run the
regex against them one at a time?
I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, likeDEFINE B AS B.price > PREV(FIRST(A.price), 3)
so there won't be just one pair of "previous and next" tuples.
Yes, I know.
I apologize. I got overexplain-y.
Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.Unfortunately an window function can not call other window functions.
Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case? Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?
Thanks!
--Jacob
On 7/22/23 01:14, Jacob Champion wrote:
On 7/20/23 17:07, Vik Fearing wrote:
On 7/21/23 01:36, Jacob Champion wrote:
(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I_think_ should be the correct results.)Almost. You are matching 07-01-2023 but the condition is "price > 100".
D'oh. Correction attached. I think :)
This looks correct to my human brain. Thanks!
--
Vik Fearing
One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,PATTERN UP+
Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.Is the eventual plan to generate multiple alternatives, and run the
regex against them one at a time?
Yes, that's my plan.
I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, likeDEFINE B AS B.price > PREV(FIRST(A.price), 3)
so there won't be just one pair of "previous and next" tuples.
Yes, I know.
I apologize. I got overexplain-y.
No problem. Thank you for reminding me it.
Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.Unfortunately an window function can not call other window functions.
Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
I am not sure at this point. Current PostgreSQL executor creates
WindowStatePerFuncData for each window function and aggregate
appearing in OVER clause. This means PREV/NEXT and other row pattern
navigation operators cannot have their own WindowStatePerFuncData if
they do not appear in OVER clauses in a query even if PREV/NEXT
etc. are defined as window function.
Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?
Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On 7/22/23 03:11, Tatsuo Ishii wrote:
Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.Unfortunately an window function can not call other window functions.
Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
I am not sure at this point. Current PostgreSQL executor creates
WindowStatePerFuncData for each window function and aggregate
appearing in OVER clause. This means PREV/NEXT and other row pattern
navigation operators cannot have their own WindowStatePerFuncData if
they do not appear in OVER clauses in a query even if PREV/NEXT
etc. are defined as window function.Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?
Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.
I don't understand this logic. Window functions work over a window
frame. What we are talking about here is *defining* a window frame.
How can a window function execute row pattern matching?
--
Vik Fearing
On 7/22/23 03:11, Tatsuo Ishii wrote:
Maybe
that can help clarify the design? It feels like it'll need to
eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.Unfortunately an window function can not call other window functions.
Can that restriction be lifted for the EXPR_KIND_RPR_DEFINE case?
I am not sure at this point. Current PostgreSQL executor creates
WindowStatePerFuncData for each window function and aggregate
appearing in OVER clause. This means PREV/NEXT and other row pattern
navigation operators cannot have their own WindowStatePerFuncData if
they do not appear in OVER clauses in a query even if PREV/NEXT
etc. are defined as window function.Or
does it make sense to split the pattern navigation "functions" into
their own new concept, and maybe borrow some of the window function
infrastructure for it?Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.I don't understand this logic. Window functions work over a window
frame.
Yes.
What we are talking about here is *defining* a window
frame.
Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.
How can a window function execute row pattern matching?
A window function is called for each row fed by an outer plan. It
fetches current, previous and next row to execute pattern matching. If
it matches, the window function moves to next row and repeat the
process, until pattern match fails.
Below is an example window function to execute pattern matching (I
will include this in the v3 patch). row_is_in_reduced_frame() is a
function to execute pattern matching. It returns the number of rows in
the reduced frame if pattern match succeeds. If succeeds, the function
returns the last row in the reduced frame instead of the last row in
the full window frame.
/*
* last_value
* return the value of VE evaluated on the last row of the
* window frame, per spec.
*/
Datum
window_last_value(PG_FUNCTION_ARGS)
{
WindowObject winobj = PG_WINDOW_OBJECT();
Datum result;
bool isnull;
int64 abspos;
int num_reduced_frame;
abspos = WinGetCurrentPosition(winobj);
num_reduced_frame = row_is_in_reduced_frame(winobj, abspos);
if (num_reduced_frame == 0)
/* no RPR is involved */
result = WinGetFuncArgInFrame(winobj, 0,
0, WINDOW_SEEK_TAIL, true,
&isnull, NULL);
else if (num_reduced_frame > 0)
/* get last row value in the reduced frame */
result = WinGetFuncArgInFrame(winobj, 0,
num_reduced_frame - 1, WINDOW_SEEK_HEAD, true,
&isnull, NULL);
else
/* RPR is involved and current row is unmatched or skipped */
isnull = true;
if (isnull)
PG_RETURN_NULL();
PG_RETURN_DATUM(result);
}
On 7/22/23 08:14, Tatsuo Ishii wrote:
On 7/22/23 03:11, Tatsuo Ishii wrote:
Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.I don't understand this logic. Window functions work over a window
frame.Yes.
What we are talking about here is *defining* a window
frame.Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.
Why? It should only be recalculated when the current row changes and we
need a new frame. The reduced window frame *is* the window frame for
all functions over that window.
How can a window function execute row pattern matching?
A window function is called for each row fed by an outer plan. It
fetches current, previous and next row to execute pattern matching. If
it matches, the window function moves to next row and repeat the
process, until pattern match fails.Below is an example window function to execute pattern matching (I
will include this in the v3 patch). row_is_in_reduced_frame() is a
function to execute pattern matching. It returns the number of rows in
the reduced frame if pattern match succeeds. If succeeds, the function
returns the last row in the reduced frame instead of the last row in
the full window frame.
I strongly disagree with this. Window function do not need to know how
the frame is defined, and indeed they should not. WinGetFuncArgInFrame
should answer yes or no and the window function just works on that.
Otherwise we will get extension (and possibly even core) functions that
don't handle the frame properly.
--
Vik Fearing