[PATCH] Allow multiple recursive self-references

Started by Denis Hirnalmost 5 years ago31 messages

denis.hirn@uni-tuebingen.de

almost 5 years ago

1 attachment(s)

Hey everyone,

As you know, Postgres currently supports SQL:1999 recursive common table
expressions, using WITH RECURSIVE. However, Postgres does not allow more than
one recursive self-reference in the recursive term. This restriction seems to be
unnecessary.

In this mail, I'd like to propose a patch that removes this restriction, and
therefore allows the use of multiple self-references in the recursive term.
After the patch:

WITH RECURSIVE t(n) AS (
VALUES(1)
UNION ALL
SELECT t.n+f.n
FROM t, t AS f
WHERE t.n < 100
) SELECT * FROM t;

n
-----
1
2
4
8
16
32
64
128
(8 rows)

This feature deviates only slightly from the current WITH RECURSIVE, and
requires very little changes (~10 loc). Any thoughts on this?

--
Denis Hirn

Attachments:

0001-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0001-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From c3d1a8ab4d1864d15b3605e3d42b8f1ab7e0ad15 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple recursive self-references

---
 postgres/src/backend/executor/nodeWorktablescan.c | 13 +++++++------
 postgres/src/backend/parser/parse_cte.c           | 10 ++--------
 postgres/src/include/nodes/execnodes.h            |  1 +
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/postgres/src/backend/executor/nodeWorktablescan.c b/postgres/src/backend/executor/nodeWorktablescan.c
index e8f5caf..b2e1423 100644
--- a/postgres/src/backend/executor/nodeWorktablescan.c
+++ b/postgres/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,8 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
-
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +145,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll see this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -218,6 +217,8 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 	ExecScanReScan(&node->ss);
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
-	if (node->rustate)
+	if (node->rustate) {
+		if(node->readptr != -1) tuplestore_select_read_pointer(node->rustate->working_table, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/postgres/src/backend/parser/parse_cte.c b/postgres/src/backend/parser/parse_cte.c
index 1fca748..5ab8310 100644
--- a/postgres/src/backend/parser/parse_cte.c
+++ b/postgres/src/backend/parser/parse_cte.c
@@ -674,7 +674,7 @@ checkWellFormedRecursion(CteState *cstate)
 		cstate->context = RECURSION_OK;
 		checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
 		Assert(cstate->innerwiths == NIL);
-		if (cstate->selfrefcount != 1)	/* shouldn't happen */
+		if (cstate->selfrefcount < 1)	/* shouldn't happen */
 			elog(ERROR, "missing recursive reference");
 
 		/* WITH mustn't contain self-reference, either */
@@ -771,13 +771,7 @@ checkWellFormedRecursionWalker(Node *node, CteState *cstate)
 							 parser_errposition(cstate->pstate,
 												rv->location)));
 				/* Count references */
-				if (++(cstate->selfrefcount) > 1)
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_RECURSION),
-							 errmsg("recursive reference to query \"%s\" must not appear more than once",
-									mycte->ctename),
-							 parser_errposition(cstate->pstate,
-												rv->location)));
+				++(cstate->selfrefcount);
 			}
 		}
 		return false;
diff --git a/postgres/src/include/nodes/execnodes.h b/postgres/src/include/nodes/execnodes.h
index 015bfc0..56e4ce3 100644
--- a/postgres/src/include/nodes/execnodes.h
+++ b/postgres/src/include/nodes/execnodes.h
@@ -1782,6 +1782,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* indexof my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
-- 
2.30.1

Pantelis Theodosiou

ypercube@gmail.com

almost 5 years ago

In reply to: Denis Hirn (#1)

Re: [PATCH] Allow multiple recursive self-references

On Tue, Mar 23, 2021 at 1:03 PM Denis Hirn <denis.hirn@uni-tuebingen.de>
wrote:

Hey everyone,

As you know, Postgres currently supports SQL:1999 recursive common table
expressions, using WITH RECURSIVE. However, Postgres does not allow more
than
one recursive self-reference in the recursive term. This restriction seems
to be
unnecessary.

In this mail, I'd like to propose a patch that removes this restriction,
and
therefore allows the use of multiple self-references in the recursive term.
After the patch:

WITH RECURSIVE t(n) AS (
VALUES(1)
UNION ALL
SELECT t.n+f.n
FROM t, t AS f
WHERE t.n < 100
) SELECT * FROM t;

n
-----
1
2
4
8
16
32
64
128
(8 rows)

This feature deviates only slightly from the current WITH RECURSIVE, and
requires very little changes (~10 loc). Any thoughts on this?

--
Denis Hirn

I am not at all sure what the standard says about such recursion but it
looks like the two t's are treated in your patch as the same incarnation of
the table, not as a cross join of two incarnations. The natural result I
would expect from a this query would be all numbers from 1 to 198 (assuming
that the query is modified to restrict f.n and that UNION ALL is converted
to UNION to avoid infinite recursion).

I don't think any other DBMS has implemented this, except MariaDB. Tested
here:
https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=565c22771fdfc746e05808a7da7a205f

SET @@standard_compliant_cte=0;
WITH RECURSIVE t(n) AS (
SELECT 1
UNION -- ALL
SELECT t.n + f.n
FROM t, t AS f
WHERE t.n < 4 AND f.n < 4
) SELECT * FROM t;

Result:

| n |
| -: |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |

Best regards
Pantelis Theodosiou

Denis Hirn

denis.hirn@uni-tuebingen.de

almost 5 years ago

In reply to: Pantelis Theodosiou (#2)

Re: [PATCH] Allow multiple recursive self-references

Hey Pantelis,

I am not at all sure what the standard says about such recursion [...]

as far as I know, the standard does not constraint the number of self-references
of recursive common table expressions. However, I could be wrong here.

[...] but it looks like the two t's are treated in your patch as the same incarnation of the table, not as a cross join of two incarnations.

That's right and – as far as I'm concerned – it's expected behaviour. The patch only allows the recursive
union operator's working table to be read more than once. All self-references read exactly the same rows
in each iteration. You could basically accomplish the same thing with another CTE like this:

WITH RECURSIVE t(n) AS (
VALUES(1)
UNION ALL
(WITH wt AS (SELECT * FROM t)
SELECT wt.n+f.n
FROM wt, wt AS f
WHERE wt.n < 100)
) SELECT * FROM t;

But honestly, this feels more like a hack than a solution to me. The entire working table is
materialized by the (non recursive) common table expression wt, effectively doubling the
memory consumption of the query. This patch eliminates this intermediate materialization.

I don't think any other DBMS has implemented this, except MariaDB. Tested here:

There are a few recent DBMSs that I know of that support this: HyPer, Umbra, DuckDB, and NoisePage.
I'm sure there are some more examples. Still, you are right, many other DBMSs do not support this – yet.

--
Denis Hirn

Tom Lane

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Denis Hirn (#3)

Re: [PATCH] Allow multiple recursive self-references

Denis Hirn <denis.hirn@uni-tuebingen.de> writes:

I am not at all sure what the standard says about such recursion [...]

as far as I know, the standard does not constraint the number of self-references
of recursive common table expressions. However, I could be wrong here.

As far as I can see, the spec flat-out forbids this. In SQL:2021,
it's discussed in 7.17 <query expression> syntax rule 3) j) ix), which
defines

ix) If WLEi is recursive, then:
1) Let S be the stratum that contains WQNi.
2) If WQEi does not contain a <query specification> that contains
more than one <query name> referencing members of S, then WLEi is
linearly recursive.

("stratum" means a set of mutually-recursive WITH items), and they
helpfully explain

NOTE 308 — For example, if WLEi contains the <query specification>
SELECT * FROM A AS A1, A AS A2
where A is a <query name> in S, then WLEi is not linearly recursive. The
point is that this <query specification> contains two references to
members of S. It is irrelevant that they reference the same member of S.

and then the next rule says

x) If WLEi is recursive, then WLEi shall be linearly recursive.

So the problem with extending the spec here is that someday they might
extend it with some other semantics, and then we're between a rock and
a hard place.

If there were really compelling reasons why (a) we have to have this
feature and (b) there's only one reasonable way for it to act, hence
no likelihood that the SQL committee will choose something else, then
I'd be willing to think about getting out front of the committee.
But you haven't made either case.

I don't think any other DBMS has implemented this, except MariaDB. Tested here:

There are a few recent DBMSs that I know of that support this: HyPer, Umbra, DuckDB, and NoisePage.

Do they all act the same? Has anybody that sits on the SQL committee
done it? (If you could point to DB2, in particular, I'd have some
confidence that the committee might standardize on their approach.)

A totally independent question is whether you've even defined a
well-defined behavior. There's an interesting comment in the spec:

NOTE 310 — The restrictions insure that each WLEi, viewed as a
transformation of the query names of the stratum, is monotonically
increasing. According to Tarski’s fixed point theorem, this insures that
there is a fixed point. The General Rules use Kleene’s fixed point
theorem to define a sequence that converges to the minimal fixed
point.

I don't know any of the underlying math here, but if you've got a
join of multiple recursion references then the number of output
rows could certainly be nonlinear, which very possibly breaks the
argument that there's a well-defined interpretation.

regards, tom lane

Denis Hirn

denis.hirn@uni-tuebingen.de

almost 5 years ago

In reply to: Tom Lane (#4)

Re: [PATCH] Allow multiple recursive self-references

Thanks for the feedback, Tom.

Tom Lane <tgl@sss.pgh.pa.us> writes:
[...]
As far as I can see, the spec flat-out forbids this. In SQL:2021,
it's discussed in 7.17 <query expression> syntax rule 3) j) ix), which
defines [linear recursion]

(Aside: We don't have a copy of the SQL:2021 specification here (all
we've got here is the SQL:2016 variant). We've browsed the ISO site
and didn't find find SQL:2021 there. Is that a generally available
draft document?)

So the problem with extending the spec here is that someday they might
extend it with some other semantics, and then we're between a rock and
a hard place.

There are two issues, say [LIN] and [NON-LIN], here. Re [LIN]:

[LIN] PostgreSQL does not allow multiple references to the recursive
common table, even if the recursion is LINEAR. Plenty of examples
of such queries are found in the documentation of established RDBMSs,
among these IBM Db2 and MS SQL Server:

- https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/sqlp/rbafyrecursivequeries.htm#d60691e2455
(example "Two tables used for recursion using recursive common
table expressions")

- https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver15#h-using-multiple-anchor-and-recursive-members
(example "Using multiple anchor and recursive members")

Db2 and SQL Server process such linear queries with multiple recursive
CTE references just fine. As does SQLite3. With the proposed patch,
PostgreSQL would be able to process these, too (and deliver the same
expected result).

If there were really compelling reasons why (a) we have to have this
feature and (b) there's only one reasonable way for it to act, hence
no likelihood that the SQL committee will choose something else, then
I'd be willing to think about getting out front of the committee.
But you haven't made either case.
[...]
Do they all act the same? Has anybody that sits on the SQL committee
done it? (If you could point to DB2, in particular, I'd have some
confidence that the committee might standardize on their approach.)

We'd classify the ability of established RDBMSs to cope with the just
mentioned class of queries (and PostgreSQL's inability to do the same)
as one compelling reason. Also, the existing functionality in Db2 and
SQL Server would be a yardstick for the expected behavior.

A totally independent question is whether you've even defined a
well-defined behavior. There's an interesting comment in the spec:

NOTE 310 — The restrictions insure that each WLEi, viewed as a
transformation of the query names of the stratum, is monotonically
increasing. According to Tarski’s fixed point theorem, this insures that
there is a fixed point. The General Rules use Kleene’s fixed point
theorem to define a sequence that converges to the minimal fixed
point.

I don't know any of the underlying math here, but if you've got a
join of multiple recursion references then the number of output
rows could certainly be nonlinear, which very possibly breaks the
argument that there's a well-defined interpretation.

This brings us to [NON-LIN]:

[NON-LIN] NOTE 310 refers to Tarski's fixed point theorem and the
prerequisite that the recursive CTE defines a MONOTONIC query (this
guarantees the existence of a least fixed point which defines
the meaning of a recursive CTE in the first place). MONOTONICITY
and NON-LINEAR recursion, however, are two separate/orthogonal issues.

A linear recursive query can be monotonic or non-monotonic (and PostgreSQL
currently has a system of safeguards that aim to identify the latter
kind of problematic queries, ruling out the use of EXCEPT, aggregation,
and so forth), just like a non-linear query can. A mononotic non-linear
recursive query approaches a least fixed point which makes the
behavior of a non-linear recursive CTE as well-defined as a linear
CTE.

In fact, the document that led to the inclusion of WITH RECURSIVE into
the SQL standard (see reference [1]S.J. Finkelstein, N. Mattos, I.S. Mumick, H. Pirahesh, "Expressing Recursive Queries in SQL,"" ANSI Document X3H2-96-075r1, 1996, https://www.kirusa.com/mumick/pspapers/ansiRevisedRecProp96-075r1.ps.Z below) mentions that "Linear
recursion is a special case of non-linear recursion" (Section 3.9) in
the fixpoint-based semantics. (It appears that the authors aimed to
introduce non-linear WITH RECURSIVE from the start but the existing
RECURSIVE UNION proposal at the time was restricted to linear recursion;
so they paddled back and suggested to include non-linear recursion in a
future SQL standard update, coined SQL4 back then).

We do agree, though, that the absence of non-linear recursion in the SQL
standard has openened the field for varied interpretations of the
semantics. MariaDB, for example, admits non-linear recursion once you
set the configuration option standard_compliant_cte to 0. However, a
recursive table reference then returns *all* rows collected so far (not
just those rows produced by the last iteration). This implicit change of
behavior makes sense for use cases of non-linear recursion, yet may come
as a surprise for query authors. Since this implicit change could
alternatively be made explicit (and thus, arguably, clearer) in terms of
a UNION with the recursive table reference, we'd argue to keep the
semantics of recursive table references as is. But before you know, you
end up with diverging semantics for a single SQL construct, just as you said
above.

Given this, we would propose the patch as a means to allow multiple
recursive references for linear queries (see [LIN] above). This allows
PostgreSQL to catch up with Db2, SQL Server, or SQLite3. The semantics
in the [LIN] case are clear.

Best wishes,
--Denis and Torsten

[1]: S.J. Finkelstein, N. Mattos, I.S. Mumick, H. Pirahesh, "Expressing Recursive Queries in SQL,"" ANSI Document X3H2-96-075r1, 1996, https://www.kirusa.com/mumick/pspapers/ansiRevisedRecProp96-075r1.ps.Z
"Expressing Recursive Queries in SQL,"" ANSI Document X3H2-96-075r1, 1996,
https://www.kirusa.com/mumick/pspapers/ansiRevisedRecProp96-075r1.ps.Z

Denis Hirn

denis.hirn@uni-tuebingen.de

almost 5 years ago

In reply to: Denis Hirn (#5)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

Based on Toms feedback, and due to the fact that SQL:2021 forbids
non-linear recursion, version 2 of the patch allows only linear
recursion. Therefore, later SQL committee decisions on non-linear
recursion should not be problematic.

[LIN] PostgreSQL does not allow multiple references to the recursive
common table, even if the recursion is LINEAR. Plenty of examples
of such queries are found in the documentation of established RDBMSs,
among these IBM Db2 and MS SQL Server:

- https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/sqlp/rbafyrecursivequeries.htm#d60691e2455 <https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/sqlp/rbafyrecursivequeries.htm#d60691e2455>
(example "Two tables used for recursion using recursive common
table expressions")

See the gist at [1]https://gist.github.com/kryonix/73d77d3eaa5a15b3a4bdb7d590fa1253 <https://gist.github.com/kryonix/73d77d3eaa5a15b3a4bdb7d590fa1253> below for SQL code that features multiple (yet linear)
recursive references. A patched PostgreSQL runs this query as expected.

Best wishes,
--Denis and Torsten

[1]: https://gist.github.com/kryonix/73d77d3eaa5a15b3a4bdb7d590fa1253 <https://gist.github.com/kryonix/73d77d3eaa5a15b3a4bdb7d590fa1253>

Attachments:

0002-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0002-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 8d6348f8b09543ba2589bcfb74bc7924c35c46c7 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 .../src/backend/executor/nodeWorktablescan.c   | 13 +++++++------
 postgres/src/backend/parser/parse_cte.c        | 18 +++++++++++++++---
 postgres/src/include/nodes/execnodes.h         |  1 +
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/postgres/src/backend/executor/nodeWorktablescan.c b/postgres/src/backend/executor/nodeWorktablescan.c
index e8f5caf..b2e1423 100644
--- a/postgres/src/backend/executor/nodeWorktablescan.c
+++ b/postgres/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,8 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
-
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +145,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll see this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -218,6 +217,8 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 	ExecScanReScan(&node->ss);
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
-	if (node->rustate)
+	if (node->rustate) {
+		if(node->readptr != -1) tuplestore_select_read_pointer(node->rustate->working_table, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/postgres/src/backend/parser/parse_cte.c b/postgres/src/backend/parser/parse_cte.c
index 1fca748..fbfdf30 100644
--- a/postgres/src/backend/parser/parse_cte.c
+++ b/postgres/src/backend/parser/parse_cte.c
@@ -924,9 +924,21 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* Check selfrefcount for each recursive member individually */
+				if((Node *) stmt->larg != NULL && (Node *) stmt->rarg != NULL) {
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+				} else {
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/postgres/src/include/nodes/execnodes.h b/postgres/src/include/nodes/execnodes.h
index 015bfc0..e894c41 100644
--- a/postgres/src/include/nodes/execnodes.h
+++ b/postgres/src/include/nodes/execnodes.h
@@ -1782,6 +1782,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
-- 
2.30.1

Denis Hirn

denis.hirn@uni-tuebingen.de

almost 5 years ago

In reply to: Denis Hirn (#6)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

Sorry, I didn't append the patch properly.

Best wishes,
--Denis

Attachments:

v2-0001-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=v2-0001-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 8d6348f8b09543ba2589bcfb74bc7924c35c46c7 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 .../src/backend/executor/nodeWorktablescan.c   | 13 +++++++------
 postgres/src/backend/parser/parse_cte.c        | 18 +++++++++++++++---
 postgres/src/include/nodes/execnodes.h         |  1 +
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/postgres/src/backend/executor/nodeWorktablescan.c b/postgres/src/backend/executor/nodeWorktablescan.c
index e8f5caf..b2e1423 100644
--- a/postgres/src/backend/executor/nodeWorktablescan.c
+++ b/postgres/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,8 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
-
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +145,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll see this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -218,6 +217,8 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 	ExecScanReScan(&node->ss);
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
-	if (node->rustate)
+	if (node->rustate) {
+		if(node->readptr != -1) tuplestore_select_read_pointer(node->rustate->working_table, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/postgres/src/backend/parser/parse_cte.c b/postgres/src/backend/parser/parse_cte.c
index 1fca748..fbfdf30 100644
--- a/postgres/src/backend/parser/parse_cte.c
+++ b/postgres/src/backend/parser/parse_cte.c
@@ -924,9 +924,21 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* Check selfrefcount for each recursive member individually */
+				if((Node *) stmt->larg != NULL && (Node *) stmt->rarg != NULL) {
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+				} else {
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/postgres/src/include/nodes/execnodes.h b/postgres/src/include/nodes/execnodes.h
index 015bfc0..e894c41 100644
--- a/postgres/src/include/nodes/execnodes.h
+++ b/postgres/src/include/nodes/execnodes.h
@@ -1782,6 +1782,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
-- 
2.30.1

vignesh C

vignesh21@gmail.com

over 4 years ago

In reply to: Denis Hirn (#7)

Re: [PATCH] Allow multiple recursive self-references

On Wed, Mar 31, 2021 at 7:28 PM Denis Hirn <denis.hirn@uni-tuebingen.de> wrote:

Sorry, I didn't append the patch properly.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: vignesh C (#8)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

The patch does not apply on Head anymore, could you rebase and post a patch.

Sure thing. Here's the new patch.

Best wishes,
-- Denis

Attachments:

v3-0001-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=v3-0001-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From ab2ebfb0eb3ea64b71a0420ade78da62455f7d28 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 src/backend/executor/nodeWorktablescan.c | 13 +++++++------
 src/backend/parser/parse_cte.c           | 18 +++++++++++++++---
 src/include/nodes/execnodes.h            |  1 +
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..0a780074d3 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,8 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
-
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +145,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll see this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -218,6 +217,8 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 	ExecScanReScan(&node->ss);
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
-	if (node->rustate)
+	if (node->rustate) {
+		if(node->readptr != -1) tuplestore_select_read_pointer(node->rustate->working_table, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..af88eb822e 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -1117,9 +1117,21 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* Check selfrefcount for each recursive member individually */
+				if((Node *) stmt->larg != NULL && (Node *) stmt->rarg != NULL) {
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+				} else {
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 105180764e..6fb2dd2b1d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
-- 
2.32.0

#10

David Fetter

david@fetter.org

over 4 years ago

In reply to: Denis Hirn (#9)

Re: [PATCH] Allow multiple recursive self-references

On Thu, Jul 15, 2021 at 09:18:00AM +0200, Denis Hirn wrote:

The patch does not apply on Head anymore, could you rebase and post a patch.

Sure thing. Here's the new patch.

Best wishes,

Thanks for updating this.

In the next version of the patch, would you be so kind as to include
updated documentation of the feature and at least one regression test
of same?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#11

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: David Fetter (#10)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

In the next version of the patch, would you be so kind as to include
updated documentation of the feature and at least one regression test
of same?

As requested, this new version of the patch contains regression tests and documentation.

Best wishes,
-- Denis

Attachments:

0004-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0004-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 236422c7dfa53038e80af99b82f523ac40eebdf1 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 doc/src/sgml/queries.sgml                | 38 +++++++++++++++++
 src/backend/executor/nodeWorktablescan.c | 13 +++---
 src/backend/parser/parse_cte.c           | 23 ++++++++--
 src/include/nodes/execnodes.h            |  1 +
 src/test/regress/expected/with.out       | 54 +++++++++++++++++++++---
 src/test/regress/sql/with.sql            | 22 +++++++++-
 6 files changed, 134 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 834b83b509..4c522bf868 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2172,6 +2172,44 @@ GROUP BY sub_part
 </programlisting>
   </para>
 
+  <sect3 id="queries-recursive-terms">
+    <title>Multiple Recursive Terms</title>
+
+    <para>
+    The <firstterm>recursive term</firstterm> of a recursive <literal>WITH</literal>
+    query may contain exactly one self-reference. Multiple such self-references
+    are only allowed if the <literal>WITH</literal> query features multiple recursive
+    terms, all of which are joined by <literal>UNION</literal> (or <literal>UNION ALL</literal>).
+    Consider the example query below which computes the transitive closure of trave connections
+    from <literal>'my_city'</literal>. Flight and train routes are found in separate tables
+    <literal>flights(source,dest,carrier)</literal> and <literal>trains(source,dest)</literal>.
+    Two recursive terms, joined by <literal>UNION ALL</literal>, use two recursive
+    self-references to <literal>connections</literal> to find viable onward jorneys,
+    in both the <literal>flights</literal> and <literal>trains</literal> tables.
+
+<programlisting>
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     (SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'my_city'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'my_city')
+  UNION ALL -- two recursive terms below
+     (SELECT c.source, f.dest, f.carrier
+      FROM <emphasis>connections c</emphasis>, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM <emphasis>connections c</emphasis>, trains r
+      WHERE c.dest = r.source)
+)
+SELECT *  FROM tc;
+</programlisting>
+    </para>
+  </sect3>
+
   <sect3 id="queries-with-search">
    <title>Search Order</title>
 
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..0a780074d3 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,8 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
-
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +145,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll see this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -218,6 +217,8 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 	ExecScanReScan(&node->ss);
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
-	if (node->rustate)
+	if (node->rustate) {
+		if(node->readptr != -1) tuplestore_select_read_pointer(node->rustate->working_table, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..e426809abc 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -1117,9 +1117,26 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* Check selfrefcount for each recursive member individually */
+				if((Node *) stmt->larg != NULL && (Node *) stmt->rarg != NULL) {
+					int selfrefcount = cstate->selfrefcount;
+					int selfrefcountL;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					selfrefcountL = cstate->selfrefcount;
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+					/* Recursive anchors can contain recursive references, but don't have to. */
+					if(cstate->selfrefcount<selfrefcountL)
+						cstate->selfrefcount = selfrefcountL;
+				} else {
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 105180764e..6fb2dd2b1d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 3523a7dcc1..8f83582d7b 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -235,6 +235,48 @@ WITH RECURSIVE outermost(x) AS (
  7
 (7 rows)
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
 --
 -- Some examples with a tree
 --
@@ -1795,24 +1837,24 @@ LINE 2:   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id ...
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 6:        SELECT i+1 FROM foo WHERE i < 5)
-                               ^
+LINE 4:        (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 7:        SELECT i+1 FROM foo WHERE i < 5) AS t
-                               ^
+LINE 5:        (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 8b213ee408..fab1933e3d 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -139,6 +139,24 @@ WITH RECURSIVE outermost(x) AS (
  )
  SELECT * FROM outermost ORDER BY 1;
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+
 --
 -- Some examples with a tree
 --
@@ -847,7 +865,7 @@ SELECT * FROM x;
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
@@ -856,7 +874,7 @@ WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
-- 
2.32.0

#12

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Denis Hirn (#11)

Re: [PATCH] Allow multiple recursive self-references

On 20.07.21 13:15, Denis Hirn wrote:

In the next version of the patch, would you be so kind as to include
updated documentation of the feature and at least one regression test
of same?

As requested, this new version of the patch contains regression tests and documentation.

The tests fail when you build with assertions enabled (configure
--enable-cassert).

Some quick style comments:

DocBook content should use 1-space indentation (not 2 spaces).

Typo?: /* we'll see this later */ -> "set"

Opening brace after "if" should be on new line. (See existing code.)

Also, there should be a space after "if".

These casts appear to be unnecessary:

if((Node *) stmt->larg != NULL && (Node *) stmt->rarg != NULL)

#13

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: Peter Eisentraut (#12)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

The tests fail when you build with assertions enabled (configure --enable-cassert).

Thank you for pointing that out. The new version of this patch fixes that.
The tests are working properly now. All style related issues are fixed as well.

Best wishes,
-- Denis

Attachments:

0005-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0005-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 93062d2f086b8c857b63bc0788457fa6727f7944 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 doc/src/sgml/queries.sgml                | 38 +++++++++++++++++
 src/backend/executor/nodeWorktablescan.c | 21 +++++++--
 src/backend/parser/parse_cte.c           | 26 ++++++++++--
 src/include/nodes/execnodes.h            |  1 +
 src/test/regress/expected/with.out       | 54 +++++++++++++++++++++---
 src/test/regress/sql/with.sql            | 22 +++++++++-
 6 files changed, 147 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 834b83b509..2a01d43f9b 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2172,6 +2172,44 @@ GROUP BY sub_part
 </programlisting>
   </para>
 
+ <sect3 id="queries-recursive-terms">
+  <title>Multiple Recursive Terms</title>
+
+  <para>
+   The <firstterm>recursive term</firstterm> of a recursive <literal>WITH</literal>
+   query may contain exactly one self-reference. Multiple such self-references
+   are only allowed if the <literal>WITH</literal> query features multiple recursive
+   terms, all of which are joined by <literal>UNION</literal> (or <literal>UNION ALL</literal>).
+   Consider the example query below which computes the transitive closure of trave connections
+   from <literal>'my_city'</literal>. Flight and train routes are found in separate tables
+   <literal>flights(source,dest,carrier)</literal> and <literal>trains(source,dest)</literal>.
+   Two recursive terms, joined by <literal>UNION ALL</literal>, use two recursive
+   self-references to <literal>connections</literal> to find viable onward jorneys,
+   in both the <literal>flights</literal> and <literal>trains</literal> tables.
+
+<programlisting>
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     (SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'my_city'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'my_city')
+  UNION ALL -- two recursive terms below
+     (SELECT c.source, f.dest, f.carrier
+      FROM <emphasis>connections c</emphasis>, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM <emphasis>connections c</emphasis>, trains r
+      WHERE c.dest = r.source)
+)
+SELECT *  FROM tc;
+</programlisting>
+  </para>
+ </sect3>
+
   <sect3 id="queries-with-search">
    <title>Search Order</title>
 
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..c73d23d1de 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,14 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
+		/*
+		 * Allocate a unique read pointer for each worktable scan.
+		 */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
+		tuplestore_rescan(node->rustate->working_table);
 
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +151,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll set this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -219,5 +224,13 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
 	if (node->rustate)
+	{
+		/* Make sure to select the initial read pointer */
+		tuplestore_select_read_pointer(node->rustate->working_table, 0);
+		tuplestore_rescan(node->rustate->working_table);
+		/* Reallocate a unique read pointer. */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..38b1a4beb3 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -1117,9 +1117,29 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* check selfrefcount for each recursive member individually */
+				if (stmt->larg != NULL && stmt->rarg != NULL)
+				{
+					int selfrefcount = cstate->selfrefcount;
+					int selfrefcountL;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					selfrefcountL = cstate->selfrefcount;
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+					/* Recursive anchors can contain recursive references, but don't have to. */
+					if (cstate->selfrefcount < selfrefcountL)
+						cstate->selfrefcount = selfrefcountL;
+				}
+				else
+				{
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 37cb4f3d59..b53844437c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 3523a7dcc1..8f83582d7b 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -235,6 +235,48 @@ WITH RECURSIVE outermost(x) AS (
  7
 (7 rows)
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
 --
 -- Some examples with a tree
 --
@@ -1795,24 +1837,24 @@ LINE 2:   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id ...
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 6:        SELECT i+1 FROM foo WHERE i < 5)
-                               ^
+LINE 4:        (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 7:        SELECT i+1 FROM foo WHERE i < 5) AS t
-                               ^
+LINE 5:        (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 8b213ee408..fab1933e3d 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -139,6 +139,24 @@ WITH RECURSIVE outermost(x) AS (
  )
  SELECT * FROM outermost ORDER BY 1;
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+
 --
 -- Some examples with a tree
 --
@@ -847,7 +865,7 @@ SELECT * FROM x;
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
@@ -856,7 +874,7 @@ WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
-- 
2.32.0

#14

Zhihong Yu

zyu@yugabyte.com

over 4 years ago

In reply to: Denis Hirn (#13)

Re: [PATCH] Allow multiple recursive self-references

On Tue, Aug 17, 2021 at 5:58 AM Denis Hirn <denis.hirn@uni-tuebingen.de>
wrote:

The tests fail when you build with assertions enabled (configure

--enable-cassert).

Thank you for pointing that out. The new version of this patch fixes that.
The tests are working properly now. All style related issues are fixed as
well.

Best wishes,
-- Denis

Hi,

+                   selfrefcountL = cstate->selfrefcount;
+                   cstate->selfrefcount = selfrefcount;

Maybe the variable selfrefcountL can be renamed slightly (e.g.
curr_selfrefcount) so that the code is easier to read.

Cheers

#15

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: Zhihong Yu (#14)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

Maybe the variable selfrefcountL can be renamed slightly (e.g. curr_selfrefcount) so that the code is easier to read.

Yes, you are absolutely right. Thanks for the suggestion.
The new version renames this variable.

Best wishes,
-- Denis

Attachments:

0006-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0006-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 893304d9da58f49a467e7cc233a53934683d3494 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 doc/src/sgml/queries.sgml                | 38 +++++++++++++++++
 src/backend/executor/nodeWorktablescan.c | 21 +++++++--
 src/backend/parser/parse_cte.c           | 26 ++++++++++--
 src/include/nodes/execnodes.h            |  1 +
 src/test/regress/expected/with.out       | 54 +++++++++++++++++++++---
 src/test/regress/sql/with.sql            | 22 +++++++++-
 6 files changed, 147 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 834b83b509..2a01d43f9b 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2172,6 +2172,44 @@ GROUP BY sub_part
 </programlisting>
   </para>
 
+ <sect3 id="queries-recursive-terms">
+  <title>Multiple Recursive Terms</title>
+
+  <para>
+   The <firstterm>recursive term</firstterm> of a recursive <literal>WITH</literal>
+   query may contain exactly one self-reference. Multiple such self-references
+   are only allowed if the <literal>WITH</literal> query features multiple recursive
+   terms, all of which are joined by <literal>UNION</literal> (or <literal>UNION ALL</literal>).
+   Consider the example query below which computes the transitive closure of trave connections
+   from <literal>'my_city'</literal>. Flight and train routes are found in separate tables
+   <literal>flights(source,dest,carrier)</literal> and <literal>trains(source,dest)</literal>.
+   Two recursive terms, joined by <literal>UNION ALL</literal>, use two recursive
+   self-references to <literal>connections</literal> to find viable onward jorneys,
+   in both the <literal>flights</literal> and <literal>trains</literal> tables.
+
+<programlisting>
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     (SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'my_city'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'my_city')
+  UNION ALL -- two recursive terms below
+     (SELECT c.source, f.dest, f.carrier
+      FROM <emphasis>connections c</emphasis>, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM <emphasis>connections c</emphasis>, trains r
+      WHERE c.dest = r.source)
+)
+SELECT *  FROM tc;
+</programlisting>
+  </para>
+ </sect3>
+
   <sect3 id="queries-with-search">
    <title>Search Order</title>
 
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..c73d23d1de 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,14 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
+		/*
+		 * Allocate a unique read pointer for each worktable scan.
+		 */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
+		tuplestore_rescan(node->rustate->working_table);
 
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +151,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll set this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -219,5 +224,13 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
 	if (node->rustate)
+	{
+		/* Make sure to select the initial read pointer */
+		tuplestore_select_read_pointer(node->rustate->working_table, 0);
+		tuplestore_rescan(node->rustate->working_table);
+		/* Reallocate a unique read pointer. */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..39f97f8ceb 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -1117,9 +1117,29 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* check selfrefcount for each recursive member individually */
+				if (stmt->larg != NULL && stmt->rarg != NULL)
+				{
+					int curr_selfrefcount;
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					curr_selfrefcount = cstate->selfrefcount;
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+					/* Recursive anchors can contain recursive references, but don't have to. */
+					if (cstate->selfrefcount < curr_selfrefcount)
+						cstate->selfrefcount = curr_selfrefcount;
+				}
+				else
+				{
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 37cb4f3d59..b53844437c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 3523a7dcc1..8f83582d7b 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -235,6 +235,48 @@ WITH RECURSIVE outermost(x) AS (
  7
 (7 rows)
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
 --
 -- Some examples with a tree
 --
@@ -1795,24 +1837,24 @@ LINE 2:   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id ...
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 6:        SELECT i+1 FROM foo WHERE i < 5)
-                               ^
+LINE 4:        (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 7:        SELECT i+1 FROM foo WHERE i < 5) AS t
-                               ^
+LINE 5:        (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 8b213ee408..fab1933e3d 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -139,6 +139,24 @@ WITH RECURSIVE outermost(x) AS (
  )
  SELECT * FROM outermost ORDER BY 1;
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+
 --
 -- Some examples with a tree
 --
@@ -847,7 +865,7 @@ SELECT * FROM x;
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
@@ -856,7 +874,7 @@ WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
-- 
2.32.0

#16

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Denis Hirn (#15)

Re: [PATCH] Allow multiple recursive self-references

I tested the following query (from SQLite documentation):

CREATE TABLE edge(aa INT, bb INT);

WITH RECURSIVE nodes(x) AS (
SELECT 59
UNION
SELECT aa FROM edge JOIN nodes ON bb=x
UNION
SELECT bb FROM edge JOIN nodes ON aa=x
)
SELECT x FROM nodes;

ERROR: 42P19: recursive reference to query "nodes" must not appear
within its non-recursive term
LINE 4: SELECT aa FROM edge JOIN nodes ON bb=x
^
LOCATION: checkWellFormedRecursionWalker, parse_cte.c:960

This well-formedness check apparently needs to be enhanced to allow for
more than two branches in the union.

#17

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: Peter Eisentraut (#16)

Re: [PATCH] Allow multiple recursive self-references

This well-formedness check apparently needs to be enhanced to allow for more than two branches in the union.

The UNION operators' left associativity causes this problem. Currently, the recursive term
must be enclosed in parentheses to make this example work:

WITH RECURSIVE nodes(x) AS (
SELECT 59
UNION
(SELECT aa FROM edge JOIN nodes ON bb=x
UNION
SELECT bb FROM edge JOIN nodes ON aa=x)
)
SELECT x FROM nodes;

The current well-formedness check assumes the left argument of the top most UNION [ALL]
to be the non-recursive term. This allows for arbitrarily many non-recursive terms, and
exactly one recursive term. This should not be changed because later stages expect this
structure. But this left-deep UNION [ALL] tree does not suffice anymore. Therefore, the
ctequery field of the CommonTableExpr must be rewritten –– I think.

Let's take a look at another example:

WITH RECURSIVE nodes(x) AS (
SELECT 59
UNION
SELECT 42
UNION
SELECT aa FROM edge JOIN nodes ON bb=x
UNION -- Top most UNION operator
SELECT bb FROM edge JOIN nodes ON aa=x
)
SELECT x FROM nodes;

This would be parsed left-deep as:
((SELECT 59 UNION SELECT 42) UNION SELECT aa ...) UNION SELECT bb ...

The proposed rewriting should be able to detect that (SELECT 59 UNION SELECT 42) does
not contain any recursive references and therefore pull the term upwards in the tree,
leaving us with:

(SELECT 59 UNION SELECT 42) UNION (SELECT aa ... UNION SELECT bb ...)

Which would be the equivalent of:

WITH RECURSIVE nodes(x) AS (
(SELECT 59
UNION
SELECT 42)
UNION -- Top most UNION operator
(SELECT aa FROM edge JOIN nodes ON bb=x
UNION
SELECT bb FROM edge JOIN nodes ON aa=x)
)
SELECT x FROM nodes;

I am not sure if this patch should introduce such a rewriting.
Any thoughts on this?

Best,
–– Denis

#18

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: Denis Hirn (#17)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

I am not sure if this patch should introduce such a rewriting.

I have thought about this again. I think it is reasonable that this patch introduces
such a rewriting.

This well-formedness check apparently needs to be enhanced to allow for more than two branches in the union.

The new version of this patch contains the necessary rewriting. The example from the SQLite documentation
can now be executed without explicit parentheses.

Best,
–– Denis

Attachments:

0007-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0007-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 7abb3580bde6710be275b5fa2b861aae4f334723 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 doc/src/sgml/queries.sgml                | 38 +++++++++++++++
 src/backend/executor/nodeWorktablescan.c | 21 +++++++--
 src/backend/parser/parse_cte.c           | 60 ++++++++++++++++++++----
 src/include/nodes/execnodes.h            |  1 +
 src/test/regress/expected/with.out       | 54 ++++++++++++++++++---
 src/test/regress/sql/with.sql            | 22 ++++++++-
 6 files changed, 176 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 834b83b509..2a01d43f9b 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2172,6 +2172,44 @@ GROUP BY sub_part
 </programlisting>
   </para>
 
+ <sect3 id="queries-recursive-terms">
+  <title>Multiple Recursive Terms</title>
+
+  <para>
+   The <firstterm>recursive term</firstterm> of a recursive <literal>WITH</literal>
+   query may contain exactly one self-reference. Multiple such self-references
+   are only allowed if the <literal>WITH</literal> query features multiple recursive
+   terms, all of which are joined by <literal>UNION</literal> (or <literal>UNION ALL</literal>).
+   Consider the example query below which computes the transitive closure of trave connections
+   from <literal>'my_city'</literal>. Flight and train routes are found in separate tables
+   <literal>flights(source,dest,carrier)</literal> and <literal>trains(source,dest)</literal>.
+   Two recursive terms, joined by <literal>UNION ALL</literal>, use two recursive
+   self-references to <literal>connections</literal> to find viable onward jorneys,
+   in both the <literal>flights</literal> and <literal>trains</literal> tables.
+
+<programlisting>
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     (SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'my_city'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'my_city')
+  UNION ALL -- two recursive terms below
+     (SELECT c.source, f.dest, f.carrier
+      FROM <emphasis>connections c</emphasis>, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM <emphasis>connections c</emphasis>, trains r
+      WHERE c.dest = r.source)
+)
+SELECT *  FROM tc;
+</programlisting>
+  </para>
+ </sect3>
+
   <sect3 id="queries-with-search">
    <title>Search Order</title>
 
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..c73d23d1de 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,14 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
+		/*
+		 * Allocate a unique read pointer for each worktable scan.
+		 */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
+		tuplestore_rescan(node->rustate->working_table);
 
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +151,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll set this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -219,5 +224,13 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
 	if (node->rustate)
+	{
+		/* Make sure to select the initial read pointer */
+		tuplestore_select_read_pointer(node->rustate->working_table, 0);
+		tuplestore_rescan(node->rustate->working_table);
+		/* Reallocate a unique read pointer. */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..a2a496472a 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -80,6 +80,8 @@ typedef struct CteState
 	/* working state for checkWellFormedRecursion walk only: */
 	int			selfrefcount;	/* number of self-references detected */
 	RecursionContext context;	/* context to allow or disallow self-ref */
+	bool		root_is_union;	/* root of non-recursive term is SETOP_UNION */
+	bool		rotate;			/* self-reference in non-recursive term detected */
 } CteState;
 
 
@@ -853,11 +855,27 @@ checkWellFormedRecursion(CteState *cstate)
 					 parser_errposition(cstate->pstate, cte->location)));
 
 		/* The left-hand operand mustn't contain self-reference at all */
-		cstate->curitem = i;
-		cstate->innerwiths = NIL;
-		cstate->selfrefcount = 0;
-		cstate->context = RECURSION_NONRECURSIVETERM;
-		checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+		do {
+			cstate->curitem = i;
+			cstate->innerwiths = NIL;
+			cstate->selfrefcount = 0;
+			cstate->context = RECURSION_NONRECURSIVETERM;
+			cstate->root_is_union = stmt->larg->op == SETOP_UNION;
+			cstate->rotate = false;
+			checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+			/* Check if rotate flag has been set. */
+			if (cstate->root_is_union && cstate->rotate)
+			{
+				/* Rotate stmt UNION tree to the right. */
+				SelectStmt *rarg = stmt->rarg;
+				stmt->rarg = stmt->larg;
+				stmt->larg = stmt->larg->larg;
+				stmt->rarg->larg = stmt->rarg->rarg;
+				stmt->rarg->rarg = rarg;
+			}
+		} while (cstate->root_is_union && cstate->rotate);
+
 		Assert(cstate->innerwiths == NIL);
 
 		/* Right-hand operand should contain one reference in a valid place */
@@ -955,6 +973,12 @@ checkWellFormedRecursionWalker(Node *node, CteState *cstate)
 			mycte = cstate->items[cstate->curitem].cte;
 			if (strcmp(rv->relname, mycte->ctename) == 0)
 			{
+				/* Found a recursive reference, but it might be fixable */
+				if (cstate->context == RECURSION_NONRECURSIVETERM && cstate->root_is_union)
+				{
+					cstate->rotate = true;
+					return false;
+				}
 				/* Found a recursive reference to the active query */
 				if (cstate->context != RECURSION_OK)
 					ereport(ERROR,
@@ -1117,9 +1141,29 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* check selfrefcount for each recursive member individually */
+				if (stmt->larg != NULL && stmt->rarg != NULL)
+				{
+					int curr_selfrefcount;
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					curr_selfrefcount = cstate->selfrefcount;
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+					/* Recursive anchors can contain recursive references, but don't have to. */
+					if (cstate->selfrefcount < curr_selfrefcount)
+						cstate->selfrefcount = curr_selfrefcount;
+				}
+				else
+				{
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 37cb4f3d59..b53844437c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 3523a7dcc1..8f83582d7b 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -235,6 +235,48 @@ WITH RECURSIVE outermost(x) AS (
  7
 (7 rows)
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
 --
 -- Some examples with a tree
 --
@@ -1795,24 +1837,24 @@ LINE 2:   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id ...
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 6:        SELECT i+1 FROM foo WHERE i < 5)
-                               ^
+LINE 4:        (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 7:        SELECT i+1 FROM foo WHERE i < 5) AS t
-                               ^
+LINE 5:        (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 8b213ee408..fab1933e3d 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -139,6 +139,24 @@ WITH RECURSIVE outermost(x) AS (
  )
  SELECT * FROM outermost ORDER BY 1;
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+
 --
 -- Some examples with a tree
 --
@@ -847,7 +865,7 @@ SELECT * FROM x;
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
@@ -856,7 +874,7 @@ WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
-- 
2.32.0

#19

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: Denis Hirn (#18)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

The documentation was not up to date anymore with the most recent changes.
This version of the patch fixes that.

Best,
–– Denis

Attachments:

0008-Allow-multiple-recursive-self-references.patchapplication/octet-stream; name=0008-Allow-multiple-recursive-self-references.patch; x-unix-mode=0644Download

From 090ddaeaa87313ea90c22da43cbe34ac0bee4a12 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Allow multiple linear recursive self-references

---
 doc/src/sgml/queries.sgml                | 38 +++++++++++++++
 src/backend/executor/nodeWorktablescan.c | 21 +++++++--
 src/backend/parser/parse_cte.c           | 60 ++++++++++++++++++++----
 src/include/nodes/execnodes.h            |  1 +
 src/test/regress/expected/with.out       | 54 ++++++++++++++++++---
 src/test/regress/sql/with.sql            | 22 ++++++++-
 6 files changed, 176 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 834b83b509..7b97604c85 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2172,6 +2172,44 @@ GROUP BY sub_part
 </programlisting>
   </para>
 
+ <sect3 id="queries-recursive-terms">
+  <title>Multiple Recursive Terms</title>
+
+  <para>
+   The <firstterm>recursive term</firstterm> of a recursive <literal>WITH</literal>
+   query may contain exactly one self-reference. Multiple such self-references
+   are only allowed if the <literal>WITH</literal> query features multiple recursive
+   terms, all of which are joined by <literal>UNION</literal> (or <literal>UNION ALL</literal>).
+   Consider the example query below which computes the transitive closure of travel connections
+   from <literal>'my_city'</literal>. Flight and train routes are found in separate tables
+   <literal>flights(source,dest,carrier)</literal> and <literal>trains(source,dest)</literal>.
+   Two recursive terms, joined by <literal>UNION ALL</literal>, use two recursive
+   self-references to <literal>connections</literal> to find viable onward jorneys,
+   in both the <literal>flights</literal> and <literal>trains</literal> tables.
+
+<programlisting>
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'my_city'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'my_city'
+  UNION ALL -- two recursive terms below
+     SELECT c.source, f.dest, f.carrier
+      FROM <emphasis>connections c</emphasis>, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM <emphasis>connections c</emphasis>, trains r
+      WHERE c.dest = r.source
+)
+SELECT *  FROM tc;
+</programlisting>
+  </para>
+ </sect3>
+
   <sect3 id="queries-with-search">
    <title>Search Order</title>
 
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..c73d23d1de 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,14 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
+		/*
+		 * Allocate a unique read pointer for each worktable scan.
+		 */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
+		tuplestore_rescan(node->rustate->working_table);
 
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +151,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll set this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -219,5 +224,13 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
 	if (node->rustate)
+	{
+		/* Make sure to select the initial read pointer */
+		tuplestore_select_read_pointer(node->rustate->working_table, 0);
+		tuplestore_rescan(node->rustate->working_table);
+		/* Reallocate a unique read pointer. */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..a2a496472a 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -80,6 +80,8 @@ typedef struct CteState
 	/* working state for checkWellFormedRecursion walk only: */
 	int			selfrefcount;	/* number of self-references detected */
 	RecursionContext context;	/* context to allow or disallow self-ref */
+	bool		root_is_union;	/* root of non-recursive term is SETOP_UNION */
+	bool		rotate;			/* self-reference in non-recursive term detected */
 } CteState;
 
 
@@ -853,11 +855,27 @@ checkWellFormedRecursion(CteState *cstate)
 					 parser_errposition(cstate->pstate, cte->location)));
 
 		/* The left-hand operand mustn't contain self-reference at all */
-		cstate->curitem = i;
-		cstate->innerwiths = NIL;
-		cstate->selfrefcount = 0;
-		cstate->context = RECURSION_NONRECURSIVETERM;
-		checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+		do {
+			cstate->curitem = i;
+			cstate->innerwiths = NIL;
+			cstate->selfrefcount = 0;
+			cstate->context = RECURSION_NONRECURSIVETERM;
+			cstate->root_is_union = stmt->larg->op == SETOP_UNION;
+			cstate->rotate = false;
+			checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+			/* Check if rotate flag has been set. */
+			if (cstate->root_is_union && cstate->rotate)
+			{
+				/* Rotate stmt UNION tree to the right. */
+				SelectStmt *rarg = stmt->rarg;
+				stmt->rarg = stmt->larg;
+				stmt->larg = stmt->larg->larg;
+				stmt->rarg->larg = stmt->rarg->rarg;
+				stmt->rarg->rarg = rarg;
+			}
+		} while (cstate->root_is_union && cstate->rotate);
+
 		Assert(cstate->innerwiths == NIL);
 
 		/* Right-hand operand should contain one reference in a valid place */
@@ -955,6 +973,12 @@ checkWellFormedRecursionWalker(Node *node, CteState *cstate)
 			mycte = cstate->items[cstate->curitem].cte;
 			if (strcmp(rv->relname, mycte->ctename) == 0)
 			{
+				/* Found a recursive reference, but it might be fixable */
+				if (cstate->context == RECURSION_NONRECURSIVETERM && cstate->root_is_union)
+				{
+					cstate->rotate = true;
+					return false;
+				}
 				/* Found a recursive reference to the active query */
 				if (cstate->context != RECURSION_OK)
 					ereport(ERROR,
@@ -1117,9 +1141,29 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* check selfrefcount for each recursive member individually */
+				if (stmt->larg != NULL && stmt->rarg != NULL)
+				{
+					int curr_selfrefcount;
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					curr_selfrefcount = cstate->selfrefcount;
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+					/* Recursive anchors can contain recursive references, but don't have to. */
+					if (cstate->selfrefcount < curr_selfrefcount)
+						cstate->selfrefcount = curr_selfrefcount;
+				}
+				else
+				{
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 37cb4f3d59..b53844437c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 3523a7dcc1..8f83582d7b 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -235,6 +235,48 @@ WITH RECURSIVE outermost(x) AS (
  7
 (7 rows)
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
 --
 -- Some examples with a tree
 --
@@ -1795,24 +1837,24 @@ LINE 2:   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id ...
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 6:        SELECT i+1 FROM foo WHERE i < 5)
-                               ^
+LINE 4:        (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 7:        SELECT i+1 FROM foo WHERE i < 5) AS t
-                               ^
+LINE 5:        (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 8b213ee408..fab1933e3d 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -139,6 +139,24 @@ WITH RECURSIVE outermost(x) AS (
  )
  SELECT * FROM outermost ORDER BY 1;
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+
 --
 -- Some examples with a tree
 --
@@ -847,7 +865,7 @@ SELECT * FROM x;
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
@@ -856,7 +874,7 @@ WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
-- 
2.32.0

#20

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Denis Hirn (#19)

Re: [PATCH] Allow multiple recursive self-references

On 31.08.21 16:48, Denis Hirn wrote:

The documentation was not up to date anymore with the most recent changes.
This version of the patch fixes that.

I studied up on the theory and terminology being discussed here. I
conclude that what the latest version of this patch is doing (allowing
multiple recursive references, but only in a linear-recursion way) is
sound and useful.

I haven't looked closely at the implementation changes yet. I'm not
very familiar with that part of the code, so it will take a bit longer.
Perhaps you could explain what you are doing there, either in email or
(even better) in the commit message.

What I think this patch needs is a lot more test cases. I would like to
see more variants of different nestings of the UNION branches, some
mixtures of UNION ALL and UNION DISTINCT, joins of the recursive CTE
with regular tables (like in the flights-and-trains examples), as well
as various cases of what is not allowed, namely showing that it can
carefully prohibit non-linear recursion. Also, in one instance you
modify an existing test case. I'm not sure why. Perhaps it would be
better to leave the existing test case alone and write a new one.

You also introduce this concept of reshuffling the UNION branches to
collect all the recursive terms under one subtree. That could use more
testing, like combinations like nonrec UNION rec UNION nonrec UNION rec
etc. Also, currently a query like this works

WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

but this doesn't:

WITH RECURSIVE t(n) AS (
SELECT n+1 FROM t WHERE n < 100
UNION ALL
VALUES (1)
)
SELECT sum(n) FROM t;

With your patch, the second should also work, so let's show some tests
for that as well.

#21

Denis Hirn

denis.hirn@uni-tuebingen.de

over 4 years ago

In reply to: Peter Eisentraut (#20)

1 attachment(s)

Re: [PATCH] Allow multiple recursive self-references

I studied up on the theory and terminology being discussed here. I conclude that what the latest version of this patch is doing (allowing multiple recursive references, but only in a linear-recursion way) is sound and useful.

That's great to hear!

I haven't looked closely at the implementation changes yet. I'm not very familiar with that part of the code, so it will take a bit longer. Perhaps you could explain what you are doing there, either in email or (even better) in the commit message.

I have extended the commit message. Some more discussion (regarding tree rotation etc.) can be found
in this mail, but also in the commit message.

What I think this patch needs is a lot more test cases. I would like to see more variants of different nestings of the UNION branches, some mixtures of UNION ALL and UNION DISTINCT, joins of the recursive CTE with regular tables (like in the flights-and-trains examples)

You are right, the testing is a bit sparse at the moment. I've added some more
test cases with the new version of this patch. Also I've improved the comments.

as well as various cases of what is not allowed, namely showing that it can carefully prohibit non-linear recursion.

The regression tests already feature tests against non-linear recursion.
Therefore I did not add more myself.

Also, in one instance you modify an existing test case. I'm not sure why. Perhaps it would be better to leave the existing test case alone and write a new one.

I agree that it would be better not to modify the existing test case, but the
modification is unavoidable. Here are the original queries:

-- non-linear recursion is not allowed
WITH RECURSIVE foo(i) AS
(values (1)
UNION ALL
(SELECT i+1 FROM foo WHERE i < 10
UNION ALL
SELECT i+1 FROM foo WHERE i < 5)
) SELECT * FROM foo;

WITH RECURSIVE foo(i) AS
(values (1)
UNION ALL
SELECT * FROM
(SELECT i+1 FROM foo WHERE i < 10
UNION ALL
SELECT i+1 FROM foo WHERE i < 5) AS t
) SELECT * FROM foo;

These two test cases are supposed to trigger the non-linear recursion check,
and they do without this patch. However, with the modifications this patch
introduces, both queries are now valid input. That's because each recursive
reference is placed inside a separate recursive UNION branch. This means that
both queries are linear recursive, and not non-linear recursive as they should be.

To make these tests check for non-linear recursion again, at least one
UNION branch must contain more than one recursive reference. That's the
modification I did.

You also introduce this concept of reshuffling the UNION branches to collect all the recursive terms under one subtree.

Yes, but the reshuffling is only applied in a very specific situation. Example:

UNION ---> UNION
/ \ / \
UNION C A UNION
/ \ / \
A B B C

A, B, and C are arbitrary SelectStmt nodes and can themselfes be deeper nested
UNION nodes.

A is a non-recursive term in the WITH RECURSIVE query. B, and C both contain a
recursive self-reference. The planner expects the top UNION node to contain
the non-recursive term in the larg, and the recursive term in the rarg.
Therefore, the tree shape on the left is invalid and would result in an error
message at the parsing stage. However, by rotating the tree to the right, this
problem can be solved so that the valid tree shape on the right side is created.

All this does, really, is to re-parenthesize the expression:
(A UNION B) UNION C ---> A UNION (B UNION C)

Also, currently a query like this works [...] but this doesn't:

WITH RECURSIVE t(n) AS (
SELECT n+1 FROM t WHERE n < 100
UNION ALL
VALUES (1)
)
SELECT sum(n) FROM t;

With your patch, the second should also work, so let's show some tests for that as well.

With just the tree rotation, the second query can not be fixed. The order of two
nodes is never changed. And I think that this is a good thing. Consider the
following query:

WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
UNION ALL
VALUES (2)
) SELECT * FROM t LIMIT 100;

If we'd just collect all non-recursive UNION branches, the semantics of the
second query would change. But changing the semantics of a query (or preventing
certain queries to be formulated at all) is not something I think this patch
should do. Therfore – I think – it's appropriate that the second query fails.

Best,
-- Denis

Attachments:

0009-Add-SQL-standard-multiple-linear-self-references-in-.patchapplication/octet-stream; name=0009-Add-SQL-standard-multiple-linear-self-references-in-.patch; x-unix-mode=0644Download

From b104b8562adbd19c785a5c2d5140931f844b1a55 Mon Sep 17 00:00:00 2001
From: Denis Hirn <denis.hirn@uni-tuebingen.de>
Date: Tue, 23 Mar 2021 12:58:36 +0100
Subject: [PATCH] Add SQL-standard multiple linear self-references in WITH
 RECURSIVE

The recursive term of a WITH RECURSIVE query must contain exactly one
self-reference. Multiple such self-references are only allowed if the WITH query
features multiple recursive terms, all of which are joined by UNION [ALL].

Without using explicit parentheses to change the precedence of the parsing,
a left-deep UNION [ALL] tree is created. This allows for multiple non-recursive
terms, and exactly one recursive term. To allow multiple non-recursive terms,
as well as multiple recursive terms, a different grouping of terms is required.
To fix this, tree rotation is added to checkWellFormedRecursion.

Example:
A, B, and C are arbitrary SelectStmt nodes and can be deeper nested UNION nodes.

A is a non-recursive term in the WITH RECURSIVE query. B, and C both contain a
recursive self-reference. The planner expects the top UNION node to contain
the non-recursive term in the larg, and the recursive term in the rarg.
Therefore, the tree shape on the left is invalid and would result in an error
message at the parsing stage. However, by rotating the tree to the right, this
problem can be solved so that the valid tree shape on the right side is created.

      UNION   --->   UNION
     /     \        /     \
   UNION    C      A    UNION
  /     \              /     \
 A       B            B       C

Effectively this re-parenthesizes the expression:
(A UNION B) UNION C ---> A UNION (B UNION C)
---
 doc/src/sgml/queries.sgml                |  38 +++++
 src/backend/executor/nodeWorktablescan.c |  21 ++-
 src/backend/parser/parse_cte.c           |  84 +++++++++-
 src/include/nodes/execnodes.h            |   1 +
 src/test/regress/expected/with.out       | 205 ++++++++++++++++++++++-
 src/test/regress/sql/with.sql            | 121 ++++++++++++-
 6 files changed, 450 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 834b83b509..fcc9131006 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2172,6 +2172,44 @@ GROUP BY sub_part
 </programlisting>
   </para>
 
+ <sect3 id="queries-recursive-terms">
+  <title>Multiple Recursive Terms</title>
+
+  <para>
+   The <firstterm>recursive term</firstterm> of a recursive <literal>WITH</literal>
+   query may contain exactly one self-reference. Multiple such self-references
+   are only allowed if the <literal>WITH</literal> query features multiple recursive
+   terms, all of which are joined by <literal>UNION</literal> (or <literal>UNION ALL</literal>).
+   Consider the example query below which computes the transitive closure of travel connections
+   from <literal>'my_city'</literal>. Flight and train routes are found in separate tables
+   <literal>flights(source,dest,carrier)</literal> and <literal>trains(source,dest)</literal>.
+   Two recursive terms, joined by <literal>UNION ALL</literal>, use two recursive
+   self-references to <literal>connections</literal> to find viable onward jorneys,
+   in both the <literal>flights</literal> and <literal>trains</literal> tables.
+
+<programlisting>
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'my_city'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'my_city'
+  UNION ALL -- two recursive terms below
+     SELECT c.source, f.dest, f.carrier
+      FROM <emphasis>connections c</emphasis>, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM <emphasis>connections c</emphasis>, trains r
+      WHERE c.dest = r.source
+)
+SELECT * FROM connections;
+</programlisting>
+  </para>
+ </sect3>
+
   <sect3 id="queries-with-search">
    <title>Search Order</title>
 
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 91d3bf376b..c73d23d1de 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -42,10 +42,6 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * worktable plan node, since it cannot appear high enough in the plan
 	 * tree of a scrollable cursor to be exposed to a backward-scan
 	 * requirement.  So it's not worth expending effort to support it.
-	 *
-	 * Note: we are also assuming that this node is the only reader of the
-	 * worktable.  Therefore, we don't need a private read pointer for the
-	 * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.
 	 */
 	Assert(ScanDirectionIsForward(node->ss.ps.state->es_direction));
 
@@ -55,6 +51,7 @@ WorkTableScanNext(WorkTableScanState *node)
 	 * Get the next tuple from tuplestore. Return NULL if no more tuples.
 	 */
 	slot = node->ss.ss_ScanTupleSlot;
+	tuplestore_select_read_pointer(tuplestorestate, node->readptr);
 	(void) tuplestore_gettupleslot(tuplestorestate, true, false, slot);
 	return slot;
 }
@@ -99,7 +96,14 @@ ExecWorkTableScan(PlanState *pstate)
 		Assert(!param->isnull);
 		node->rustate = castNode(RecursiveUnionState, DatumGetPointer(param->value));
 		Assert(node->rustate);
+		/*
+		 * Allocate a unique read pointer for each worktable scan.
+		 */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
+		tuplestore_rescan(node->rustate->working_table);
 
+		Assert(node->readptr != -1);
 		/*
 		 * The scan tuple type (ie, the rowtype we expect to find in the work
 		 * table) is the same as the result rowtype of the ancestor
@@ -147,6 +151,7 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.plan = (Plan *) node;
 	scanstate->ss.ps.state = estate;
 	scanstate->ss.ps.ExecProcNode = ExecWorkTableScan;
+	scanstate->readptr = -1;	/* we'll set this later */
 	scanstate->rustate = NULL;	/* we'll set this later */
 
 	/*
@@ -219,5 +224,13 @@ ExecReScanWorkTableScan(WorkTableScanState *node)
 
 	/* No need (or way) to rescan if ExecWorkTableScan not called yet */
 	if (node->rustate)
+	{
+		/* Make sure to select the initial read pointer */
+		tuplestore_select_read_pointer(node->rustate->working_table, 0);
+		tuplestore_rescan(node->rustate->working_table);
+		/* Reallocate a unique read pointer. */
+		node->readptr = tuplestore_alloc_read_pointer(node->rustate->working_table, 0);
+		tuplestore_copy_read_pointer(node->rustate->working_table, 0, node->readptr);
 		tuplestore_rescan(node->rustate->working_table);
+	}
 }
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index f6ae96333a..b78a630367 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -80,6 +80,8 @@ typedef struct CteState
 	/* working state for checkWellFormedRecursion walk only: */
 	int			selfrefcount;	/* number of self-references detected */
 	RecursionContext context;	/* context to allow or disallow self-ref */
+	bool		root_is_union;	/* root of non-recursive term is SETOP_UNION */
+	bool		rotate;			/* self-reference in non-recursive term detected */
 } CteState;
 
 
@@ -853,11 +855,51 @@ checkWellFormedRecursion(CteState *cstate)
 					 parser_errposition(cstate->pstate, cte->location)));
 
 		/* The left-hand operand mustn't contain self-reference at all */
-		cstate->curitem = i;
-		cstate->innerwiths = NIL;
-		cstate->selfrefcount = 0;
-		cstate->context = RECURSION_NONRECURSIVETERM;
-		checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+		do {
+			cstate->curitem = i;
+			cstate->innerwiths = NIL;
+			cstate->selfrefcount = 0;
+			cstate->context = RECURSION_NONRECURSIVETERM;
+			cstate->root_is_union = stmt->larg->op == SETOP_UNION;
+			cstate->rotate = false;
+			checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+			/* The well formed recursion check failed and might have set the
+			 * rotate flag.
+			 */
+			if (cstate->root_is_union && cstate->rotate)
+			{
+				/* By default, the parser creates a left-deep UNION [ALL] tree.
+				 * Usage of multiple recursive terms requires a different
+				 * grouping of the terms. We have to perform a tree rotation to
+				 * the right to fix the structure. Example:
+				 *
+				 * A is a non-recursive term. B, and C both contain a recursive
+				 * reference.
+				 *
+				 *      UNION   --->   UNION
+				 *     /     \        /     \
+				 *   UNION    C      A    UNION
+				 *  /     \              /     \
+				 * A       B            B       C
+				 *
+				 * NOTE: A, B, and C are arbitrary SelectStmt nodes.
+				 */
+				SelectStmt *rarg = stmt->rarg;
+				bool all = stmt->larg->all;
+
+				stmt->rarg = stmt->larg;
+				stmt->larg = stmt->larg->larg;
+				stmt->rarg->larg = stmt->rarg->rarg;
+				stmt->rarg->rarg = rarg;
+				/*
+				 * Make sure that the UNION [ALL] flag is set correctly.
+				 */
+				stmt->rarg->all = stmt->all;
+				stmt->all = all;
+			}
+		} while (cstate->root_is_union && cstate->rotate);
+
 		Assert(cstate->innerwiths == NIL);
 
 		/* Right-hand operand should contain one reference in a valid place */
@@ -955,6 +997,12 @@ checkWellFormedRecursionWalker(Node *node, CteState *cstate)
 			mycte = cstate->items[cstate->curitem].cte;
 			if (strcmp(rv->relname, mycte->ctename) == 0)
 			{
+				/* Found a recursive reference, but it might be fixable */
+				if (cstate->context == RECURSION_NONRECURSIVETERM && cstate->root_is_union)
+				{
+					cstate->rotate = true;
+					return false;
+				}
 				/* Found a recursive reference to the active query */
 				if (cstate->context != RECURSION_OK)
 					ereport(ERROR,
@@ -1117,9 +1165,29 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate)
 		{
 			case SETOP_NONE:
 			case SETOP_UNION:
-				raw_expression_tree_walker((Node *) stmt,
-										   checkWellFormedRecursionWalker,
-										   (void *) cstate);
+				/* check selfrefcount for each recursive member individually */
+				if (stmt->larg != NULL && stmt->rarg != NULL)
+				{
+					int curr_selfrefcount;
+					int selfrefcount = cstate->selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->larg, cstate);
+
+					/* Restore selfrefcount to allow multiple linear recursive references */
+					curr_selfrefcount = cstate->selfrefcount;
+					cstate->selfrefcount = selfrefcount;
+
+					checkWellFormedRecursionWalker((Node *) stmt->rarg, cstate);
+					/* Recursive anchors can contain recursive references, but don't have to. */
+					if (cstate->selfrefcount < curr_selfrefcount)
+						cstate->selfrefcount = curr_selfrefcount;
+				}
+				else
+				{
+					raw_expression_tree_walker((Node *) stmt,
+											   checkWellFormedRecursionWalker,
+											   (void *) cstate);
+				}
 				break;
 			case SETOP_INTERSECT:
 				if (stmt->all)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 37cb4f3d59..b53844437c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1843,6 +1843,7 @@ typedef struct NamedTuplestoreScanState
 typedef struct WorkTableScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
+	int		readptr;			/* index of my tuplestore read pointer */
 	RecursiveUnionState *rustate;
 } WorkTableScanState;
 
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 3523a7dcc1..008106b5ea 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -235,6 +235,199 @@ WITH RECURSIVE outermost(x) AS (
  7
 (7 rows)
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+-- No explicit parentheses
+WITH RECURSIVE foo(i) AS
+    (values (1)
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 5
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+ 2
+ 3
+ 3
+ 4
+ 4
+ 5
+ 5
+(9 rows)
+
+CREATE TEMP TABLE flights (
+  source  TEXT,
+  dest    TEXT,
+  carrier TEXT
+);
+INSERT INTO flights VALUES
+('A', 'B', 'C1'),
+('A', 'C', 'C2'),
+('A', 'D', 'C1'),
+('B', 'D', 'C3'),
+('C', 'E', 'C3')
+;
+CREATE TEMP TABLE trains (
+  source TEXT,
+  dest   TEXT
+);
+INSERT INTO trains VALUES
+('B', 'C'),
+('A', 'E'),
+('C', 'E')
+;
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'A'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'A'
+  UNION ALL -- two recursive terms below
+     SELECT c.source, f.dest, f.carrier
+      FROM connections c, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM connections c, trains r
+      WHERE c.dest = r.source
+)
+SELECT * FROM connections;
+ source | dest | carrier 
+--------+------+---------
+ A      | B    | C1
+ A      | C    | C2
+ A      | D    | C1
+ A      | E    | Rail
+ A      | D    | C3
+ A      | E    | C3
+ A      | C    | Rail
+ A      | E    | Rail
+ A      | E    | C3
+ A      | E    | Rail
+(10 rows)
+
+-- Test mixed UNION and UNION ALL with multiple recursive anchors
+WITH RECURSIVE t(x) AS
+(
+  SELECT 2
+    UNION
+  SELECT 1
+    UNION ALL
+  SELECT x+1
+  FROM   t
+  WHERE  x < 4
+    UNION
+  SELECT x*2
+  FROM   t
+  WHERE  x >= 4 AND x < 8
+    UNION ALL
+  SELECT x+1
+  FROM   t
+  WHERE  x >= 4 AND x < 8
+) SELECT * FROM t;
+ x  
+----
+  1
+  2
+  2
+  3
+  4
+  3
+  5
+  4
+  8
+  8
+ 10
+  5
+  6
+  7
+ 10
+  6
+ 12
+  8
+  7
+ 12
+ 14
+ 14
+  8
+(23 rows)
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+) SELECT * FROM foo;
+ i 
+---
+ 1
+ 2
+(2 rows)
+
 --
 -- Some examples with a tree
 --
@@ -1795,24 +1988,24 @@ LINE 2:   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y WHERE id ...
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 6:        SELECT i+1 FROM foo WHERE i < 5)
-                               ^
+LINE 4:        (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
 ERROR:  recursive reference to query "foo" must not appear more than once
-LINE 7:        SELECT i+1 FROM foo WHERE i < 5) AS t
-                               ^
+LINE 5:        (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 1...
+                                              ^
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 8b213ee408..235726e15b 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -139,6 +139,123 @@ WITH RECURSIVE outermost(x) AS (
  )
  SELECT * FROM outermost ORDER BY 1;
 
+-- Test multiple self-references in different recursive anchors
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2)
+) SELECT * FROM foo;
+
+-- No explicit parentheses
+WITH RECURSIVE foo(i) AS
+    (values (1)
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 5
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+) SELECT * FROM foo;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+    UNION ALL
+	   SELECT * FROM
+       (SELECT i+1 FROM foo WHERE i < 5
+          UNION ALL
+       SELECT i+1 FROM foo WHERE i < 2) AS t
+) SELECT * FROM foo;
+
+CREATE TEMP TABLE flights (
+  source  TEXT,
+  dest    TEXT,
+  carrier TEXT
+);
+
+INSERT INTO flights VALUES
+('A', 'B', 'C1'),
+('A', 'C', 'C2'),
+('A', 'D', 'C1'),
+('B', 'D', 'C3'),
+('C', 'E', 'C3')
+;
+
+CREATE TEMP TABLE trains (
+  source TEXT,
+  dest   TEXT
+);
+
+INSERT INTO trains VALUES
+('B', 'C'),
+('A', 'E'),
+('C', 'E')
+;
+
+WITH RECURSIVE connections(source, dest, carrier) AS (
+     SELECT f.source, f.dest, f.carrier
+      FROM flights f
+      WHERE f.source = 'A'
+    UNION ALL
+      SELECT r.source, r.dest, 'Rail' AS carrier
+      FROM trains r
+      WHERE r.source = 'A'
+  UNION ALL -- two recursive terms below
+     SELECT c.source, f.dest, f.carrier
+      FROM connections c, flights f
+      WHERE c.dest = f.source
+    UNION ALL
+      SELECT c.source, r.dest, 'Rail' AS carrier
+      FROM connections c, trains r
+      WHERE c.dest = r.source
+)
+SELECT * FROM connections;
+
+-- Test mixed UNION and UNION ALL with multiple recursive anchors
+WITH RECURSIVE t(x) AS
+(
+  SELECT 2
+    UNION
+  SELECT 1
+    UNION ALL
+  SELECT x+1
+  FROM   t
+  WHERE  x < 4
+    UNION
+  SELECT x*2
+  FROM   t
+  WHERE  x >= 4 AND x < 8
+    UNION ALL
+  SELECT x+1
+  FROM   t
+  WHERE  x >= 4 AND x < 8
+) SELECT * FROM t;
+
+WITH RECURSIVE foo(i) AS
+    (values (1)
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION ALL
+    SELECT i+1 FROM foo WHERE i < 2
+      UNION
+    SELECT i+1 FROM foo WHERE i < 2
+) SELECT * FROM foo;
+
 --
 -- Some examples with a tree
 --
@@ -847,7 +964,7 @@ SELECT * FROM x;
 WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo AS y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5)
 ) SELECT * FROM foo;
@@ -856,7 +973,7 @@ WITH RECURSIVE foo(i) AS
     (values (1)
     UNION ALL
 	   SELECT * FROM
-       (SELECT i+1 FROM foo WHERE i < 10
+       (SELECT x.i+y.i FROM foo AS x, foo as y WHERE x.i < 10
           UNION ALL
        SELECT i+1 FROM foo WHERE i < 5) AS t
 ) SELECT * FROM foo;
-- 
2.32.0

#22

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Denis Hirn (#21)

Re: [PATCH] Allow multiple recursive self-references

On 21.09.21 13:35, Denis Hirn wrote:

Also, currently a query like this works [...] but this doesn't:

WITH RECURSIVE t(n) AS (
SELECT n+1 FROM t WHERE n < 100
UNION ALL
VALUES (1)
)
SELECT sum(n) FROM t;

With your patch, the second should also work, so let's show some tests for that as well.

With just the tree rotation, the second query can not be fixed. The order of two
nodes is never changed. And I think that this is a good thing. Consider the
following query:

WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
UNION ALL
VALUES (2)
) SELECT * FROM t LIMIT 100;

If we'd just collect all non-recursive UNION branches, the semantics of the
second query would change. But changing the semantics of a query (or preventing
certain queries to be formulated at all) is not something I think this patch
should do. Therfore – I think – it's appropriate that the second query fails.

I have been studying this a bit more. I don't understand your argument
here. Why would this query have different semantics than, say

WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
VALUES (2)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
) SELECT * FROM t LIMIT 100;

The order of UNION branches shouldn't be semantically relevant.

I suppose you put the LIMIT clause in there to make some point, but I
didn't get it. ;-)

I also considered this example:

WITH RECURSIVE t(n) AS (
(VALUES (1) UNION ALL VALUES (2))
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

This works fine without and with your patch.

This should be equivalent:

WITH RECURSIVE t(n) AS (
VALUES (1) UNION ALL (VALUES (2)
UNION ALL
SELECT n+1 FROM t WHERE n < 100)
)
SELECT sum(n) FROM t;

But this runs forever in current PostgreSQL 14 and 15. I'd have
expected your patch to convert this form to the previous form, but it
doesn't.

I'm having difficulties understanding which subset of cases your patch
wants to address.

#23

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 4 years ago

In reply to: Denis Hirn (#21)

Re: [PATCH] Allow multiple recursive self-references

I have some separate questions on the executor changes. Basically, this
seems the right direction, but I wonder if some things could be clarified.

I wonder why in ExecWorkTableScan() and ExecReScanWorkTableScan() you
call tuplestore_copy_read_pointer() instead of just
tuplestore_select_read_pointer(). What is the special role of read
pointer 0 that you are copying. Before your changes, it was just the
implicit read pointer, but now that we have several, it would be good to
explain their relationship.

Also, the comment you deleted says "Therefore, we don't need a private
read pointer for the tuplestore, nor do we need to tell
tuplestore_gettupleslot to copy." You addressed the first part with the
read pointer handling, but what about the second part? The
tuplestore_gettupleslot() call in WorkTableScanNext() still has
copy=false. Is this an oversight, or did you determine that copying is
still not necessary?

#24

Denis Hirn

denis.hirn@uni-tuebingen.de

about 4 years ago

In reply to: Peter Eisentraut (#23)

Re: [PATCH] Allow multiple recursive self-references

I have been studying this a bit more. I don't understand your argument here.
Why would this query have different semantics than, say

WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
VALUES (2)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
) SELECT * FROM t LIMIT 100;

The order of UNION branches shouldn't be semantically relevant.

WITH RECURSIVE (ab)uses the (typically associative and commutative) UNION [ALL] clause,
and fundamentally changes the semantics – associativity and commutativity no longer apply.
I think your confusion stems from this ambiguity.

Let me briefly reiterate WITH RECURSIVE. Basically, you always have a query like this:

WITH RECURSIVE w(c1,...) AS (
<non-recursive>
UNION [ALL]
<recursive>
) q;

There must be a non-recursive part that does not refer to w, and -- without
the patch -- exactly one recursive part that refers to w. The non-recursive part,
and the recursive part are combined using UNION [ALL].

However, let's assume a different, unambiguous syntax just to make my point:

WITH RECURSIVE w(c1,...) AS (
<non-recursive>
RUNION [ALL]
<recursive>
) q;

Everything remains the same except that the non-recursive part and the recursive
part are combined using RUNION [ALL] instead of UNION [ALL].

Now let me rephrase your examples using this syntax:

WITH RECURSIVE t(n) AS (
(VALUES (1) UNION ALL VALUES (2))
RUNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

WITH RECURSIVE t(n) AS (
VALUES (1) RUNION ALL (VALUES (2)
UNION ALL
SELECT n+1 FROM t WHERE n < 100)
)
SELECT sum(n) FROM t;

I hope this shows that this is not the same. The first query has two non-recursive
cases and one recursive case. The second query has two recursive cases instead.

The rewrites of those examples:

RUNION RUNION
/ \ good / \
VALUES(1) UNION --> VALUES(1) UNION
/ \ / \
SELECT n+1... VALUES(2) VALUES(2) SELECT n+1...

This rewrite would be valid. The patch would not do that, however.

RUNION RUNION
/ \ bad / \
VALUES(1) UNION --> UNION SELECT n+1...
/ \ / \
SELECT n+1... VALUES(2) VALUES(1) VALUES(2)

This is the rewrite you would expect. But it is not valid, because it changes semantics.
Therefore the patch does not do that.

I'm having difficulties understanding which subset of cases your patch wants to address.

With this patch an extended WITH RECURSIVE syntax is enabled:

WITH RECURSIVE w(c1,...) AS (
<non-recursive>
UNION [ALL]
<recursive 1>
...
UNION [ALL]
<recursive n>
) q;

But really, it is:

WITH RECURSIVE w(c1,...) AS (
<non-recursive 1>
...
<non-recursive n>
UNION [ALL]
<recursive n+1>
...
UNION [ALL]
<recursive m>
) q;

We can have arbitrarily many non-recursive branches (that is possible without the patch),
as well as arbitrarily many recursive UNION [ALL] branches. Problem here: UNION [ALL]
associates to the left. This means that we end up with a left-deep parse tree, that looks
something like this:

RUNION
/ \
... m
/
UNION
/ \
n n+1
/
...
/
UNION
/ \
1 2

That is not correct, because all non-recursive branches must be part of the left
RUNION branch, and all recursive branches must be part of the right one. Postgres
performs this check in parse_cte.c using the checkWellFormedRecursion() function.

Having said the above, let me once again define the rewrite case the patch implements:

RUNION RUNION
/ \ rotate / \
... n+m ---> UNION UNION
/ / \ / \
UNION ... n n+1 UNION
/ \ / / \
n n+1 UNION ... m
/ / \
... 1 2
/
UNION
/ \
1 2

By using tree rotation, the patch transforms the parse tree on the left
into the one on the right. All non-recursive terms 1..n are found in the
left branch of RUNION, all recursive terms n+1..m in the right branch.
This tree now passes the checkWellFormedRecursion() check.
I hope this clarifies things.

The order of UNION branches shouldn't be semantically relevant.

I agree. However, a strict distinction must be made between RUNION and UNION clauses.
These must not be interchanged.

I wonder why in ExecWorkTableScan() and ExecReScanWorkTableScan() you call
tuplestore_copy_read_pointer() instead of just tuplestore_select_read_pointer().

The WorkTableScan reads the working_table tuplestore of the parent RecursiveUnion
operator. But after each iteration of the RecursiveUnion operator, the following
operations are performed:

122 /* done with old working table ... */
123 tuplestore_end(node->working_table); -- (1)
124
125 /* intermediate table becomes working table */
126 node->working_table = node->intermediate_table; -- (2)
127
128 /* create new empty intermediate table */
129 node->intermediate_table = tuplestore_begin_heap(false, false,
130 work_mem); -- (3)

https://doxygen.postgresql.org/nodeRecursiveunion_8c_source.html#l00122

In step (1), the current working_table is released. Therefore, all read pointers
that we had additionally allocated are gone, too. The intermediate table becomes
the working table in step (2), and finally a new empty intermediate table is
created (3).

Because of step (1), we have to allocate new unique read pointers for each worktable
scan again. Just using tuplestore_select_read_pointer() would be incorrect.

What is the special role of read pointer 0 that you are copying. Before your
changes, it was just the implicit read pointer, but now that we have several,
it would be good to explain their relationship.

To allocate a new read pointer, tuplestore_alloc_read_pointer() could also be used.
However, we would have to know about the eflags parameter – which the worktable
scan has no information about.

The important thing about read pointer 0 is that it always exists, and it is initialized correctly.
Therefore, it is save to copy read pointer 0 instead of creating a new one from scratch.

Also, the comment you deleted says "Therefore, we don't need a private read pointer
for the tuplestore, nor do we need to tell tuplestore_gettupleslot to copy."
You addressed the first part with the read pointer handling, but what about the
second part? The tuplestore_gettupleslot() call in WorkTableScanNext() still
has copy=false. Is this an oversight, or did you determine that copying is
still not necessary?

That's an oversight. Copying is still not necessary. Copying would only be required,
if additional writes to the tuplestore occur. But that can not happen here.

Best,
-- Denis

#25

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Denis Hirn (#24)

Re: [PATCH] Allow multiple recursive self-references

On 11.01.22 12:33, Denis Hirn wrote:

I have been studying this a bit more. I don't understand your argument here.
Why would this query have different semantics than, say

WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
VALUES (2)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
) SELECT * FROM t LIMIT 100;

The order of UNION branches shouldn't be semantically relevant.

WITH RECURSIVE (ab)uses the (typically associative and commutative) UNION [ALL] clause,
and fundamentally changes the semantics – associativity and commutativity no longer apply.
I think your confusion stems from this ambiguity.

The language in the SQL standard does not support this statement. There
is nothing in there that says that certain branches of the UNION in a
recursive query mean certain things. In fact, it doesn't even require
the query to contain a UNION at all. It just says to iterate on
evaluating the query until a fixed point is reached. I think this
supports my claim that the associativity and commutativity of a UNION in
a recursive query still apply.

This is all very complicated, so I don't claim this to be authoritative,
but I just don't see anything in the spec that supports what you are saying.

#26

Denis Hirn

denis.hirn@uni-tuebingen.de

almost 4 years ago

In reply to: Peter Eisentraut (#25)

Re: [PATCH] Allow multiple recursive self-references

On 14. Jan 2022, at 13:21, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

There is nothing in there that says that certain branches of the UNION in a recursive query mean certain things. In fact, it doesn't even require the query to contain a UNION at all. It just says to iterate on evaluating the query until a fixed point is reached. I think this supports my claim that the associativity and commutativity of a UNION in a recursive query still apply.

This is all very complicated, so I don't claim this to be authoritative, but I just don't see anything in the spec that supports what you are saying.

I disagree. In SQL:2016, it's discussed in 7.16 <query expression> syntax rule 3) j) i), which defines:

3) Among the WQEi, ... WQEk of a given stratum, there shall be at least one <query expres-
sion>, say WQEj, such that:
A) WQEj is a <query expression body> that immediately contains UNION.
B) WQEj has one operand that does not contain a <query name> referencing any of WQNi,
..., WQNk. This operand is said to be the non-recursive operand of WQEj.
C) WQEj is said to be an anchor expression, and WQNj an anchor name.

Where <query expression body> is defined as:

<query expression body> ::=
<query term>
| <query expression body> UNION [ALL | DISTINCT]
[ <corresponding spec> ] <query term>

<query term> ::=
<query primary>
| ...

<query primary> ::= ...
| <left paren> <query expression body> ... <right paren>

This definition pretty much sums up what I have called RUNION.

The SQL standard might not impose a strict order on the UNION branches.
But you have to be able to uniquely identify the anchor expression.

Best,
-- Denis

#27

Denis Hirn

denis.hirn@uni-tuebingen.de

almost 4 years ago

In reply to: Denis Hirn (#26)

Re: [PATCH] Allow multiple recursive self-references

On 14. Jan 2022, at 13:21, Peter Eisentraut

<peter.eisentraut@enterprisedb.com> wrote:

There is nothing in there that says that certain branches of the

UNION in a recursive query mean certain things. In fact, it doesn't even
require the query to contain a UNION at all. It just says to iterate on
evaluating the query until a fixed point is reached. I think this
supports my claim that the associativity and commutativity of a UNION in
a recursive query still apply.

This is all very complicated, so I don't claim this to be

authoritative, but I just don't see anything in the spec that supports
what you are saying.

Please also have a look at SQL:2016, 7.16 <query expression> General
Rules 2) c),
which defines the evaluation semantics of recursive queries. I think
that this part
of the SQL standard refutes your argument.

Best,
-- Denis

#28

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Denis Hirn (#26)

Re: [PATCH] Allow multiple recursive self-references

On 14.01.22 15:55, Denis Hirn wrote:

There is nothing in there that says that certain branches of the UNION in a recursive query mean certain things. In fact, it doesn't even require the query to contain a UNION at all. It just says to iterate on evaluating the query until a fixed point is reached. I think this supports my claim that the associativity and commutativity of a UNION in a recursive query still apply.

This is all very complicated, so I don't claim this to be authoritative, but I just don't see anything in the spec that supports what you are saying.

I disagree. In SQL:2016, it's discussed in 7.16 <query expression> syntax rule 3) j) i), which defines:

[actually 7.17]

3) Among the WQEi, ... WQEk of a given stratum, there shall be at least one <query expres-
sion>, say WQEj, such that:
A) WQEj is a <query expression body> that immediately contains UNION.
B) WQEj has one operand that does not contain a <query name> referencing any of WQNi,
..., WQNk. This operand is said to be the non-recursive operand of WQEj.
C) WQEj is said to be an anchor expression, and WQNj an anchor name.

Where <query expression body> is defined as:

<query expression body> ::=
<query term>
| <query expression body> UNION [ALL | DISTINCT]
[ <corresponding spec> ] <query term>

<query term> ::=
<query primary>
| ...

<query primary> ::= ...
| <left paren> <query expression body> ... <right paren>

This definition pretty much sums up what I have called RUNION.

The SQL standard might not impose a strict order on the UNION branches.
But you have to be able to uniquely identify the anchor expression.

Right, the above text does not impose any ordering of the UNION. It
just means that it has to have an operand that it not recursive. I
think that is what I was trying to say.

I don't understand what your RUNION examples are meant to show.

#29

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 4 years ago

In reply to: Denis Hirn (#24)

Re: [PATCH] Allow multiple recursive self-references

The explanations below are satisfactory to me. I think the executor
changes in this patch are ok. But I would like someone else who has
more experience in this particular area to check it too; I'm not going
to take committer responsibility for this by myself without additional
review.

As I said earlier, I think semantically/mathematically, the changes
proposed by this patch set are okay.

The rewriting in the parse analysis is still being debated, but it can
be tackled in separate patches/commits.

Show quoted text

On 11.01.22 12:33, Denis Hirn wrote:

I wonder why in ExecWorkTableScan() and ExecReScanWorkTableScan() you call
tuplestore_copy_read_pointer() instead of just tuplestore_select_read_pointer().

The WorkTableScan reads the working_table tuplestore of the parent RecursiveUnion
operator. But after each iteration of the RecursiveUnion operator, the following
operations are performed:

122 /* done with old working table ... */
123 tuplestore_end(node->working_table); -- (1)
124
125 /* intermediate table becomes working table */
126 node->working_table = node->intermediate_table; -- (2)
127
128 /* create new empty intermediate table */
129 node->intermediate_table = tuplestore_begin_heap(false, false,
130 work_mem); -- (3)

https://doxygen.postgresql.org/nodeRecursiveunion_8c_source.html#l00122

In step (1), the current working_table is released. Therefore, all read pointers
that we had additionally allocated are gone, too. The intermediate table becomes
the working table in step (2), and finally a new empty intermediate table is
created (3).

Because of step (1), we have to allocate new unique read pointers for each worktable
scan again. Just using tuplestore_select_read_pointer() would be incorrect.

What is the special role of read pointer 0 that you are copying. Before your
changes, it was just the implicit read pointer, but now that we have several,
it would be good to explain their relationship.

To allocate a new read pointer, tuplestore_alloc_read_pointer() could also be used.
However, we would have to know about the eflags parameter – which the worktable
scan has no information about.

The important thing about read pointer 0 is that it always exists, and it is initialized correctly.
Therefore, it is save to copy read pointer 0 instead of creating a new one from scratch.

Also, the comment you deleted says "Therefore, we don't need a private read pointer
for the tuplestore, nor do we need to tell tuplestore_gettupleslot to copy."
You addressed the first part with the read pointer handling, but what about the
second part? The tuplestore_gettupleslot() call in WorkTableScanNext() still
has copy=false. Is this an oversight, or did you determine that copying is
still not necessary?

That's an oversight. Copying is still not necessary. Copying would only be required,
if additional writes to the tuplestore occur. But that can not happen here.

#30

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Peter Eisentraut (#29)

Re: [PATCH] Allow multiple recursive self-references

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

As I said earlier, I think semantically/mathematically, the changes
proposed by this patch set are okay.

I took a quick look at this patch because I wondered how it would
affect the SEARCH/CYCLE bug discussed at [1]/messages/by-id/17320-70e37868182512ab@postgresql.org. Doesn't it break
rewriteSearchAndCycle() completely? That code assumes (without a
lot of checking) that a recursive query is a UNION [ALL] of exactly
two SELECTs.

Some other random thoughts while I'm looking at it (not a full review):

* The patch removes this comment in WorkTableScanNext:

- * Note: we are also assuming that this node is the only reader of the
- * worktable. Therefore, we don't need a private read pointer for the
- * tuplestore, nor do we need to tell tuplestore_gettupleslot to copy.

I see that it deals with the private-read-pointer question, but I do not
see any changes addressing the point about copying tuples fetched from the
tuplestore. Perhaps there's no issue, but if not, a comment explaining
why not would be appropriate.

* The long comment added to checkWellFormedRecursion will be completely
destroyed by pgindent, but that's the least of its problems: it does
not explain either why we need a tree rotation or why that doesn't
break SQL semantics. The shorter comment in front of it needs
rewritten, too. And I'm not really convinced that the new loop
is certain to terminate.

* The chunk added to checkWellFormedSelectStmt is undercommented,
because of which I'm not convinced that it's right at all. Since
that's really the meat of this patch, it needs more attention.
And the new variable names are still impossibly confusing.

regards, tom lane

[1]: /messages/by-id/17320-70e37868182512ab@postgresql.org

#31

Jacob Champion

jchampion@timescale.com

over 3 years ago

In reply to: Tom Lane (#30)

Re: [PATCH] Allow multiple recursive self-references

This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

https://commitfest.postgresql.org/38/3046/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob