A performance issue with Memoize

Started by Richard Guo · about 2 years ago · 25 messages
#1 Richard Guo
guofenglinux@gmail.com
1 attachment(s)

I noticed $subject with the query below.

set enable_memoize to off;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
(select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.050..59578.053 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=10000 loops=1)
   ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=10000)
         Filter: (t1.two = s.two)
         Rows Removed by Filter: 5000
         ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=10000 loops=10000)
 Planning Time: 0.666 ms
 Execution Time: 60937.899 ms
(8 rows)

set enable_memoize to on;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
(select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.061..122684.607 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=10000 loops=1)
   ->  Memoize (actual time=0.011..9.821 rows=5000 loops=10000)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 0  Misses: 10000  Evictions: 9999  Overflows: 0  Memory Usage: 1368kB
         ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 loops=10000)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 rows=10000 loops=10000)
 Planning Time: 0.607 ms
 Execution Time: 124431.388 ms
(12 rows)

The execution time (best of 3) is 124431.388 ms with Memoize vs.
60937.899 ms without it.

The Memoize runtime stats 'Hits: 0 Misses: 10000 Evictions: 9999'
seem suspicious to me, so I've looked into it a little bit and found
that the MemoizeState's keyparamids and its outerPlan's chgParam are
always different, which forces us to purge the entire cache each time
we rescan the Memoize node.
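To see why that hurts, here is a toy Python model of the rescan decision (purely illustrative; the real logic lives in nodeMemoize.c and the names below are invented): the cache can survive a rescan only when every changed parameter is covered by the cache key, otherwise everything stored so far is thrown away.

```python
# Toy model of the Memoize rescan decision (illustrative only; the real
# logic lives in nodeMemoize.c and these names are invented).  On rescan,
# the cache is kept only when every changed parameter is covered by the
# cache key; otherwise the whole cache must be purged.
def simulate_rescans(keyparamids, chgparam, outer_values):
    cache = {}
    hits = misses = evictions = 0
    for key in outer_values:
        if not chgparam <= keyparamids:   # a changed param is not a key
            evictions += len(cache)
            cache.clear()                 # purge the entire cache
        if key in cache:
            hits += 1
        else:
            misses += 1
            cache[key] = 'rows for key %r' % key
    return hits, misses, evictions

# keyparamids {1} vs chgParam {0, 1}: every rescan purges -> no hits.
# With a single shared param, chgParam {1} is covered and the cache lives.
```

Running it over repeated rescans with the two distinct values of t1.two reproduces the 'Hits: 0' pattern in the mismatched case and a hit-heavy pattern once the key covers every changed param.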

But why are they always different? Well, for the query above, we have
two NestLoopParam nodes, one (with paramno 1) is created when we replace
outer-relation Vars in the scan qual 't1.two = s.two', the other one
(with paramno 0) is added from the subquery's subplan_params, which was
created when we replaced uplevel vars with Param nodes for the subquery.
That is to say, the chgParam would be {0, 1}.

When it comes to replacing outer-relation Vars in the memoize keys, the
two 't1.two' Vars are both replaced with the NestLoopParam with paramno
1, because it is the first NLP we see in root->curOuterParams that is
equal to the Vars in memoize keys. That is to say, the memoize node's
keyparamids is {1}.
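The mismatch can be sketched with a small Python model (again illustrative; the function names and structures are invented, not PostgreSQL's). The subquery's lateral reference to t1.two already has a fixed PARAM_EXEC slot from when the subquery was planned, while replacing t1.two in the outer qual reuses whatever matching entry it finds first, or invents a new slot:

```python
# Illustrative model (invented names; not PostgreSQL source) of why the
# ordering in create_subqueryscan_plan() matters.  The subquery's lateral
# reference to t1.two already owns PARAM_EXEC slot 0; replacing t1.two in
# the outer qual reuses a matching entry in curOuterParams if one exists,
# otherwise it invents the next free slot.
def plan_params(subquery_params_first):
    cur_outer_params = []                     # list of (var, paramno)

    def add_subplan_param():                  # process_subquery_nestloop_params
        cur_outer_params.append(('t1.two', 0))

    def replace_var_in_qual():                # replace_nestloop_params
        for var, paramno in cur_outer_params:
            if var == 't1.two':
                return paramno                # reuse the existing slot
        cur_outer_params.append(('t1.two', 1))
        return 1

    if subquery_params_first:
        add_subplan_param()
        key_param = replace_var_in_qual()
    else:
        key_param = replace_var_in_qual()
        add_subplan_param()

    chg_param = {p for _, p in cur_outer_params}
    keyparamids = {key_param}
    return chg_param, keyparamids

# Old order: chgParam {0, 1} vs keyparamids {1} -> purge on every rescan.
# Subquery params first: both uses share slot 0 -> {0} == {0}.
```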

I haven't thought the fix through completely yet, but one approach I'm
considering is that in create_subqueryscan_plan() we could first add the
subquery's subplan_params to root->curOuterParams, and only then replace
outer-relation Vars in scan_clauses. That allows us to share the same
PARAM_EXEC slot for a Var that belongs both to the subquery's uplevel
vars and to the NestLoop's outer-relation vars. To be concrete,
something like the attached.

With this change the same query runs much faster and the Memoize runtime
stats look more normal.

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
(select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.063..21177.476 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.025..5.415 rows=10000 loops=1)
   ->  Memoize (actual time=0.001..0.234 rows=5000 loops=10000)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 9998  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 2735kB
         ->  Subquery Scan on s (actual time=0.009..5.169 rows=5000 loops=2)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.006..4.050 rows=10000 loops=2)
 Planning Time: 0.593 ms
 Execution Time: 22486.621 ms
(12 rows)

Any thoughts?

Thanks
Richard

Attachments:

v1-0001-Fix-a-performance-issue-with-Memoize.patch (application/octet-stream)
From 8d6389fd44c8df17e4812d575295ce2370f43c67 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 20 Oct 2023 17:53:02 +0800
Subject: [PATCH v1] Fix a performance issue with Memoize

---
 src/backend/optimizer/plan/createplan.c | 13 +++++++---
 src/test/regress/expected/memoize.out   | 32 +++++++++++++++++++++++++
 src/test/regress/sql/memoize.sql        | 13 ++++++++++
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..4cd0759e55 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3720,13 +3720,20 @@ create_subqueryscan_plan(PlannerInfo *root, SubqueryScanPath *best_path,
 	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
 	scan_clauses = extract_actual_clauses(scan_clauses, false);
 
-	/* Replace any outer-relation variables with nestloop params */
+	/*
+	 * Add the subquery's subplan_params to root->curOuterParams and then
+	 * replace any outer-relation variables with nestloop params
+	 *
+	 * Note that we do them in this order so that we can share the same
+	 * PARAM_EXEC slot for the same Var that both belongs to the subquery's
+	 * uplevel vars and to the NestLoop's outer-relation vars.
+	 */
 	if (best_path->path.param_info)
 	{
-		scan_clauses = (List *)
-			replace_nestloop_params(root, (Node *) scan_clauses);
 		process_subquery_nestloop_params(root,
 										 rel->subplan_params);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
 	}
 
 	scan_plan = make_subqueryscan(tlist,
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index f5202430f8..b6b976698c 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -92,6 +92,38 @@ WHERE t1.unique1 < 1000;
   1000 | 9.5000000000000000
 (1 row)
 
+-- Try with LATERAL joins
+SELECT explain_memoize('
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;', false);
+                                  explain_memoize                                   
+------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=N)
+   ->  Nested Loop Left Join (actual rows=50000 loops=N)
+         ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual rows=10 loops=N)
+               Index Cond: (unique1 < 10)
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
+               Cache Mode: binary
+               Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Subquery Scan on t2 (actual rows=5000 loops=N)
+                     Filter: (t1.two = t2.two)
+                     Rows Removed by Filter: 5000
+                     ->  Seq Scan on tenk1 t2_1 (actual rows=10000 loops=N)
+(12 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;
+ count |          avg           
+-------+------------------------
+ 50000 | 0.50000000000000000000
+(1 row)
+
 -- Reduce work_mem and hash_mem_multiplier so that we see some cache evictions
 SET work_mem TO '64kB';
 SET hash_mem_multiplier TO 1.0;
diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql
index 29ab1ea62d..b387c162e6 100644
--- a/src/test/regress/sql/memoize.sql
+++ b/src/test/regress/sql/memoize.sql
@@ -57,6 +57,19 @@ LATERAL (SELECT t2.unique1 FROM tenk1 t2
          WHERE t1.twenty = t2.unique1 OFFSET 0) t2
 WHERE t1.unique1 < 1000;
 
+-- Try with LATERAL joins
+SELECT explain_memoize('
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;
+
 -- Reduce work_mem and hash_mem_multiplier so that we see some cache evictions
 SET work_mem TO '64kB';
 SET hash_mem_multiplier TO 1.0;
-- 
2.31.0

#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On Fri, 20 Oct 2023 at 13:01, Richard Guo <guofenglinux@gmail.com>
wrote:

+1

it would be great to fix this problem - I've seen this issue a few times.

Regards

Pavel


#3 Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On Fri, Oct 20, 2023 at 6:40 PM Richard Guo <guofenglinux@gmail.com> wrote:

I haven't thought thoroughly about the fix yet. But one way I'm
thinking is that in create_subqueryscan_plan() we can first add the
subquery's subplan_params to root->curOuterParams, and then replace
outer-relation Vars in scan_clauses afterwards. That can make us be
able to share the same PARAM_EXEC slot for the same Var that both
belongs to the subquery's uplevel vars and to the NestLoop's
outer-relation vars. To be concrete, something like attached.

After some more thought, I think this is the right way to fix this
issue. The idea here is to make sure that the same NLP Var shares the
same PARAM_EXEC slot. This change also saves some PARAM_EXEC slots,
though that is a trivial benefit since slots are very cheap.

Thanks
Richard

#4 Richard Guo
guofenglinux@gmail.com
In reply to: Pavel Stehule (#2)
Re: A performance issue with Memoize

On Fri, Oct 20, 2023 at 7:43 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

+1

it would be great to fix this problem - I've seen this issue a few times.

Thanks for the input. I suspect this is not rare in the real world: if
the subquery contains a lateral reference to a Var that also appears in
the subquery's join clauses, we'd probably suffer from this issue.

Thanks
Richard

#5 Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On 20/10/2023 17:40, Richard Guo wrote:


Have you thought about the case fixed by commit 1db5667? As I see it,
that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

--
regards,
Andrei Lepikhov
Postgres Professional

#6 Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#5)
Re: A performance issue with Memoize

On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

Have you thought about the case fixed by commit 1db5667? As I see it,
that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

Hm, I don't think so. The issue fixed by commit 1db5667 was caused by
sharing PARAM_EXEC slots between different levels of NestLoop. AFAICS
it's safe to share PARAM_EXEC slots within the same level of NestLoop.

The change here is about sharing PARAM_EXEC slots between subquery's
subplan_params and outer-relation variables, which happens within the
same level of NestLoop.

Actually, even without this change, we'd still share PARAM_EXEC slots
between subquery's subplan_params and outer-relation variables in some
cases. As an example, consider

explain (costs off)
select * from t t1 left join
(t t2 left join
lateral (select t1.a as t1a, t2.a as t2a, * from t t3) s
on t2.b = s.b)
on t1.b = s.b and t1.a = t2.a;
                      QUERY PLAN
-------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Nested Loop
         Join Filter: (t1.a = t2.a)
         ->  Seq Scan on t t2
         ->  Subquery Scan on s
               Filter: ((t1.b = s.b) AND (t2.b = s.b))
               ->  Seq Scan on t t3
(8 rows)

For outer-relation Var 't1.a' from qual 't1.a = t2.a', it shares
PARAM_EXEC slot 0 with the PlannerParamItem for 't1.a' within the
subquery (from its targetlist).

Have you noticed a case that this change breaks?

Hi Tom, could you share your insights on this issue and the proposed
fix?

Thanks
Richard

#7 Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Richard Guo (#6)
Re: A performance issue with Memoize

On 30/10/2023 14:55, Richard Guo wrote:

On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Have you thought about the case fixed by commit 1db5667? As I see it,
that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
it's safe to share PARAM_EXEC slots within the same level of NestLoop.

The change here is about sharing PARAM_EXEC slots between subquery's
subplan_params and outer-relation variables, which happens within the
same level of NestLoop.
...
Did you notice a case that the change here breaks?

Hi Tom, could you share your insights on this issue and the proposed
fix?

I think your patch works correctly so far. I mentioned commit 1db5667
because, as I see it, the origin of that problem was parallel workers.
I have thought about pushing Memoize down to a parallel worker and
couldn't convince myself whether such a solution would be correct.
Sorry if I disturbed you in vain.

--
regards,
Andrei Lepikhov
Postgres Professional

#8 Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#7)
Re: A performance issue with Memoize

On Tue, Oct 31, 2023 at 1:36 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:


I think your patch works correctly so far. I mentioned commit 1db5667
because, as I see it, the origin of that problem was parallel workers.
I have thought about pushing Memoize down to a parallel worker and
couldn't convince myself whether such a solution would be correct.
Sorry if I disturbed you in vain.

Thanks for mentioning commit 1db5667, which draws my attention to more
aspects of the PARAM_EXEC mechanism. I don't think the discussion is
in vain. It helps a lot.

Thanks for looking into this patch.

Thanks
Richard

#9 Arne Roland
arne.roland@malkut.net
In reply to: Richard Guo (#8)
Re: A performance issue with Memoize

Hi Richard,

I can tell this is a real-world problem. I have seen it multiple times
in production.

The fix seems surprisingly simple.

I hope my questions here aren't completely off; I still struggle to
think through the implications.

I wonder if there is anything we break by bluntly forgetting about the
subplan params. Maybe some table-valued function scan within a subquery
scan? Or something about casts on a join condition that could be
performed differently?

I wasn't able to construct a problem case. I might just be missing
context here, but I am not yet fully convinced that this is safe to do
in all cases.

Regards
Arne

#10 David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On Fri, 20 Oct 2023 at 23:40, Richard Guo <guofenglinux@gmail.com> wrote:

The Memoize runtime stats 'Hits: 0 Misses: 10000 Evictions: 9999'
seems suspicious to me, so I've looked into it a little bit, and found
that the MemoizeState's keyparamids and its outerPlan's chgParam are
always different, and that makes us have to purge the entire cache each
time we rescan the memoize node.

But why are they always different? Well, for the query above, we have
two NestLoopParam nodes, one (with paramno 1) is created when we replace
outer-relation Vars in the scan qual 't1.two = s.two', the other one
(with paramno 0) is added from the subquery's subplan_params, which was
created when we replaced uplevel vars with Param nodes for the subquery.
That is to say, the chgParam would be {0, 1}.

When it comes to replace outer-relation Vars in the memoize keys, the
two 't1.two' Vars are both replaced with the NestLoopParam with paramno
1, because it is the first NLP we see in root->curOuterParams that is
equal to the Vars in memoize keys. That is to say, the memoize node's
keyparamids is {1}.

I see the function calls were put this way around in 5ebaaa494
(Implement SQL-standard LATERAL subqueries.), per:

@ -1640,6 +1641,7 @@ create_subqueryscan_plan(PlannerInfo *root, Path
*best_path,
    {
        scan_clauses = (List *)
            replace_nestloop_params(root, (Node *) scan_clauses);
+       identify_nestloop_extparams(root, best_path->parent->subplan);
    }

(identify_nestloop_extparams was later renamed to
process_subquery_nestloop_params in 46c508fbc.)

I think fixing it your way makes sense. I don't really see any reason
why we should have two. However...

Another way it *could* be fixed would be to get rid of pull_paramids()
and change create_memoize_plan() to set keyparamids to all the param
IDs that are equal() to each of the param_exprs. That way nodeMemoize
wouldn't purge the cache, as we'd know the changing param is accounted
for in the cache. For the record, I don't think that's better, but it
scares me a bit less, as I don't know what other repercussions there
are of applying your patch to get rid of the duplicate
NestLoopParam.paramval.
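A rough sketch of that alternative (illustrative Python with hypothetical structures, not the planner's real node types): derive keyparamids from every nestloop param whose value expression matches one of the cache-key expressions, rather than only the params literally present in the keys.

```python
# Sketch of the alternative fix (illustrative; hypothetical structures,
# not the planner's real ones).  cur_outer_params is a list of
# (paramno, paramval-expression) pairs; keyparamids becomes the set of
# all param IDs whose expression equals one of the cache-key expressions.
def keyparamids_for(param_exprs, cur_outer_params):
    return {paramno
            for paramno, paramval in cur_outer_params
            if paramval in param_exprs}

# Two duplicate params both carry t1.two, so both IDs land in the set,
# and chgParam = {0, 1} is then fully accounted for by the cache key.
```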

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I also feel like we might be getting a bit close to the minor version
releases to be adjusting this stuff in back branches.

David

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

Apologies for not having noticed this thread before. I'm taking
a look at it now. However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels. Is that a bug? It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
                 pvar->vartype == var->vartype &&
                 pvar->vartypmod == var->vartypmod &&
                 pvar->varcollid == var->varcollid)
+            {
+                Assert(bms_equal(pvar->varnullingrels, var->varnullingrels));
                 return pitem->paramId;
+            }
         }
     }

This triggers no failures in the regression tests, but we know
how little that proves.

Anyway, that's just a side observation unrelated to the problem
at hand. More later.

regards, tom lane

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I think fixing it your way makes sense. I don't really see any reason
why we should have two. However...

Another way it *could* be fixed would be to get rid of pull_paramids()
and change create_memoize_plan() to set keyparamids to all the param
IDs that are equal() to each of the param_exprs. That way nodeMemoize
wouldn't purge the cache, as we'd know the changing param is accounted
for in the cache. For the record, I don't think that's better, but it
scares me a bit less, as I don't know what other repercussions there
are of applying your patch to get rid of the duplicate
NestLoopParam.paramval.

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.
I'd change the comment though, maybe to

/*
* Replace any outer-relation variables with nestloop params.
*
* We must provide nestloop params for both lateral references of
* the subquery and outer vars in the scan_clauses. It's better
* to assign the former first, because that code path requires
* specific param IDs, while replace_nestloop_params can adapt
* to the IDs assigned by process_subquery_nestloop_params.
* This avoids possibly duplicating nestloop params when the
* same Var is needed for both reasons.
*/

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

I also feel like we might be getting a bit close to the minor version
releases to be adjusting this stuff in back branches.

Yeah, I'm not sure I would change this in the back branches.

regards, tom lane

#13 Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#12)
Re: A performance issue with Memoize

On Fri, Jan 26, 2024 at 2:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.
I'd change the comment though, maybe to

/*
* Replace any outer-relation variables with nestloop params.
*
* We must provide nestloop params for both lateral references of
* the subquery and outer vars in the scan_clauses. It's better
* to assign the former first, because that code path requires
* specific param IDs, while replace_nestloop_params can adapt
* to the IDs assigned by process_subquery_nestloop_params.
* This avoids possibly duplicating nestloop params when the
* same Var is needed for both reasons.
*/

+1. It's much better.

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

When we collect the cache keys in paraminfo_get_equal_hashops() we
search param_info's ppi_clauses as well as innerrel's lateral_vars for
outer expressions. We do not perform de-duplication on the collected
outer expressions there. In my proposed test case, the same Var
't1.two' appears both in the param_info->ppi_clauses and in the
innerrel->lateral_vars, so we see two identical cache keys in the plan.
I noticed this before and wondered if we should do de-duplication on the
cache keys, but somehow I did not chase this to the ground.
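A sketch of what such de-duplication could look like (illustrative Python; the real planner code would need equal()-based comparison on expression trees, not plain '=='):

```python
# Sketch of cache-key de-duplication (illustrative; the planner would
# need equal()-based comparison on expression nodes, not '==').  Outer
# expressions are collected from ppi_clauses and lateral_vars; duplicates
# across the two sources are dropped while preserving order.
def dedup_cache_keys(ppi_clause_exprs, lateral_vars):
    seen = set()
    keys = []
    for expr in ppi_clause_exprs + lateral_vars:
        if expr not in seen:
            seen.add(expr)
            keys.append(expr)
    return keys

# 't1.two' arrives via both sources but should appear once in the key.
```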

Thanks
Richard

#14 David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#12)
Re: A performance issue with Memoize

On Fri, 26 Jan 2024 at 07:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.

Makes sense.

I've adjusted the comments to what you mentioned, trimmed the pretty
expensive test case down to something that'll run much faster, and
pushed the result.

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

David

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#14)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

+1, I was wondering if the test could be cheaper. It wasn't horrid
as Richard had it, but core regression tests add up over time.

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

regards, tom lane

#16David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#15)
1 attachment(s)
Re: A performance issue with Memoize

On Fri, 26 Jan 2024 at 16:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

Here's a patch for that.

David

Attachments:

dont_produce_duplicate_memoize_keys.patch (text/plain, US-ASCII)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c0ba087b40..8ecf9fe8dc 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -492,8 +492,16 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
 				return false;
 			}
 
-			*operators = lappend_oid(*operators, hasheqoperator);
-			*param_exprs = lappend(*param_exprs, expr);
+			/*
+			 * 'expr' may already exist as a parameter from a previous ppi_clauses.  No
+			 * need to include it again, however we'd better ensure we do switch into
+			 * binary mode if required.  See below.
+			 */
+			if (!list_member(*param_exprs, expr))
+			{
+				*operators = lappend_oid(*operators, hasheqoperator);
+				*param_exprs = lappend(*param_exprs, expr);
+			}
 
 			/*
 			 * When the join operator is not hashable then it's possible that
@@ -536,8 +544,16 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
 			return false;
 		}
 
-		*operators = lappend_oid(*operators, typentry->eq_opr);
-		*param_exprs = lappend(*param_exprs, expr);
+		/*
+		 * 'expr' may already exist as a parameter from the ppi_clauses.  No need to
+		 * include it again, however we'd better ensure we do switch into binary
+		 * mode.
+		 */
+		if (!list_member(*param_exprs, expr))
+		{
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
 
 		/*
 		 * We must go into binary mode as we don't have too much of an idea of
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index ca198ec3b8..17bb3c8661 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -107,7 +107,7 @@ WHERE t1.unique1 < 10;', false);
          ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual rows=10 loops=N)
                Index Cond: (unique1 < 10)
          ->  Memoize (actual rows=2 loops=N)
-               Cache Key: t1.two, t1.two
+               Cache Key: t1.two
                Cache Mode: binary
                Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Subquery Scan on t2 (actual rows=2 loops=N)
#17Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#11)
Re: A performance issue with Memoize

On Fri, Jan 26, 2024 at 1:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Apologies for not having noticed this thread before. I'm taking
a look at it now. However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels. Is that a bug? It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
pvar->vartype == var->vartype &&
pvar->vartypmod == var->vartypmod &&
pvar->varcollid == var->varcollid)
+            {
+                Assert(bms_equal(pvar->varnullingrels,
var->varnullingrels));
return pitem->paramId;
+            }
}
}

Yeah, I think it should be safe to assert that the varnullingrels is
equal here. The Var is supposed to be an upper-level Var, and two same
such Vars should not have different varnullingrels at this point,
although the varnullingrels might be adjusted later in
identify_current_nestloop_params according to which form of identity 3
we end up applying.

Thanks
Richard

#18Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#16)
Re: A performance issue with Memoize

On Fri, Jan 26, 2024 at 12:18 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 26 Jan 2024 at 16:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

Here's a patch for that.

At first I wondered if we should assume that the same param expr must
have the same equality operator. If not, we should also check the
operator to tell if the cache key is a duplicate, like

-           if (!list_member(*param_exprs, expr))
+           if (!list_member(*param_exprs, expr) ||
+               !list_member_oid(*operators, hasheqoperator))

But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
is set, it seems we can assume that: the operator is from the type cache
entry which is fetched according to the expr datatype.

So I think the patch makes sense. +1.

Thanks
Richard

#19David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#18)
Re: A performance issue with Memoize

On Fri, 26 Jan 2024 at 19:03, Richard Guo <guofenglinux@gmail.com> wrote:

At first I wondered if we should assume that the same param expr must
have the same equality operator. If not, we should also check the
operator to tell if the cache key is a duplicate, like

-           if (!list_member(*param_exprs, expr))
+           if (!list_member(*param_exprs, expr) ||
+               !list_member_oid(*operators, hasheqoperator))

hmm, if that were the case you wouldn't do it that way. You'd need to
forboth() and look for an item in both lists matching the search.

But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
is set, it seems we can assume that: the operator is from the type cache
entry which is fetched according to the expr datatype.

Yip.

So I think the patch makes sense. +1.

Thanks for reviewing. I've pushed the patch.

David

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#14)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

regards, tom lane

#21David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#20)
Re: A performance issue with Memoize

On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

Maybe snprintf(buf, sizeof(buf), "%.*f", 0, 5.0 / 2.0) results in "3" on those
rather than "2"?

Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

I can try changing the unique1 < 5 to unique1 < 4 to see if that's more stable.

David

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#21)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

Maybe snprintf(buf, sizeof(buf), "%.*f", 0, 5.0 / 2.0) results in "3" on those
rather than "2"?
Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

Maybe ... I don't have a better theory.

I can try changing the unique1 < 5 to unique1 < 4 to see if that's more stable.

Worth a try.

regards, tom lane

#23Alexander Lakhin
exclusion@gmail.com
In reply to: Tom Lane (#22)
Re: A performance issue with Memoize

Hello,

27.01.2024 00:09, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

Maybe snprintf(buf, sizeof(buf), "%.*f", 0, 5.0 / 2.0) results in "3" on those
rather than "2"?
Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

Maybe ... I don't have a better theory.

FWIW, I've found where this behaviour is documented:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170

(I've remembered a case with test/sql/partition_prune from 2020, where
sprintf on Windows worked the other way.)

Best regards,
Alexander

#24Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Alexander Lakhin (#23)
Re: A performance issue with Memoize

Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8.46..1495.10 rows=50000 width=256)
(actual time=0.860..111.013 rows=50000 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=8.16..376.77 rows=500
width=244) (actual time=0.798..1.418 rows=500 loops=1)
Recheck Cond: (hundred < 5)
Heap Blocks: exact=263
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..8.04
rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
Index Cond: (hundred < 5)
-> Memoize (cost=0.30..4.89 rows=100 width=12) (actual
time=0.009..0.180 rows=100 loops=500)
Cache Key: t1.hundred
Cache Mode: logical
Hits: 0 Misses: 500 Evictions: 499 Overflows: 0 Memory Usage: 5kB
-> Index Scan using tenk2_hundred on tenk2 (cost=0.29..4.88
rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
Index Cond: (hundred = t1.hundred)
Planning Time: 0.661 ms
Execution Time: 113.076 ms
(14 rows)

The Memoize node's cache key only uses t1.hundred, while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different from the cache keys, so the
cache is purged on each rescan.

Regards,
Anthonin


#25Richard Guo
guofenglinux@gmail.com
In reply to: Anthonin Bonnefoy (#24)
Re: A performance issue with Memoize

On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:

Hi,

I've seen a similar issue with the following query (tested on the current
head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8.46..1495.10 rows=50000 width=256)
(actual time=0.860..111.013 rows=50000 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=8.16..376.77 rows=500
width=244) (actual time=0.798..1.418 rows=500 loops=1)
Recheck Cond: (hundred < 5)
Heap Blocks: exact=263
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..8.04
rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
Index Cond: (hundred < 5)
-> Memoize (cost=0.30..4.89 rows=100 width=12) (actual
time=0.009..0.180 rows=100 loops=500)
Cache Key: t1.hundred
Cache Mode: logical
Hits: 0 Misses: 500 Evictions: 499 Overflows: 0 Memory Usage:
5kB
-> Index Scan using tenk2_hundred on tenk2 (cost=0.29..4.88
rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
Index Cond: (hundred = t1.hundred)
Planning Time: 0.661 ms
Execution Time: 113.076 ms
(14 rows)

The memoize's cache key only uses t1.hundred while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different and the cache is purged
on each rescan.

Thanks for the report! This issue is caused by the fact that we fail to
check PHVs for lateral references, and hence cannot know that t1.two
should also be included in the cache keys.

I reported exactly the same issue in [1], and verified that it can be
fixed by the patch in [2] (which seems to require a rebase).

[1] /messages/by-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun+Vtrowcmxb41CzA@mail.gmail.com
[2] /messages/by-id/CAMbWs49+Cjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg@mail.gmail.com

Thanks
Richard