A performance issue with Memoize

Started by Richard Guo · about 2 years ago · 25 messages
#1 Richard Guo
guofenglinux@gmail.com
1 attachment(s)

I noticed $subject with the query below.

set enable_memoize to off;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
(select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.050..59578.053 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=10000 loops=1)
   ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=10000)
         Filter: (t1.two = s.two)
         Rows Removed by Filter: 5000
         ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=10000 loops=10000)
 Planning Time: 0.666 ms
 Execution Time: 60937.899 ms
(8 rows)

set enable_memoize to on;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
(select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.061..122684.607 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=10000 loops=1)
   ->  Memoize (actual time=0.011..9.821 rows=5000 loops=10000)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 0  Misses: 10000  Evictions: 9999  Overflows: 0  Memory Usage: 1368kB
         ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 loops=10000)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 rows=10000 loops=10000)
 Planning Time: 0.607 ms
 Execution Time: 124431.388 ms
(12 rows)

The execution time (best of 3) is 124431.388 ms with Memoize vs.
60937.899 ms without it.

The Memoize runtime stats 'Hits: 0 Misses: 10000 Evictions: 9999'
seem suspicious to me, so I've looked into it a little bit and found
that the MemoizeState's keyparamids and its outerPlan's chgParam are
always different, which forces us to purge the entire cache each time
we rescan the Memoize node.
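To see why that hurts, here is a toy Python model of the rescan decision (purely illustrative; the real logic lives in nodeMemoize.c and the names below are invented): the cache can survive a rescan only when every changed parameter is covered by the cache key, otherwise everything stored so far is thrown away.

```python
# Toy model of the Memoize rescan decision (illustrative only; the real
# logic lives in nodeMemoize.c and these names are invented).  On rescan,
# the cache is kept only when every changed parameter is covered by the
# cache key; otherwise the whole cache must be purged.
def simulate_rescans(keyparamids, chgparam, outer_values):
    cache = {}
    hits = misses = evictions = 0
    for key in outer_values:
        if not chgparam <= keyparamids:   # a changed param is not a key
            evictions += len(cache)
            cache.clear()                 # purge the entire cache
        if key in cache:
            hits += 1
        else:
            misses += 1
            cache[key] = 'rows for key %r' % key
    return hits, misses, evictions

# keyparamids {1} vs chgParam {0, 1}: every rescan purges -> no hits.
# With a single shared param, chgParam {1} is covered and the cache lives.
```

Running it over repeated rescans with the two distinct values of t1.two reproduces the 'Hits: 0' pattern in the mismatched case and a hit-heavy pattern once the key covers every changed param.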

But why are they always different? Well, for the query above, we have
two NestLoopParam nodes, one (with paramno 1) is created when we replace
outer-relation Vars in the scan qual 't1.two = s.two', the other one
(with paramno 0) is added from the subquery's subplan_params, which was
created when we replaced uplevel vars with Param nodes for the subquery.
That is to say, the chgParam would be {0, 1}.

When it comes to replacing outer-relation Vars in the memoize keys, the
two 't1.two' Vars are both replaced with the NestLoopParam with paramno
1, because it is the first NLP we see in root->curOuterParams that is
equal to the Vars in memoize keys. That is to say, the memoize node's
keyparamids is {1}.
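The mismatch can be sketched with a small Python model (again illustrative; the function names and structures are invented, not PostgreSQL's). The subquery's lateral reference to t1.two already has a fixed PARAM_EXEC slot from when the subquery was planned, while replacing t1.two in the outer qual reuses whatever matching entry it finds first, or invents a new slot:

```python
# Illustrative model (invented names; not PostgreSQL source) of why the
# ordering in create_subqueryscan_plan() matters.  The subquery's lateral
# reference to t1.two already owns PARAM_EXEC slot 0; replacing t1.two in
# the outer qual reuses a matching entry in curOuterParams if one exists,
# otherwise it invents the next free slot.
def plan_params(subquery_params_first):
    cur_outer_params = []                     # list of (var, paramno)

    def add_subplan_param():                  # process_subquery_nestloop_params
        cur_outer_params.append(('t1.two', 0))

    def replace_var_in_qual():                # replace_nestloop_params
        for var, paramno in cur_outer_params:
            if var == 't1.two':
                return paramno                # reuse the existing slot
        cur_outer_params.append(('t1.two', 1))
        return 1

    if subquery_params_first:
        add_subplan_param()
        key_param = replace_var_in_qual()
    else:
        key_param = replace_var_in_qual()
        add_subplan_param()

    chg_param = {p for _, p in cur_outer_params}
    keyparamids = {key_param}
    return chg_param, keyparamids

# Old order: chgParam {0, 1} vs keyparamids {1} -> purge on every rescan.
# Subquery params first: both uses share slot 0 -> {0} == {0}.
```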

I haven't thought the fix through completely yet, but one approach I'm
considering is that in create_subqueryscan_plan() we could first add the
subquery's subplan_params to root->curOuterParams, and only then replace
outer-relation Vars in scan_clauses. That allows us to share the same
PARAM_EXEC slot for a Var that belongs both to the subquery's uplevel
vars and to the NestLoop's outer-relation vars. To be concrete,
something like the attached.

With this change the same query runs much faster and the Memoize runtime
stats look more normal.

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
(select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.063..21177.476 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.025..5.415 rows=10000 loops=1)
   ->  Memoize (actual time=0.001..0.234 rows=5000 loops=10000)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 9998  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 2735kB
         ->  Subquery Scan on s (actual time=0.009..5.169 rows=5000 loops=2)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.006..4.050 rows=10000 loops=2)
 Planning Time: 0.593 ms
 Execution Time: 22486.621 ms
(12 rows)

Any thoughts?

Thanks
Richard

Attachments:

v1-0001-Fix-a-performance-issue-with-Memoize.patch (application/octet-stream)
From 8d6389fd44c8df17e4812d575295ce2370f43c67 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 20 Oct 2023 17:53:02 +0800
Subject: [PATCH v1] Fix a performance issue with Memoize

---
 src/backend/optimizer/plan/createplan.c | 13 +++++++---
 src/test/regress/expected/memoize.out   | 32 +++++++++++++++++++++++++
 src/test/regress/sql/memoize.sql        | 13 ++++++++++
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..4cd0759e55 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3720,13 +3720,20 @@ create_subqueryscan_plan(PlannerInfo *root, SubqueryScanPath *best_path,
 	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
 	scan_clauses = extract_actual_clauses(scan_clauses, false);
 
-	/* Replace any outer-relation variables with nestloop params */
+	/*
+	 * Add the subquery's subplan_params to root->curOuterParams and then
+	 * replace any outer-relation variables with nestloop params
+	 *
+	 * Note that we do them in this order so that we can share the same
+	 * PARAM_EXEC slot for the same Var that both belongs to the subquery's
+	 * uplevel vars and to the NestLoop's outer-relation vars.
+	 */
 	if (best_path->path.param_info)
 	{
-		scan_clauses = (List *)
-			replace_nestloop_params(root, (Node *) scan_clauses);
 		process_subquery_nestloop_params(root,
 										 rel->subplan_params);
+		scan_clauses = (List *)
+			replace_nestloop_params(root, (Node *) scan_clauses);
 	}
 
 	scan_plan = make_subqueryscan(tlist,
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index f5202430f8..b6b976698c 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -92,6 +92,38 @@ WHERE t1.unique1 < 1000;
   1000 | 9.5000000000000000
 (1 row)
 
+-- Try with LATERAL joins
+SELECT explain_memoize('
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;', false);
+                                  explain_memoize                                   
+------------------------------------------------------------------------------------
+ Aggregate (actual rows=1 loops=N)
+   ->  Nested Loop Left Join (actual rows=50000 loops=N)
+         ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual rows=10 loops=N)
+               Index Cond: (unique1 < 10)
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
+               Cache Mode: binary
+               Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
+               ->  Subquery Scan on t2 (actual rows=5000 loops=N)
+                     Filter: (t1.two = t2.two)
+                     Rows Removed by Filter: 5000
+                     ->  Seq Scan on tenk1 t2_1 (actual rows=10000 loops=N)
+(12 rows)
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;
+ count |          avg           
+-------+------------------------
+ 50000 | 0.50000000000000000000
+(1 row)
+
 -- Reduce work_mem and hash_mem_multiplier so that we see some cache evictions
 SET work_mem TO '64kB';
 SET hash_mem_multiplier TO 1.0;
diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql
index 29ab1ea62d..b387c162e6 100644
--- a/src/test/regress/sql/memoize.sql
+++ b/src/test/regress/sql/memoize.sql
@@ -57,6 +57,19 @@ LATERAL (SELECT t2.unique1 FROM tenk1 t2
          WHERE t1.twenty = t2.unique1 OFFSET 0) t2
 WHERE t1.unique1 < 1000;
 
+-- Try with LATERAL joins
+SELECT explain_memoize('
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;', false);
+
+-- And check we get the expected results.
+SELECT COUNT(*),AVG(t2.t1two) FROM tenk1 t1 LEFT JOIN
+LATERAL (SELECT t1.two as t1two, * FROM tenk1 t2 OFFSET 0) t2
+ON t1.two = t2.two
+WHERE t1.unique1 < 10;
+
 -- Reduce work_mem and hash_mem_multiplier so that we see some cache evictions
 SET work_mem TO '64kB';
 SET hash_mem_multiplier TO 1.0;
-- 
2.31.0

#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On Fri, 20 Oct 2023 at 13:01, Richard Guo <guofenglinux@gmail.com>
wrote:

+1

it would be great to fix this problem - I've seen this issue a few times.

Regards

Pavel


#3 Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On Fri, Oct 20, 2023 at 6:40 PM Richard Guo <guofenglinux@gmail.com> wrote:

I haven't thought thoroughly about the fix yet. But one way I'm
thinking is that in create_subqueryscan_plan() we can first add the
subquery's subplan_params to root->curOuterParams, and then replace
outer-relation Vars in scan_clauses afterwards. That can make us be
able to share the same PARAM_EXEC slot for the same Var that both
belongs to the subquery's uplevel vars and to the NestLoop's
outer-relation vars. To be concrete, something like attached.

After some more thought, I think this is the right way to fix this
issue. The idea here is to make sure that the same NLP Var shares the
same PARAM_EXEC slot. This change also saves some PARAM_EXEC slots,
though that is a trivial benefit since slots are very cheap.

Thanks
Richard

#4 Richard Guo
guofenglinux@gmail.com
In reply to: Pavel Stehule (#2)
Re: A performance issue with Memoize

On Fri, Oct 20, 2023 at 7:43 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

+1

it would be great to fix this problem - I've seen this issue a few times.

Thanks for the input. I suspect this is not rare in the real world: if
the subquery contains a lateral reference to a Var that also appears in
the subquery's join clauses, we'd probably suffer from this issue.

Thanks
Richard

#5 Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On 20/10/2023 17:40, Richard Guo wrote:


Have you thought about the case fixed by commit 1db5667? As I see it,
that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

--
regards,
Andrei Lepikhov
Postgres Professional

#6 Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#5)
Re: A performance issue with Memoize

On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

Have you thought about the case fixed by commit 1db5667? As I see it,
that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

Hm, I don't think so. The issue fixed by commit 1db5667 was caused by
sharing PARAM_EXEC slots between different levels of NestLoop. AFAICS
it's safe to share PARAM_EXEC slots within the same level of NestLoop.

The change here is about sharing PARAM_EXEC slots between subquery's
subplan_params and outer-relation variables, which happens within the
same level of NestLoop.

Actually, even without this change, we'd still share PARAM_EXEC slots
between subquery's subplan_params and outer-relation variables in some
cases. As an example, consider

explain (costs off)
select * from t t1 left join
(t t2 left join
lateral (select t1.a as t1a, t2.a as t2a, * from t t3) s
on t2.b = s.b)
on t1.b = s.b and t1.a = t2.a;
                      QUERY PLAN
-------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Nested Loop
         Join Filter: (t1.a = t2.a)
         ->  Seq Scan on t t2
         ->  Subquery Scan on s
               Filter: ((t1.b = s.b) AND (t2.b = s.b))
               ->  Seq Scan on t t3
(8 rows)

For outer-relation Var 't1.a' from qual 't1.a = t2.a', it shares
PARAM_EXEC slot 0 with the PlannerParamItem for 't1.a' within the
subquery (from its targetlist).

Have you noticed a case that this change breaks?

Hi Tom, could you share your insights on this issue and the proposed
fix?

Thanks
Richard

#7 Andrei Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Richard Guo (#6)
Re: A performance issue with Memoize

On 30/10/2023 14:55, Richard Guo wrote:

On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Have you thought about the case fixed by commit 1db5667? As I see it,
that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
it's safe to share PARAM_EXEC slots within the same level of NestLoop.

The change here is about sharing PARAM_EXEC slots between subquery's
subplan_params and outer-relation variables, which happens within the
same level of NestLoop.
...
Did you notice a case that the change here breaks?

Hi Tom, could you share your insights on this issue and the proposed
fix?

I think your patch works correctly so far. I mentioned commit 1db5667
because, as I see it, the origin of that problem was parallel workers.
I have thought about pushing Memoize down to a parallel worker and
couldn't convince myself whether such a solution would be correct.
Sorry if I disturbed you in vain.

--
regards,
Andrei Lepikhov
Postgres Professional

#8 Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#7)
Re: A performance issue with Memoize

On Tue, Oct 31, 2023 at 1:36 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:


I think your patch works correctly so far. I mentioned commit 1db5667
because, as I see it, the origin of that problem was parallel workers.
I have thought about pushing Memoize down to a parallel worker and
couldn't convince myself whether such a solution would be correct.
Sorry if I disturbed you in vain.

Thanks for mentioning commit 1db5667, which draws my attention to more
aspects of the PARAM_EXEC mechanism. I don't think the discussion is
in vain. It helps a lot.

Thanks for looking into this patch.

Thanks
Richard

#9 Arne Roland
arne.roland@malkut.net
In reply to: Richard Guo (#8)
Re: A performance issue with Memoize

Hi Richard,

I can tell this is a real-world problem. I have seen it multiple times
in production.

The fix seems surprisingly simple.

I hope my questions here aren't completely off; I still struggle to
think through the implications.

I wonder if there is anything we break by bluntly forgetting about the
subplan params. Maybe some table-valued function scan within a subquery
scan? Or something about casts on a join condition that could be
performed differently?

I wasn't able to construct a problem case. I might just be missing
context here, but I am not yet fully convinced that this is safe to do
in all cases.

Regards
Arne

#10 David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#1)
Re: A performance issue with Memoize

On Fri, 20 Oct 2023 at 23:40, Richard Guo <guofenglinux@gmail.com> wrote:

The Memoize runtime stats 'Hits: 0 Misses: 10000 Evictions: 9999'
seems suspicious to me, so I've looked into it a little bit, and found
that the MemoizeState's keyparamids and its outerPlan's chgParam are
always different, and that makes us have to purge the entire cache each
time we rescan the memoize node.

But why are they always different? Well, for the query above, we have
two NestLoopParam nodes, one (with paramno 1) is created when we replace
outer-relation Vars in the scan qual 't1.two = s.two', the other one
(with paramno 0) is added from the subquery's subplan_params, which was
created when we replaced uplevel vars with Param nodes for the subquery.
That is to say, the chgParam would be {0, 1}.

When it comes to replace outer-relation Vars in the memoize keys, the
two 't1.two' Vars are both replaced with the NestLoopParam with paramno
1, because it is the first NLP we see in root->curOuterParams that is
equal to the Vars in memoize keys. That is to say, the memoize node's
keyparamids is {1}.

I see the function calls were put this way around in 5ebaaa494
(Implement SQL-standard LATERAL subqueries.), per:

@ -1640,6 +1641,7 @@ create_subqueryscan_plan(PlannerInfo *root, Path
*best_path,
    {
        scan_clauses = (List *)
            replace_nestloop_params(root, (Node *) scan_clauses);
+       identify_nestloop_extparams(root, best_path->parent->subplan);
    }

(identify_nestloop_extparams was later renamed to
process_subquery_nestloop_params in 46c508fbc.)

I think fixing it your way makes sense. I don't really see any reason
why we should have two. However...

Another way it *could* be fixed would be to get rid of pull_paramids()
and change create_memoize_plan() to set keyparamids to all the param
IDs that are equal() to each of the param_exprs. That way nodeMemoize
wouldn't purge the cache, as we'd know the changing param is accounted
for in the cache. For the record, I don't think that's better, but it
scares me a bit less, as I don't know what other repercussions there
are of applying your patch to get rid of the duplicate
NestLoopParam.paramval.
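A rough sketch of that alternative (illustrative Python with hypothetical structures, not the planner's real node types): derive keyparamids from every nestloop param whose value expression matches one of the cache-key expressions, rather than only the params literally present in the keys.

```python
# Sketch of the alternative fix (illustrative; hypothetical structures,
# not the planner's real ones).  cur_outer_params is a list of
# (paramno, paramval-expression) pairs; keyparamids becomes the set of
# all param IDs whose expression equals one of the cache-key expressions.
def keyparamids_for(param_exprs, cur_outer_params):
    return {paramno
            for paramno, paramval in cur_outer_params
            if paramval in param_exprs}

# Two duplicate params both carry t1.two, so both IDs land in the set,
# and chgParam = {0, 1} is then fully accounted for by the cache key.
```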

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I also feel like we might be getting a bit close to the minor version
releases to be adjusting this stuff in back branches.

David

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

Apologies for not having noticed this thread before. I'm taking
a look at it now. However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels. Is that a bug? It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
                 pvar->vartype == var->vartype &&
                 pvar->vartypmod == var->vartypmod &&
                 pvar->varcollid == var->varcollid)
+            {
+                Assert(bms_equal(pvar->varnullingrels, var->varnullingrels));
                 return pitem->paramId;
+            }
         }
     }

This triggers no failures in the regression tests, but we know
how little that proves.

Anyway, that's just a side observation unrelated to the problem
at hand. More later.

regards, tom lane

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I think fixing it your way makes sense. I don't really see any reason
why we should have two. However...

Another way it *could* be fixed would be to get rid of pull_paramids()
and change create_memoize_plan() to set keyparamids to all the param
IDs that are equal() to each of the param_exprs. That way nodeMemoize
wouldn't purge the cache, as we'd know the changing param is accounted
for in the cache. For the record, I don't think that's better, but it
scares me a bit less, as I don't know what other repercussions there
are of applying your patch to get rid of the duplicate
NestLoopParam.paramval.

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.
I'd change the comment though, maybe to

/*
* Replace any outer-relation variables with nestloop params.
*
* We must provide nestloop params for both lateral references of
* the subquery and outer vars in the scan_clauses. It's better
* to assign the former first, because that code path requires
* specific param IDs, while replace_nestloop_params can adapt
* to the IDs assigned by process_subquery_nestloop_params.
* This avoids possibly duplicating nestloop params when the
* same Var is needed for both reasons.
*/

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

I also feel like we might be getting a bit close to the minor version
releases to be adjusting this stuff in back branches.

Yeah, I'm not sure I would change this in the back branches.

regards, tom lane

#13 Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#12)
Re: A performance issue with Memoize

On Fri, Jan 26, 2024 at 2:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.
I'd change the comment though, maybe to

/*
* Replace any outer-relation variables with nestloop params.
*
* We must provide nestloop params for both lateral references of
* the subquery and outer vars in the scan_clauses. It's better
* to assign the former first, because that code path requires
* specific param IDs, while replace_nestloop_params can adapt
* to the IDs assigned by process_subquery_nestloop_params.
* This avoids possibly duplicating nestloop params when the
* same Var is needed for both reasons.
*/

+1. It's much better.

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

When we collect the cache keys in paraminfo_get_equal_hashops() we
search param_info's ppi_clauses as well as innerrel's lateral_vars for
outer expressions. We do not perform de-duplication on the collected
outer expressions there. In my proposed test case, the same Var
't1.two' appears both in the param_info->ppi_clauses and in the
innerrel->lateral_vars, so we see two identical cache keys in the plan.
I noticed this before and wondered if we should do de-duplication on the
cache keys, but somehow I did not chase this to the ground.
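A sketch of what such de-duplication could look like (illustrative Python; the real planner code would need equal()-based comparison on expression trees, not plain '=='):

```python
# Sketch of cache-key de-duplication (illustrative; the planner would
# need equal()-based comparison on expression nodes, not '==').  Outer
# expressions are collected from ppi_clauses and lateral_vars; duplicates
# across the two sources are dropped while preserving order.
def dedup_cache_keys(ppi_clause_exprs, lateral_vars):
    seen = set()
    keys = []
    for expr in ppi_clause_exprs + lateral_vars:
        if expr not in seen:
            seen.add(expr)
            keys.append(expr)
    return keys

# 't1.two' arrives via both sources but should appear once in the key.
```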

Thanks
Richard

#14 David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#12)
Re: A performance issue with Memoize

On Fri, 26 Jan 2024 at 07:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c. However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not. So flipping the order makes sense.

Makes sense.

I've adjusted the comments to what you mentioned, trimmed the pretty
expensive test case down to something that'll run much faster, and
pushed the result.

However ... it seems like we're not out of the woods yet. Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

David

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#14)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

+1, I was wondering if the test could be cheaper. It wasn't horrid
as Richard had it, but core regression tests add up over time.

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

regards, tom lane

#16David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#15)
1 attachment(s)
Re: A performance issue with Memoize

On Fri, 26 Jan 2024 at 16:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

Here's a patch for that.

David

Attachments:

dont_produce_duplicate_memoize_keys.patch (text/plain, US-ASCII)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c0ba087b40..8ecf9fe8dc 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -492,8 +492,16 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
 				return false;
 			}
 
-			*operators = lappend_oid(*operators, hasheqoperator);
-			*param_exprs = lappend(*param_exprs, expr);
+			/*
+			 * 'expr' may already exist as a parameter from a previous ppi_clauses.  No
+			 * need to include it again, however we'd better ensure we do switch into
+			 * binary mode if required.  See below.
+			 */
+			if (!list_member(*param_exprs, expr))
+			{
+				*operators = lappend_oid(*operators, hasheqoperator);
+				*param_exprs = lappend(*param_exprs, expr);
+			}
 
 			/*
 			 * When the join operator is not hashable then it's possible that
@@ -536,8 +544,16 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
 			return false;
 		}
 
-		*operators = lappend_oid(*operators, typentry->eq_opr);
-		*param_exprs = lappend(*param_exprs, expr);
+		/*
+		 * 'expr' may already exist as a parameter from the ppi_clauses.  No need to
+		 * include it again, however we'd better ensure we do switch into binary
+		 * mode.
+		 */
+		if (!list_member(*param_exprs, expr))
+		{
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
 
 		/*
 		 * We must go into binary mode as we don't have too much of an idea of
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index ca198ec3b8..17bb3c8661 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -107,7 +107,7 @@ WHERE t1.unique1 < 10;', false);
          ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual rows=10 loops=N)
                Index Cond: (unique1 < 10)
          ->  Memoize (actual rows=2 loops=N)
-               Cache Key: t1.two, t1.two
+               Cache Key: t1.two
                Cache Mode: binary
                Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Subquery Scan on t2 (actual rows=2 loops=N)
#17Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#11)
Re: A performance issue with Memoize

On Fri, Jan 26, 2024 at 1:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Apologies for not having noticed this thread before. I'm taking
a look at it now. However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels. Is that a bug? It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
pvar->vartype == var->vartype &&
pvar->vartypmod == var->vartypmod &&
pvar->varcollid == var->varcollid)
+            {
+                Assert(bms_equal(pvar->varnullingrels,
var->varnullingrels));
return pitem->paramId;
+            }
}
}

Yeah, I think it should be safe to assert that the varnullingrels is
equal here. The Var is supposed to be an upper-level Var, and two same
such Vars should not have different varnullingrels at this point,
although the varnullingrels might be adjusted later in
identify_current_nestloop_params according to which form of identity 3
we end up applying.

Thanks
Richard

#18Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#16)
Re: A performance issue with Memoize

On Fri, Jan 26, 2024 at 12:18 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 26 Jan 2024 at 16:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing
+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two
Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

Here's a patch for that.

At first I wondered if we should assume that the same param expr must
have the same equality operator. If not, we should also check the
operator to tell if the cache key is a duplicate, like

-           if (!list_member(*param_exprs, expr))
+           if (!list_member(*param_exprs, expr) ||
+               !list_member_oid(*operators, hasheqoperator))

But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
is set, it seems we can assume that: the operator is from the type cache
entry which is fetched according to the expr datatype.

So I think the patch makes sense. +1.

Thanks
Richard

#19David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#18)
Re: A performance issue with Memoize

On Fri, 26 Jan 2024 at 19:03, Richard Guo <guofenglinux@gmail.com> wrote:

At first I wondered if we should assume that the same param expr must
have the same equality operator. If not, we should also check the
operator to tell if the cache key is a duplicate, like

-           if (!list_member(*param_exprs, expr))
+           if (!list_member(*param_exprs, expr) ||
+               !list_member_oid(*operators, hasheqoperator))

hmm, if that were the case you wouldn't do it that way. You'd need to
forboth() and look for an item in both lists matching the search.

But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
is set, it seems we can assume that: the operator is from the type cache
entry which is fetched according to the expr datatype.

Yip.

So I think the patch makes sense. +1.

Thanks for reviewing. I've pushed the patch.

David

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#14)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

regards, tom lane

#21David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#20)
Re: A performance issue with Memoize

On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

Maybe snprintf(buf, sizeof(buf), "%.*f", 0, 5.0 / 2.0) results in "3" on those
rather than "2"?

Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

I can try changing the unique1 < 5 to unique1 < 4 to see if that's more stable.

David

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#21)
Re: A performance issue with Memoize

David Rowley <dgrowleyml@gmail.com> writes:

On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

Maybe snprintf(buf, sizeof(buf), "%.*f", 0, 5.0 / 2.0) results in "3" on those
rather than "2"?
Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

Maybe ... I don't have a better theory.

I can try changing the unique1 < 5 to unique1 < 4 to see if that's more stable.

Worth a try.

regards, tom lane

#23Alexander Lakhin
exclusion@gmail.com
In reply to: Tom Lane (#22)
Re: A performance issue with Memoize

Hello,

27.01.2024 00:09, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

drongo and fairywren are consistently failing the test case added
by this commit. I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

Maybe snprintf(buf, sizeof(buf), "%.*f", 0, 5.0 / 2.0) results in "3" on those
rather than "2"?
Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

Maybe ... I don't have a better theory.

FWIW, I've found where this behaviour is documented:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170

(I've remembered a case with test/sql/partition_prune from 2020, where
sprintf on Windows worked the other way.)

Best regards,
Alexander

#24Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Alexander Lakhin (#23)
Re: A performance issue with Memoize

Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8.46..1495.10 rows=50000 width=256)
(actual time=0.860..111.013 rows=50000 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=8.16..376.77 rows=500
width=244) (actual time=0.798..1.418 rows=500 loops=1)
Recheck Cond: (hundred < 5)
Heap Blocks: exact=263
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..8.04
rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
Index Cond: (hundred < 5)
-> Memoize (cost=0.30..4.89 rows=100 width=12) (actual
time=0.009..0.180 rows=100 loops=500)
Cache Key: t1.hundred
Cache Mode: logical
Hits: 0 Misses: 500 Evictions: 499 Overflows: 0 Memory Usage: 5kB
-> Index Scan using tenk2_hundred on tenk2 (cost=0.29..4.88
rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
Index Cond: (hundred = t1.hundred)
Planning Time: 0.661 ms
Execution Time: 113.076 ms
(14 rows)

The Memoize node's cache key only uses t1.hundred, while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different from the cache keys, so the
cache is purged on each rescan.

Regards,
Anthonin


#25Richard Guo
guofenglinux@gmail.com
In reply to: Anthonin Bonnefoy (#24)
Re: A performance issue with Memoize

On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:

Hi,

I've seen a similar issue with the following query (tested on the current
head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8.46..1495.10 rows=50000 width=256)
(actual time=0.860..111.013 rows=50000 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=8.16..376.77 rows=500
width=244) (actual time=0.798..1.418 rows=500 loops=1)
Recheck Cond: (hundred < 5)
Heap Blocks: exact=263
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..8.04
rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
Index Cond: (hundred < 5)
-> Memoize (cost=0.30..4.89 rows=100 width=12) (actual
time=0.009..0.180 rows=100 loops=500)
Cache Key: t1.hundred
Cache Mode: logical
Hits: 0 Misses: 500 Evictions: 499 Overflows: 0 Memory Usage:
5kB
-> Index Scan using tenk2_hundred on tenk2 (cost=0.29..4.88
rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
Index Cond: (hundred = t1.hundred)
Planning Time: 0.661 ms
Execution Time: 113.076 ms
(14 rows)

The memoize's cache key only uses t1.hundred while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different and the cache is purged
on each rescan.

Thanks for the report! This issue is caused by the fact that we fail to
check PHVs for lateral references, and hence cannot know that t1.two
should also be included in the cache keys.

I reported exactly the same issue in [1], and verified that it can be
fixed by the patch in [2] (which seems to require a rebase).

[1] /messages/by-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun+Vtrowcmxb41CzA@mail.gmail.com
[2] /messages/by-id/CAMbWs49+Cjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg@mail.gmail.com

Thanks
Richard