Huge memory consumption on partitioned table with FKs

keisuke.kuroda.3862@gmail.com

over 5 years ago

In reply to: Tatsuro Yamada (#1)

Re: Huge memory consumption on partitioned table with FKs

Hi Hackers,

Analyzed the problem and created a patch to resolve it.

# Problem 1

When you create a foreign key to a partitioned table, referential
integrity function is created for the number of partitions.
Internally, SPI_prepare() creates a plan and SPI_keepplan()
caches the plan.

The more partitions in the referencing table, the more plans will
be cached.

# Problem 2

When referenced table is partitioned table, the larger the number
of partitions, the larger the plan size to be cached.

The actual plan processed is simple and small if pruning is
enabled. However, the cached plan will include all partition
information.

The more partitions in the referenced table, the larger the plan
size to be cached.

# Idea for solution

Constraints with the same pg_constraint.parentid can be combined
into one plan with the same comparentid if we can guarantee that
all their contents are the same.

Problem 1 can be solved
and significant memory bloat can be avoided.
CachedPlan: 710MB -> 1466kB

Solving Problem 2 could also reduce memory,
but I don't have a good idea.

Currently, DISCARD ALL does not discard CachedPlan by SPI as in
this case. It may be possible to modify DISCARD ALL to discard
CachedPlan and run it periodically. However, we think the burden
on the user is high.

# result with patch(PG14 HEAD(e522024b) + patch)

name | bytes | pg_size_pretty
------------------+---------+----------------
CachedPlanQuery | 12912 | 13 kB
CachedPlanSource | 17448 | 17 kB
CachedPlan | 1501192 | 1466 kB

CachedPlan * 1 Record

postgres=# SELECT count(*) FROM pg_backend_memory_contexts WHERE name
= 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
count
-------
1

postgres=# SELECT * FROM pg_backend_memory_contexts WHERE name =
'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
-[ RECORD 1 ]-+--------------------------------------------------------------------------------------
name | CachedPlan
ident | SELECT 1 FROM "public"."ps" x WHERE "c1"
OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
parent | CacheMemoryContext
level | 2
total_bytes | 2101248
total_nblocks | 12
free_bytes | 613256
free_chunks | 1
used_bytes | 1487992
(1 record)

# result without patch(PG14 HEAD(e522024b))

name | bytes | pg_size_pretty
------------------+-----------+----------------
CachedPlanQuery | 1326280 | 1295 kB
CachedPlanSource | 1474528 | 1440 kB
CachedPlan | 744009200 | 710 MB

CachedPlan * 500 Records

postgres=# SELECT count(*) FROM pg_backend_memory_contexts WHERE name
= 'CachedPlan' AND ident LIKE 'SELECT 1 FROM%';
count
-------
500

Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Keisuke Kuroda (#2)

Re: Huge memory consumption on partitioned table with FKs

At Thu, 26 Nov 2020 09:59:28 +0900, Keisuke Kuroda <keisuke.kuroda.3862@gmail.com> wrote in

Hi Hackers,

Analyzed the problem and created a patch to resolve it.

# Problem 1

When you create a foreign key to a partitioned table, referential
integrity function is created for the number of partitions.
Internally, SPI_prepare() creates a plan and SPI_keepplan()
caches the plan.

The more partitions in the referencing table, the more plans will
be cached.

# Problem 2

When referenced table is partitioned table, the larger the number
of partitions, the larger the plan size to be cached.

The actual plan processed is simple and small if pruning is
enabled. However, the cached plan will include all partition
information.

The more partitions in the referenced table, the larger the plan
size to be cached.

# Idea for solution

Constraints with the same pg_constraint.parentid can be combined
into one plan with the same comparentid if we can guarantee that
all their contents are the same.

The memory reduction this patch gives seems quite high with a small
footprint.

This shares RI_ConstraintInfo cache by constraints that shares the
same parent constraints. But you forgot that the cache contains some
members that can differ among partitions.

Consider the case of attaching a partition that have experienced a
column deletion.

create table t (a int primary key);
create table p (a int, r int references t(a)) partition by range(a);
create table c1 partition of p for values from (0) to (10);
create table c2 (a int, r int);
alter table c2 drop column r;
alter table c2 add column r int;
alter table p attach partition c2 for values from (10) to (20);

In that case riinfo->fk_attnums has different values from other
partitions.

=# select oid, conrelid::regclass, confrelid::regclass, conparentid, conname, conkey from pg_constraint where confrelid = 't'::regclass;

oid | conrelid | confrelid | conparentid | conname | conkey
-------+----------+-----------+-------------+----------+--------
16620 | p | t | 0 | p_r_fkey | {2}
16626 | c1 | t | 16620 | p_r_fkey | {2}
16632 | c2 | t | 16620 | p_r_fkey | {3}
(3 rows)

conkey is copied onto riinfo->fk_attnums.

Problem 1 can be solved
and significant memory bloat can be avoided.
CachedPlan: 710MB -> 1466kB

Solving Problem 2 could also reduce memory,
but I don't have a good idea.

Currently, DISCARD ALL does not discard CachedPlan by SPI as in
this case. It may be possible to modify DISCARD ALL to discard
CachedPlan and run it periodically. However, we think the burden
on the user is high.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

alvherre@2ndquadrant.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#3)

Re: Huge memory consumption on partitioned table with FKs

On 2020-Nov-26, Kyotaro Horiguchi wrote:

This shares RI_ConstraintInfo cache by constraints that shares the
same parent constraints. But you forgot that the cache contains some
members that can differ among partitions.

Consider the case of attaching a partition that have experienced a
column deletion.

I think this can be solved easily in the patch, by having
ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
they are equal then use the parent's constaint_id, otherwise use the
child constraint. That way, the cache entry is reused in the common
case where they are identical.

I would embed all this knowledge in ri_BuildQueryKey though, without
adding the new function ri_GetParentConstOid. I don't think that
function meaningful abstraction value, and instead it would make what I
suggest more difficult.

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Alvaro Herrera (#4)

Re: Huge memory consumption on partitioned table with FKs

At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Nov-26, Kyotaro Horiguchi wrote:

This shares RI_ConstraintInfo cache by constraints that shares the
same parent constraints. But you forgot that the cache contains some
members that can differ among partitions.

Consider the case of attaching a partition that have experienced a
column deletion.

I think this can be solved easily in the patch, by having
ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
they are equal then use the parent's constaint_id, otherwise use the
child constraint. That way, the cache entry is reused in the common
case where they are identical.

*I* think it's the direction. After an off-list discussion, we
confirmed that even in that case the patch works as is because
fk_attnum (or contuple.conkey) always stores key attnums compatible
to the topmost parent when conparent has a valid value (assuming the
current usage of fk_attnum), but I still feel uneasy to rely on that
unclear behavior.

I would embed all this knowledge in ri_BuildQueryKey though, without
adding the new function ri_GetParentConstOid. I don't think that
function meaningful abstraction value, and instead it would make what I
suggest more difficult.

It seems to me reasonable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Corey Huinker

corey.huinker@gmail.com

over 5 years ago

In reply to: Alvaro Herrera (#4)

Re: Huge memory consumption on partitioned table with FKs

I think this can be solved easily in the patch, by having
ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
they are equal then use the parent's constaint_id, otherwise use the
child constraint. That way, the cache entry is reused in the common
case where they are identical.

Somewhat of a detour, but in reviewing the patch for Statement-Level RI
checks, Andres and I observed that SPI made for a large portion of the RI
overhead.

Given that we're already looking at these checks, I was wondering if this
might be the time to consider implementing these checks by directly
scanning the constraint index.

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Corey Huinker (#6)

Re: Huge memory consumption on partitioned table with FKs

Corey Huinker <corey.huinker@gmail.com> writes:

Given that we're already looking at these checks, I was wondering if this
might be the time to consider implementing these checks by directly
scanning the constraint index.

Yeah, maybe. Certainly ri_triggers is putting a huge amount of effort
into working around the SPI/parser/planner layer, to not a lot of gain.

However, it's not clear to me that that line of thought will work well
for the statement-level-trigger approach. In that case you might be
dealing with enough tuples to make a different plan advisable.

regards, tom lane

Corey Huinker

corey.huinker@gmail.com

over 5 years ago

In reply to: Tom Lane (#7)

Re: Huge memory consumption on partitioned table with FKs

On Mon, Nov 30, 2020 at 9:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Corey Huinker <corey.huinker@gmail.com> writes:

Given that we're already looking at these checks, I was wondering if this
might be the time to consider implementing these checks by directly
scanning the constraint index.

Yeah, maybe. Certainly ri_triggers is putting a huge amount of effort
into working around the SPI/parser/planner layer, to not a lot of gain.

However, it's not clear to me that that line of thought will work well
for the statement-level-trigger approach. In that case you might be
dealing with enough tuples to make a different plan advisable.

regards, tom lane

Bypassing SPI would probably mean that we stay with row level triggers, and
the cached query plan would go away, perhaps replaced by an
already-looked-up-this-tuple hash sorta like what the cached nested loops
effort is doing.

I've been meaning to give this a try when I got some spare time. This may
inspire me to try again.

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#5)

Re: Huge memory consumption on partitioned table with FKs

Hello,

On Tue, Dec 1, 2020 at 9:40 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Nov-26, Kyotaro Horiguchi wrote:

This shares RI_ConstraintInfo cache by constraints that shares the
same parent constraints. But you forgot that the cache contains some
members that can differ among partitions.

Consider the case of attaching a partition that have experienced a
column deletion.

I think this can be solved easily in the patch, by having
ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
they are equal then use the parent's constaint_id, otherwise use the
child constraint. That way, the cache entry is reused in the common
case where they are identical.

*I* think it's the direction. After an off-list discussion, we
confirmed that even in that case the patch works as is because
fk_attnum (or contuple.conkey) always stores key attnums compatible
to the topmost parent when conparent has a valid value (assuming the
current usage of fk_attnum), but I still feel uneasy to rely on that
unclear behavior.

Hmm, I don't see what the problem is here, because it's not
RI_ConstraintInfo that is being shared among partitions of the same
parent, but the RI query (and the SPIPlanPtr for it) that is issued
through SPI. The query (and the plan) turns out to be the same no
matter what partition's RI_ConstraintInfo is first used to generate
it. What am I missing?

BTW, one problem with Kuroda-san's patch is that it's using
conparentid as the shared key, which can still lead to multiple
queries being generated when using multi-level partitioning, because
there would be many (intermediate) parent constraints in that case.
We should really be using the "root" constraint OID as the key,
because there would only be one root from which all constraints in a
given partition hierarchy would've originated. Actually, I had
written a patch a few months back to do exactly that, a polished
version of which I've attached with this email. Please take a look.

--
Amit Langote
EDB: http://www.enterprisedb.com

#10

Simon Riggs

simon@2ndQuadrant.com

over 5 years ago

In reply to: Alvaro Herrera (#4)

Re: Huge memory consumption on partitioned table with FKs

On Tue, 1 Dec 2020 at 00:03, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

cache entry is reused in the common case where they are identical.

Does a similar situation exist for partition statistics accessed
during planning? Or planning itself?

It would be useful to avoid repeated access to similar statistics and
repeated planning of sub-plans for similar partitions.

--
Simon Riggs http://www.EnterpriseDB.com/

#11

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Langote (#9)

Re: Huge memory consumption on partitioned table with FKs

At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

Hello,

On Tue, Dec 1, 2020 at 9:40 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Nov-26, Kyotaro Horiguchi wrote:

This shares RI_ConstraintInfo cache by constraints that shares the
same parent constraints. But you forgot that the cache contains some
members that can differ among partitions.

Consider the case of attaching a partition that have experienced a
column deletion.

I think this can be solved easily in the patch, by having
ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
they are equal then use the parent's constaint_id, otherwise use the
child constraint. That way, the cache entry is reused in the common
case where they are identical.

*I* think it's the direction. After an off-list discussion, we
confirmed that even in that case the patch works as is because
fk_attnum (or contuple.conkey) always stores key attnums compatible
to the topmost parent when conparent has a valid value (assuming the
current usage of fk_attnum), but I still feel uneasy to rely on that
unclear behavior.

Hmm, I don't see what the problem is here, because it's not
RI_ConstraintInfo that is being shared among partitions of the same
parent, but the RI query (and the SPIPlanPtr for it) that is issued
through SPI. The query (and the plan) turns out to be the same no
matter what partition's RI_ConstraintInfo is first used to generate
it. What am I missing?

I don't think you're missing something. As I (tried to..) mentioned in
another branch of this thread, you're right. On the otherhand
fk_attnums is actually used to generate the query, thus *issue*
potentially affects the query. Although that might be paranoic, that
possibility is eliminated by checking the congruency(?) of fk_attnums
(or other members). We might even be able to share a riinfo among
such children.

BTW, one problem with Kuroda-san's patch is that it's using
conparentid as the shared key, which can still lead to multiple
queries being generated when using multi-level partitioning, because
there would be many (intermediate) parent constraints in that case.
We should really be using the "root" constraint OID as the key,
because there would only be one root from which all constraints in a
given partition hierarchy would've originated. Actually, I had
written a patch a few months back to do exactly that, a polished
version of which I've attached with this email. Please take a look.

I don't like that the function self-recurses but
ri_GetParentConstrOid() actually climbs up to the root constraint, so
the patch covers the multilevel partitioning cases.

About your patch, it calculates the root constrid at the time an
riinfo is created, but when the root-partition is further attached to
another partitioned-table after the riinfo creation,
constraint_root_id gets stale. Of course that dones't matter
practically, though.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12

keisuke.kuroda.3862@gmail.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#11)

Re: Huge memory consumption on partitioned table with FKs

Hi Hackers,

I would embed all this knowledge in ri_BuildQueryKey though, without
adding the new function ri_GetParentConstOid. I don't think that
function meaningful abstraction value, and instead it would make what I
suggest more difficult.

It seems to me reasonable.

Indeed, I tried to get the conparentid with ri_BuildQueryKey.
(v2_reduce_ri_SPI-plan-hash.patch)

Somewhat of a detour, but in reviewing the patch for
Statement-Level RI checks, Andres and I observed that SPI
made for a large portion of the RI overhead.

If this can be resolved by reviewing the Statement-Level RI checks,
then I think it would be better.

BTW, one problem with Kuroda-san's patch is that it's using
conparentid as the shared key, which can still lead to multiple
queries being generated when using multi-level partitioning, because
there would be many (intermediate) parent constraints in that case.

Horiguchi-san also mentiond,
In my patch, in ri_GetParentConstOid, iterate on getting
the constraint OID until comparentid == InvalidOid.
This should allow to get the "root constraint OID".

However, the names "comparentid" and "ri_GetParentConstOid"
are confusing. We should use names like "constraint_root_id",
as in Amit-san's patch.

Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com

#13

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#11)

Re: Huge memory consumption on partitioned table with FKs

Hello,

On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

Hello,

On Tue, Dec 1, 2020 at 9:40 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Nov-26, Kyotaro Horiguchi wrote:

This shares RI_ConstraintInfo cache by constraints that shares the
same parent constraints. But you forgot that the cache contains some
members that can differ among partitions.

Consider the case of attaching a partition that have experienced a
column deletion.

I think this can be solved easily in the patch, by having
ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
they are equal then use the parent's constaint_id, otherwise use the
child constraint. That way, the cache entry is reused in the common
case where they are identical.

*I* think it's the direction. After an off-list discussion, we
confirmed that even in that case the patch works as is because
fk_attnum (or contuple.conkey) always stores key attnums compatible
to the topmost parent when conparent has a valid value (assuming the
current usage of fk_attnum), but I still feel uneasy to rely on that
unclear behavior.

Hmm, I don't see what the problem is here, because it's not
RI_ConstraintInfo that is being shared among partitions of the same
parent, but the RI query (and the SPIPlanPtr for it) that is issued
through SPI. The query (and the plan) turns out to be the same no
matter what partition's RI_ConstraintInfo is first used to generate
it. What am I missing?

I don't think you're missing something. As I (tried to..) mentioned in
another branch of this thread, you're right. On the otherhand
fk_attnums is actually used to generate the query, thus *issue*
potentially affects the query. Although that might be paranoic, that
possibility is eliminated by checking the congruency(?) of fk_attnums
(or other members). We might even be able to share a riinfo among
such children.

Just to be sure I've not misunderstood you, let me mention why I think
the queries used by different partitions all turn out to be the same
despite attribute number differences among partitions. The places
that uses fk_attnums when generating a query are these among others:

Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
...
Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
...
quoteOneName(attname,
RIAttName(fk_rel, riinfo->fk_attnums[i]));

For the queries on the referencing side ("check" side),
type/collation/attribute name determined using the above are going to
be the same for all partitions in a given tree irrespective of the
attribute number, because they're logically the same column. On the
referenced side ("restrict", "cascade", "set" side), as you already
mentioned, fk_attnums refers to the top parent table of the
referencing side, so no possibility of they being different in the
various referenced partitions' RI_ConstraintInfos.

On the topic of how we'd be able to share even the RI_ConstraintInfos
among partitions, that would indeed look a bit more elaborate than the
patch we have right now.

BTW, one problem with Kuroda-san's patch is that it's using
conparentid as the shared key, which can still lead to multiple
queries being generated when using multi-level partitioning, because
there would be many (intermediate) parent constraints in that case.
We should really be using the "root" constraint OID as the key,
because there would only be one root from which all constraints in a
given partition hierarchy would've originated. Actually, I had
written a patch a few months back to do exactly that, a polished
version of which I've attached with this email. Please take a look.

I don't like that the function self-recurses but
ri_GetParentConstrOid() actually climbs up to the root constraint, so
the patch covers the multilevel partitioning cases.

Sorry, I got too easily distracted by the name Kuroda-san chose for
the field in RI_ConstraintInfo and the function.

About your patch, it calculates the root constrid at the time an
riinfo is created, but when the root-partition is further attached to
another partitioned-table after the riinfo creation,
constraint_root_id gets stale. Of course that dones't matter
practically, though.

Maybe we could also store the hash value of the root constraint OID as
rootHashValue and check for that one too in
InvalidateConstraintCacheCallBack(). That would take care of this
unless I'm missing something.

--
Amit Langote
EDB: http://www.enterprisedb.com

#14

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Langote (#13)

Re: Huge memory consumption on partitioned table with FKs

At Thu, 3 Dec 2020 12:27:53 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

Hello,

On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

Hmm, I don't see what the problem is here, because it's not
RI_ConstraintInfo that is being shared among partitions of the same
parent, but the RI query (and the SPIPlanPtr for it) that is issued
through SPI. The query (and the plan) turns out to be the same no
matter what partition's RI_ConstraintInfo is first used to generate
it. What am I missing?

I don't think you're missing something. As I (tried to..) mentioned in
another branch of this thread, you're right. On the otherhand
fk_attnums is actually used to generate the query, thus *issue*
potentially affects the query. Although that might be paranoic, that
possibility is eliminated by checking the congruency(?) of fk_attnums
(or other members). We might even be able to share a riinfo among
such children.

Just to be sure I've not misunderstood you, let me mention why I think
the queries used by different partitions all turn out to be the same
despite attribute number differences among partitions. The places
that uses fk_attnums when generating a query are these among others:

Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
...
Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
...
quoteOneName(attname,
RIAttName(fk_rel, riinfo->fk_attnums[i]));

For the queries on the referencing side ("check" side),
type/collation/attribute name determined using the above are going to
be the same for all partitions in a given tree irrespective of the
attribute number, because they're logically the same column. On the

Yes, I know that, which is what I meant by "practically" or
"actually", but it is not explicitly defined AFAICS.

Thus that would be no longer an issue if we explicitly define that
"When conpraentid stores a valid value, each element of fk_attnums
points to logically the same attribute with the RI_ConstraintInfo for
the parent constraint." Or I'd be happy if we have such a comment
there instead.

referenced side ("restrict", "cascade", "set" side), as you already
mentioned, fk_attnums refers to the top parent table of the
referencing side, so no possibility of they being different in the
various referenced partitions' RI_ConstraintInfos.

Right. (I'm not sure I have mention that here, though:p)A

On the topic of how we'd be able to share even the RI_ConstraintInfos
among partitions, that would indeed look a bit more elaborate than the
patch we have right now.

Maybe just letting the hash entry for the child riinfo point to the
parent riinfo if all members (other than constraint_id, of course)
share the exactly the same values. No need to count references since
we don't going to remove riinfos.

BTW, one problem with Kuroda-san's patch is that it's using
conparentid as the shared key, which can still lead to multiple
queries being generated when using multi-level partitioning, because
there would be many (intermediate) parent constraints in that case.
We should really be using the "root" constraint OID as the key,
because there would only be one root from which all constraints in a
given partition hierarchy would've originated. Actually, I had
written a patch a few months back to do exactly that, a polished
version of which I've attached with this email. Please take a look.

I don't like that the function self-recurses but
ri_GetParentConstrOid() actually climbs up to the root constraint, so
the patch covers the multilevel partitioning cases.

Sorry, I got too easily distracted by the name Kuroda-san chose for
the field in RI_ConstraintInfo and the function.

I thought the same at the first look to the function.

About your patch, it calculates the root constrid at the time an
riinfo is created, but when the root-partition is further attached to
another partitioned-table after the riinfo creation,
constraint_root_id gets stale. Of course that dones't matter
practically, though.

Maybe we could also store the hash value of the root constraint OID as
rootHashValue and check for that one too in
InvalidateConstraintCacheCallBack(). That would take care of this
unless I'm missing something.

Seems to be sound.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#15

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#14)

Re: Huge memory consumption on partitioned table with FKs

On Thu, Dec 3, 2020 at 2:29 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 3 Dec 2020 12:27:53 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

I don't think you're missing something. As I (tried to..) mentioned in
another branch of this thread, you're right. On the otherhand
fk_attnums is actually used to generate the query, thus *issue*
potentially affects the query. Although that might be paranoic, that
possibility is eliminated by checking the congruency(?) of fk_attnums
(or other members). We might even be able to share a riinfo among
such children.

Just to be sure I've not misunderstood you, let me mention why I think
the queries used by different partitions all turn out to be the same
despite attribute number differences among partitions. The places
that uses fk_attnums when generating a query are these among others:

Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
...
Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
...
quoteOneName(attname,
RIAttName(fk_rel, riinfo->fk_attnums[i]));

For the queries on the referencing side ("check" side),
type/collation/attribute name determined using the above are going to
be the same for all partitions in a given tree irrespective of the
attribute number, because they're logically the same column. On the

Yes, I know that, which is what I meant by "practically" or
"actually", but it is not explicitly defined AFAICS.

Well, I think it's great that we don't have to worry *in this part of
the code* about partition's fk_attnums not being congruent with the
root parent's, because ensuring that is the responsibility of the
other parts of the system such as DDL. If we have any problems in
this area, they should be dealt with by ensuring that there are no
bugs in those other parts.

Thus that would be no longer an issue if we explicitly define that
"When conpraentid stores a valid value, each element of fk_attnums
points to logically the same attribute with the RI_ConstraintInfo for
the parent constraint." Or I'd be happy if we have such a comment
there instead.

I saw a comment in Kuroda-san's v2 patch that is perhaps meant to
address this point, but the placement needs to be reconsidered:

@@ -366,6 +368,14 @@ RI_FKey_check(TriggerData *trigdata)
        querysep = "WHERE";
        for (int i = 0; i < riinfo->nkeys; i++)
        {
+
+           /*
+           * We share the same plan among all relations in a partition
+           * hierarchy.  The plan is guaranteed to be compatible since all of
+           * the member relations are guaranteed to have the equivalent set
+           * of foreign keys in fk_attnums[].
+           */
+
            Oid         pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
            Oid         fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);

A more appropriate place for this kind of comment would be where
fk_attnums is defined or in ri_BuildQueryKey() that is shared by
different RI query issuing functions.

referenced side ("restrict", "cascade", "set" side), as you already
mentioned, fk_attnums refers to the top parent table of the
referencing side, so no possibility of they being different in the
various referenced partitions' RI_ConstraintInfos.

Right. (I'm not sure I have mention that here, though:p)A

Maybe I misread but I think you did in your email dated Dec 1 where you said:

"After an off-list discussion, we confirmed that even in that case the
patch works as is because fk_attnum (or contuple.conkey) always stores
key attnums compatible to the topmost parent when conparent has a
valid value (assuming the current usage of fk_attnum), but I still
feel uneasy to rely on that unclear behavior."

On the topic of how we'd be able to share even the RI_ConstraintInfos
among partitions, that would indeed look a bit more elaborate than the
patch we have right now.

Maybe just letting the hash entry for the child riinfo point to the
parent riinfo if all members (other than constraint_id, of course)
share the exactly the same values. No need to count references since
we don't going to remove riinfos.

Ah, something maybe worth trying. Although the memory we'd save by
sharing the RI_ConstraintInfos would not add that much to the savings
we're having by sharing the plan, because it's the plans that are a
memory hog AFAIK.

About your patch, it calculates the root constrid at the time an
riinfo is created, but when the root-partition is further attached to
another partitioned-table after the riinfo creation,
constraint_root_id gets stale. Of course that dones't matter
practically, though.

Maybe we could also store the hash value of the root constraint OID as
rootHashValue and check for that one too in
InvalidateConstraintCacheCallBack(). That would take care of this
unless I'm missing something.

Seems to be sound.

Okay, thanks.

I have attached a patch in which I've tried to merge the ideas from
both my patch and Kuroda-san's. I liked that his patch added
conparentid to RI_ConstraintInfo because that saves a needless
syscache lookup for constraints that don't have a parent. I've kept
my idea to compute the root constraint id only once in
ri_LoadConstraint(), not on every invocation of ri_BuildQueryKey().
Kuroda-san, anything you'd like to add to that?

--
Amit Langote
EDB: http://www.enterprisedb.com

#16

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Langote (#15)

Re: Huge memory consumption on partitioned table with FKs

At Thu, 3 Dec 2020 16:41:45 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

On Thu, Dec 3, 2020 at 2:29 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 3 Dec 2020 12:27:53 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
For the queries on the referencing side ("check" side),
type/collation/attribute name determined using the above are going to
be the same for all partitions in a given tree irrespective of the
attribute number, because they're logically the same column. On the

Yes, I know that, which is what I meant by "practically" or
"actually", but it is not explicitly defined AFAICS.

Well, I think it's great that we don't have to worry *in this part of
the code* about partition's fk_attnums not being congruent with the
root parent's, because ensuring that is the responsibility of the
other parts of the system such as DDL. If we have any problems in
this area, they should be dealt with by ensuring that there are no
bugs in those other parts.

Agreed.

Thus that would be no longer an issue if we explicitly define that
"When conpraentid stores a valid value, each element of fk_attnums
points to logically the same attribute with the RI_ConstraintInfo for
the parent constraint." Or I'd be happy if we have such a comment
there instead.

I saw a comment in Kuroda-san's v2 patch that is perhaps meant to
address this point, but the placement needs to be reconsidered:

Ah, yes, that comes from my proposal.

@@ -366,6 +368,14 @@ RI_FKey_check(TriggerData *trigdata)
querysep = "WHERE";
for (int i = 0; i < riinfo->nkeys; i++)
{
+
+           /*
+           * We share the same plan among all relations in a partition
+           * hierarchy.  The plan is guaranteed to be compatible since all of
+           * the member relations are guaranteed to have the equivalent set
+           * of foreign keys in fk_attnums[].
+           */
+
Oid         pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
Oid         fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);

A more appropriate place for this kind of comment would be where
fk_attnums is defined or in ri_BuildQueryKey() that is shared by
different RI query issuing functions.

Yeah, I wanted more appropriate place for the comment. That place
seems reasonable.

referenced side ("restrict", "cascade", "set" side), as you already
mentioned, fk_attnums refers to the top parent table of the
referencing side, so no possibility of they being different in the
various referenced partitions' RI_ConstraintInfos.

Right. (I'm not sure I have mention that here, though:p)A

Maybe I misread but I think you did in your email dated Dec 1 where you said:

"After an off-list discussion, we confirmed that even in that case the
patch works as is because fk_attnum (or contuple.conkey) always stores
key attnums compatible to the topmost parent when conparent has a
valid value (assuming the current usage of fk_attnum), but I still
feel uneasy to rely on that unclear behavior."

fk_attnums *doesn't* refers to to top parent talbe of the referencing
side. it refers to attributes of the partition that is compatible with
the same element of fk_attnums of the topmost parent. Maybe I'm
misreading.

On the topic of how we'd be able to share even the RI_ConstraintInfos
among partitions, that would indeed look a bit more elaborate than the
patch we have right now.

Maybe just letting the hash entry for the child riinfo point to the
parent riinfo if all members (other than constraint_id, of course)
share the exactly the same values. No need to count references since
we don't going to remove riinfos.

Ah, something maybe worth trying. Although the memory we'd save by
sharing the RI_ConstraintInfos would not add that much to the savings
we're having by sharing the plan, because it's the plans that are a
memory hog AFAIK.

I agree that plans are rather large but the sharable part of the
RI_ConstraintInfos is 536 bytes, I'm not sure it is small enough
comparing to the plans. But that has somewhat large footprint.. (See
the attached)

About your patch, it calculates the root constrid at the time an
riinfo is created, but when the root-partition is further attached to
another partitioned-table after the riinfo creation,
constraint_root_id gets stale. Of course that dones't matter
practically, though.

Maybe we could also store the hash value of the root constraint OID as
rootHashValue and check for that one too in
InvalidateConstraintCacheCallBack(). That would take care of this
unless I'm missing something.

Seems to be sound.

Okay, thanks.

I have attached a patch in which I've tried to merge the ideas from
both my patch and Kuroda-san's. I liked that his patch added
conparentid to RI_ConstraintInfo because that saves a needless
syscache lookup for constraints that don't have a parent. I've kept
my idea to compute the root constraint id only once in
ri_LoadConstraint(), not on every invocation of ri_BuildQueryKey().
Kuroda-san, anything you'd like to add to that?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#17

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#16)

Re: Huge memory consumption on partitioned table with FKs

At Thu, 03 Dec 2020 17:13:16 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
me> I agree that plans are rather large but the sharable part of the
me> RI_ConstraintInfos is 536 bytes, I'm not sure it is small enough
me> comparing to the plans. But that has somewhat large footprint.. (See
me> the attached)

0001 contains a bug about query_key and get_ri_constaint_root (from
your patch) is not needed there, but the core part is 0002 so please
ignore them.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#16)

Re: Huge memory consumption on partitioned table with FKs

On Thu, Dec 3, 2020 at 5:13 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 3 Dec 2020 16:41:45 +0900, Amit Langote <amitlangote09@gmail.com> wrote in

Maybe I misread but I think you did in your email dated Dec 1 where you said:

"After an off-list discussion, we confirmed that even in that case the
patch works as is because fk_attnum (or contuple.conkey) always stores
key attnums compatible to the topmost parent when conparent has a
valid value (assuming the current usage of fk_attnum), but I still
feel uneasy to rely on that unclear behavior."

fk_attnums *doesn't* refers to to top parent talbe of the referencing
side. it refers to attributes of the partition that is compatible with
the same element of fk_attnums of the topmost parent. Maybe I'm
misreading.

Yeah, no I am confused. Reading what I wrote, it seems I implied that
the referenced (PK relation's) partitions have RI_ConstraintInfo which
makes no sense, although there indeed is one pg_constraint entry that
is defined on the FK root table for every PK partition with its OID as
confrelid, which is in addition to an entry containing the root PK
table's OID as confrelid. I confused those PK-partition-referencing
entries as belonging to the partitions themselves. Although in my
defence, all of those entries' conkey contains the FK root table's
attributes, so at least that much holds. :)

On the topic of how we'd be able to share even the RI_ConstraintInfos
among partitions, that would indeed look a bit more elaborate than the
patch we have right now.

Maybe just letting the hash entry for the child riinfo point to the
parent riinfo if all members (other than constraint_id, of course)
share the exactly the same values. No need to count references since
we don't going to remove riinfos.

Ah, something maybe worth trying. Although the memory we'd save by
sharing the RI_ConstraintInfos would not add that much to the savings
we're having by sharing the plan, because it's the plans that are a
memory hog AFAIK.

I agree that plans are rather large but the sharable part of the
RI_ConstraintInfos is 536 bytes, I'm not sure it is small enough
comparing to the plans. But that has somewhat large footprint.. (See
the attached)

Thanks for the patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

#19

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Corey Huinker (#8)

Re: Huge memory consumption on partitioned table with FKs

On Tue, Dec 1, 2020 at 12:25 PM Corey Huinker <corey.huinker@gmail.com> wrote:

On Mon, Nov 30, 2020 at 9:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Corey Huinker <corey.huinker@gmail.com> writes:

Given that we're already looking at these checks, I was wondering if this
might be the time to consider implementing these checks by directly
scanning the constraint index.

Yeah, maybe. Certainly ri_triggers is putting a huge amount of effort
into working around the SPI/parser/planner layer, to not a lot of gain.

However, it's not clear to me that that line of thought will work well
for the statement-level-trigger approach. In that case you might be
dealing with enough tuples to make a different plan advisable.

Bypassing SPI would probably mean that we stay with row level triggers, and the cached query plan would go away, perhaps replaced by an already-looked-up-this-tuple hash sorta like what the cached nested loops effort is doing.

I've been meaning to give this a try when I got some spare time. This may inspire me to try again.

+1 for this line of work.

--
Amit Langote
EDB: http://www.enterprisedb.com

#20

alvherre@2ndquadrant.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#16)

Re: Huge memory consumption on partitioned table with FKs

Hello

I haven't followed this thread's latest posts, but I'm unclear on the
lifetime of the new struct that's being allocated in TopMemoryContext.
At what point are those structs freed?

Also, the comment that was in RI_ConstraintInfo now appears in
RI_ConstraintParam, and the new struct (RI_ConstraintInfo) is now
undocumented. What is the relationship between those two structs? I
see that they have pointers to each other, but I think the relationship
should be documented more clearly.

Thanks!

#21

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Langote (#18)

#22

keisuke.kuroda.3862@gmail.com

over 5 years ago

In reply to: Amit Langote (#15)

#23

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Alvaro Herrera (#20)

#24

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Keisuke Kuroda (#22)

#25

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#24)

#26

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#23)

#27

alvherre@2ndquadrant.com

over 5 years ago

In reply to: Amit Langote (#26)

#28

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Alvaro Herrera (#27)

#29

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Langote (#28)

#30

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Kyotaro Horiguchi (#29)

#31

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Amit Langote (#26)

#32

keisuke.kuroda.3862@gmail.com

over 5 years ago

In reply to: Amit Langote (#31)

#33

Langote_Amit_f8@lab.ntt.co.jp

over 5 years ago

In reply to: Keisuke Kuroda (#32)

#34

keisuke.kuroda.3862@gmail.com

over 5 years ago

In reply to: Amit Langote (#33)

#35

David Steele

david@pgmasters.net

about 5 years ago

In reply to: Amit Langote (#30)

#36

Langote_Amit_f8@lab.ntt.co.jp

about 5 years ago

In reply to: David Steele (#35)

#37

alvherre@2ndquadrant.com

about 5 years ago

In reply to: Amit Langote (#36)

#38

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Amit Langote (#25)

#39

Langote_Amit_f8@lab.ntt.co.jp

about 5 years ago

In reply to: Tom Lane (#38)

#40

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Tom Lane (#38)

#41

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Andy Fan (#40)

#42

Langote_Amit_f8@lab.ntt.co.jp

about 5 years ago

In reply to: Andy Fan (#41)

#43

Andy Fan

zhihui.fan1213@gmail.com

about 5 years ago

In reply to: Amit Langote (#42)

#44

Langote_Amit_f8@lab.ntt.co.jp

about 5 years ago

In reply to: Andy Fan (#43)

#45

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Amit Langote (#39)

#46

Langote_Amit_f8@lab.ntt.co.jp

about 5 years ago

In reply to: Tom Lane (#45)

#47

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Amit Langote (#46)

#48

Langote_Amit_f8@lab.ntt.co.jp

about 5 years ago

In reply to: Tom Lane (#47)

#49

keisuke.kuroda.3862@gmail.com

about 5 years ago

In reply to: Amit Langote (#48)

#50

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 5 years ago

In reply to: Amit Langote (#48)

#51

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Tatsuro Yamada (#50)

#52