extended stats on partitioned tables

Started by Justin Pryzbyover 4 years ago37 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

extended stats objects are allowed on partitioned tables since v10.
/messages/by-id/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
| Don't build extended statistics on inheritance trees
...
| Moreover, the current selectivity estimation code only works with individual
| relations, so building statistics on inheritance trees would be pointless
| anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
| HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

That begs the question if the current behavior for inheritence parents is
correct..

CREATE TABLE p (i int, a int, b int);
CREATE TABLE pd () INHERITS (p);
INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
CREATE STATISTICS pp ON (a),(b) FROM p;
VACUUM ANALYZE p;
explain analyze SELECT a,b FROM p GROUP BY 1,2;

| HashAggregate (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1)

Since child tables can be queried directly, it's a legitimate question whether
we should collect stats for the table heirarchy or (since the catalog only
supports one) only the table itself. I'd think that stats for the table
hierarchy would be more commonly useful (but we shouldn't change the behavior
in existing releases again). Anyway it seems unfortunate that
statistic_ext_data still has no stxinherited.

Note that for partitioned tables if I enable enable_partitionwise_aggregate,
then stats objects on the child tables can be helpful (but that's also
confusing to the question at hand).

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#1)
Re: extended stats on partitioned tables

On 9/23/21 11:26 PM, Justin Pryzby wrote:

extended stats objects are allowed on partitioned tables since v10.
/messages/by-id/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
| Don't build extended statistics on inheritance trees
...
| Moreover, the current selectivity estimation code only works with individual
| relations, so building statistics on inheritance trees would be pointless
| anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
| HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

That begs the question if the current behavior for inheritence parents is
correct..

CREATE TABLE p (i int, a int, b int);
CREATE TABLE pd () INHERITS (p);
INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
CREATE STATISTICS pp ON (a),(b) FROM p;
VACUUM ANALYZE p;
explain analyze SELECT a,b FROM p GROUP BY 1,2;

| HashAggregate (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1)

Agreed, that seems like a regression, but I don't see how to fix that
without having the extra flag in the catalog. Otherwise we can store
just one version for each statistics object :-(

Since child tables can be queried directly, it's a legitimate question whether
we should collect stats for the table heirarchy or (since the catalog only
supports one) only the table itself. I'd think that stats for the table
hierarchy would be more commonly useful (but we shouldn't change the behavior
in existing releases again). Anyway it seems unfortunate that
statistic_ext_data still has no stxinherited.

Yeah, we probably need the flag - I planned to get it into 14, but then
I got distracted by something else :-/

Attached is a PoC that I quickly bashed together today. It's pretty raw,
but it passed "make check" and I think it does most of the things right.
Can you try if this fixes the estimates with partitioned tables?

Extended statistics use two catalogs, pg_statistic_ext for definition,
while pg_statistic_ext_data stores the built statistics objects - the
flag needs to be in the "data" catalog, and managing the records is a
bit challenging - the current PoC code mostly works, but I had to relax
some error checks and I'm sure there are cases when we fail to remove a
row, or something like that.

Note that for partitioned tables if I enable enable_partitionwise_aggregate,
then stats objects on the child tables can be helpful (but that's also
confusing to the question at hand).

Yeah. I think it'd be helpful to assemble a script with various test
cases demonstrating how we estimate various cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

statistics-inheritance-fix-v1.patchtext/x-patch; charset=UTF-8; name=statistics-inheritance-fix-v1.patchDownload+219-142
#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#2)
Re: extended stats on partitioned tables

On Sat, Sep 25, 2021 at 09:27:10PM +0200, Tomas Vondra wrote:

On 9/23/21 11:26 PM, Justin Pryzby wrote:

extended stats objects are allowed on partitioned tables since v10.
/messages/by-id/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
| Don't build extended statistics on inheritance trees
...
| Moreover, the current selectivity estimation code only works with individual
| relations, so building statistics on inheritance trees would be pointless
| anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
| HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

...

Agreed, that seems like a regression, but I don't see how to fix that
without having the extra flag in the catalog. Otherwise we can store just
one version for each statistics object :-(

Do you think it's possible to backpatch a fix to handle partitioned tables
specifically ?

The "tuple already updated" error which I reported and which was fixed by
859b3003 involved inheritence children. Since partitioned tables have no data
themselves, the !inh check could be relaxed. It's not totally clear to me if
the correct statistics would be used in that case. I suppose the wrong
(inherited) stats would be wrongly applied affect queries FROM ONLY a
partitioned table, which seems pointless to write and also hard for the
estimates to be far off :)

Attached is a PoC that I quickly bashed together today. It's pretty raw, but
it passed "make check" and I think it does most of the things right. Can you
try if this fixes the estimates with partitioned tables?

I think pg_stats_ext_exprs also needs to expose the inherited flag.

Thanks,
--
Justin

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#3)
Re: extended stats on partitioned tables

On 9/25/21 9:53 PM, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 09:27:10PM +0200, Tomas Vondra wrote:

On 9/23/21 11:26 PM, Justin Pryzby wrote:

extended stats objects are allowed on partitioned tables since v10.
/messages/by-id/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
| Don't build extended statistics on inheritance trees
...
| Moreover, the current selectivity estimation code only works with individual
| relations, so building statistics on inheritance trees would be pointless
| anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
| HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

...

Agreed, that seems like a regression, but I don't see how to fix that
without having the extra flag in the catalog. Otherwise we can store just
one version for each statistics object :-(

Do you think it's possible to backpatch a fix to handle partitioned tables
specifically ?

The "tuple already updated" error which I reported and which was fixed by
859b3003 involved inheritence children. Since partitioned tables have no data
themselves, the !inh check could be relaxed. It's not totally clear to me if
the correct statistics would be used in that case. I suppose the wrong
(inherited) stats would be wrongly applied affect queries FROM ONLY a
partitioned table, which seems pointless to write and also hard for the
estimates to be far off :)

Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure
we never build stats with and without inheritance at the same time (for
the same rel). The 859b3003de ensures that by only building extended
stats in the (!inh) case, but we might tweak that based on relkind. See
the attached patch. But I wonder if there are cases that might be hurt
by this - that'd be a regression too, of course.

Attached is a PoC that I quickly bashed together today. It's pretty raw, but
it passed "make check" and I think it does most of the things right. Can you
try if this fixes the estimates with partitioned tables?

I think pg_stats_ext_exprs also needs to expose the inherited flag.

Yeah, I only did the bare minimum to get the PoC working. I'm sure there
are various other loose ends.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

inheritance-fix-simple.patchtext/x-patch; charset=UTF-8; name=inheritance-fix-simple.patchDownload+4-1
#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#4)
Re: extended stats on partitioned tables

On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:

Do you think it's possible to backpatch a fix to handle partitioned tables
specifically ?

The "tuple already updated" error which I reported and which was fixed by
859b3003 involved inheritence children. Since partitioned tables have no data
themselves, the !inh check could be relaxed. It's not totally clear to me if
the correct statistics would be used in that case. I suppose the wrong
(inherited) stats would be wrongly applied affect queries FROM ONLY a
partitioned table, which seems pointless to write and also hard for the
estimates to be far off :)

Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
never build stats with and without inheritance at the same time (for the
same rel). The 859b3003de ensures that by only building extended stats in
the (!inh) case, but we might tweak that based on relkind. See the attached
patch. But I wonder if there are cases that might be hurt by this - that'd
be a regression too, of course.

I think we should leave the inheritance case alone, since it hasn't changed in
2 years, and building stats on the table ONLY is a legitimate interpretation,
and it's as good as we can do without the catalog change.

But the partitioned case used to work, and there's no utility in selecting FROM
ONLY a partitioned table, so we might as well build the stats including its
partitions.

I don't think anything would get worse for the partitioned case.
Obviously building inherited ext stats could change plans - that's the point.
It's weird that the stats objects which existed for 18 months before being
"built" after the patch was applied, but no so weird that the release notes
wouldn't be ample documentation.

If building statistics caused the plan to change undesirably, the solution
would be to drop the stats object, of course.

+ build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);

It's weird to build inherited extended stats for partitioned tables but not for
inheritence parents. We could be clever and say "build inherited ext stats for
inheritence parents only if we didn't insert any stats for the table itself
(because it's empty)". But I think that's fragile: a single tuple in the
parent table could cause stats to be built there instead of on its heirarchy,
and the extended stats would be used for *both* FROM and FROM ONLY, which is an
awful combination.

Since do_analyze_rel is only called once for partitioned tables, I think you
could write that as:

/* Do not build inherited stats (since the catalog cannot support it) except
* for partitioned tables, for which numrows==0 and have no non-inherited stats */
build_ext_stats = !inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;

--
Justin

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#2)
Re: extended stats on partitioned tables

(Resending with -hackers)

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

Which leads to another issue in stable branches:

ANALYZE builds only non-inherited stats, but they're incorrectly used for
inherited queries - the rowcount estimate is worse on inheritence parents with
extended stats than without.

CREATE TABLE p(i int, j int);
CREATE TABLE p1() INHERITS(p);
INSERT INTO p SELECT a, a/10 FROM generate_series(1,9)a;
INSERT INTO p1 SELECT a, a FROM generate_series(1,999)a;
CREATE STATISTICS ps ON i,j FROM p;
VACUUM ANALYZE p,p1;

postgres=# explain analyze SELECT * FROM p GROUP BY 1,2;
HashAggregate (cost=26.16..26.25 rows=9 width=8) (actual time=2.571..3.282 rows=1008 loops=1)

postgres=# begin; DROP STATISTICS ps; explain analyze SELECT * FROM p GROUP BY 1,2; rollback;
HashAggregate (cost=26.16..36.16 rows=1000 width=8) (actual time=2.167..2.872 rows=1008 loops=1)

I guess examine_variable() should have corresponding logic to the hardcoded
!inh in analyze.c.

--
Justin

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#5)
Re: extended stats on partitioned tables

On 9/25/21 11:46 PM, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:

Do you think it's possible to backpatch a fix to handle partitioned tables
specifically ?

The "tuple already updated" error which I reported and which was fixed by
859b3003 involved inheritence children. Since partitioned tables have no data
themselves, the !inh check could be relaxed. It's not totally clear to me if
the correct statistics would be used in that case. I suppose the wrong
(inherited) stats would be wrongly applied affect queries FROM ONLY a
partitioned table, which seems pointless to write and also hard for the
estimates to be far off :)

Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
never build stats with and without inheritance at the same time (for the
same rel). The 859b3003de ensures that by only building extended stats in
the (!inh) case, but we might tweak that based on relkind. See the attached
patch. But I wonder if there are cases that might be hurt by this - that'd
be a regression too, of course.

I think we should leave the inheritance case alone, since it hasn't changed in
2 years, and building stats on the table ONLY is a legitimate interpretation,
and it's as good as we can do without the catalog change.

But the partitioned case used to work, and there's no utility in selecting FROM
ONLY a partitioned table, so we might as well build the stats including its
partitions.

I don't think anything would get worse for the partitioned case.
Obviously building inherited ext stats could change plans - that's the point.
It's weird that the stats objects which existed for 18 months before being
"built" after the patch was applied, but no so weird that the release notes
wouldn't be ample documentation.

Agreed.

If building statistics caused the plan to change undesirably, the solution
would be to drop the stats object, of course.

+ build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);

It's weird to build inherited extended stats for partitioned tables but not for
inheritence parents. We could be clever and say "build inherited ext stats for
inheritence parents only if we didn't insert any stats for the table itself
(because it's empty)". But I think that's fragile: a single tuple in the
parent table could cause stats to be built there instead of on its heirarchy,
and the extended stats would be used for *both* FROM and FROM ONLY, which is an
awful combination.

I don't think there's a good way to check if there are any rows in the
parent relation. And even then, a single row might cause huge changes to
query plans (essentially switching to very different stats).

Since do_analyze_rel is only called once for partitioned tables, I think you
could write that as:

/* Do not build inherited stats (since the catalog cannot support it) except
* for partitioned tables, for which numrows==0 and have no non-inherited stats */
build_ext_stats = !inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;

Good point.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#6)
Re: extended stats on partitioned tables

On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

I tried adding that - I mostly kept my patches separate.
Hopefully this is more helpful than a complication.
I added at: https://commitfest.postgresql.org/35/3332/

+       /* create only the "stxdinherit=false", because that always exists */
+       datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = ObjectIdGetDatum(false);

That'd be confusing for partitioned tables, no?
They'd always have an row with no data.
I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE).
(not ObjectIdGetDatum).
Then, that affects the loops which delete the tuples - neither inh nor !inh is
guaranteed, unless you check relkind there, too.

BTW, you'd need to add an "inherited" column to \dX if you added the "built"
data back.

Also, I think in backbranches we should document what's being stored in
pg_statistic_ext, since it's pretty unintuitive:
- noninherted stats (FROM ONLY) for inheritence parents;
- inherted stats (FROM *) for partitioned tables;

I think the !inh decision in 859b3003de was basically backwards.
I think it'd be rare for someone to put extended stats on a parent for
improving plans involving FROM ONLY.

But it's not worth trying to fix now, since it would change plans in
irreversible ways. Also, if the stx data were already populated, users would
have to run a manual analyze after upgrading to populate the catalog with the
data the planner would expect in the new version, or else it would end up being
the opposite of the issue I mentioned: non-inherited stats (from before the
upgrade) would be applied by the planner (after the upgrade) to inherited
queries.

Attachments:

0004-f-check-inh.patchtext/x-diff; charset=us-asciiDownload+23-34
0005-Refactor-parent-ACL-check.patchtext/x-diff; charset=us-asciiDownload+52-89
0001-Do-not-use-extended-statistics-on-inheritence-trees.patchtext/x-diff; charset=us-asciiDownload+56-1
0002-Build-inherited-extended-stats-on-partitioned-tables.patchtext/x-diff; charset=us-asciiDownload+41-7
0003-Add-stxdinherit-build-inherited-extended-stats-on-in.patchtext/x-diff; charset=us-asciiDownload+217-146
#9Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Justin Pryzby (#8)
Re: extended stats on partitioned tables

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

I tried adding that - I mostly kept my patches separate.
Hopefully this is more helpful than a complication.
I added at: https://commitfest.postgresql.org/35/3332/

Actually, this is confusing. Which patch is the one we should be
reviewing?

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Jaime Casanova (#9)
Re: extended stats on partitioned tables

On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

I tried adding that - I mostly kept my patches separate.
Hopefully this is more helpful than a complication.
I added at: https://commitfest.postgresql.org/35/3332/

Actually, this is confusing. Which patch is the one we should be
reviewing?

It is confusing, but not as much as I first thought. Please check the commit
messages.

The first two patches are meant to be applied to master *and* backpatched. The
first one intends to fixes the bug that non-inherited stats are being used for
queries of inheritance trees. The 2nd one fixes the regression that stats are
not collected for inheritence trees of partitioned tables (which is the only
type of stats they could ever possibly have).

And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
inherited and non-inherited stats, only in master, since it requires a catalog
change. It's a bit confusing that patch #4 removes most what I added in
patches 1 and 2. But that's exactly what's needed to collect and apply both
inherited and non-inherited stats: the first two patches avoid applying stats
collected with the wrong inheritence. That's also what's needed for the
patchset to follow the normal "apply to master and backpatch" process, rather
than 2 patches which are backpatched but not applied to master, and one which
is applied to master and not backpatched..

@Tomas: I just found commit 427c6b5b9, which is a remarkably similar issue
affecting column stats 15 years ago.

Rebased since there were conflicts with my typos fixes.

--
Justin

Attachments:

v2-0001-Do-not-use-extended-statistics-on-inheritence-tre.patchtext/x-diff; charset=us-asciiDownload+56-1
v2-0002-Build-inherited-extended-stats-on-partitioned-tab.patchtext/x-diff; charset=us-asciiDownload+41-7
v2-0003-Add-stxdinherit-build-inherited-extended-stats-on.patchtext/x-diff; charset=us-asciiDownload+217-146
v2-0004-f-check-inh.patchtext/x-diff; charset=us-asciiDownload+23-34
v2-0005-Refactor-parent-ACL-check.patchtext/x-diff; charset=us-asciiDownload+52-89
#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#10)
Re: extended stats on partitioned tables

On 10/8/21 12:45 AM, Justin Pryzby wrote:

On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

I tried adding that - I mostly kept my patches separate.
Hopefully this is more helpful than a complication.
I added at: https://commitfest.postgresql.org/35/3332/

Actually, this is confusing. Which patch is the one we should be
reviewing?

It is confusing, but not as much as I first thought. Please check the commit
messages.

The first two patches are meant to be applied to master *and* backpatched. The
first one intends to fixes the bug that non-inherited stats are being used for
queries of inheritance trees. The 2nd one fixes the regression that stats are
not collected for inheritence trees of partitioned tables (which is the only
type of stats they could ever possibly have).

I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
the (!rte->inh) check in the rel->statlist loops. AFAICS both places
could do that right at the beginning, because it does not depend on the
statistics object at all, just the RelOptInfo.

And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
inherited and non-inherited stats, only in master, since it requires a catalog
change. It's a bit confusing that patch #4 removes most what I added in
patches 1 and 2. But that's exactly what's needed to collect and apply both
inherited and non-inherited stats: the first two patches avoid applying stats
collected with the wrong inheritence. That's also what's needed for the
patchset to follow the normal "apply to master and backpatch" process, rather
than 2 patches which are backpatched but not applied to master, and one which
is applied to master and not backpatched..

Yeah. Af first I was a bit confused because after applying 0003 there
are both the fixes and the "correct" way, but then I realized 0004
removes the unnecessary bits.

The one thing 0003 still needs is to rework the places that need to
touch both inh and !inh stats. The patch simply does

for (inh = 0; inh <= 1; inh++) { ... }

but that feels a bit too hackish. But if we don't know which of the two
stats exist, I'm not sure what to do about it. And I'm not sure we do
the right thing after removing children, for example (that should drop
the inheritance stats, I guess).

The 1:2 mapping between pg_statistic_ext and pg_statistic_ext_data is a
bit strange, but I can't think of a better way.

@Tomas: I just found commit 427c6b5b9, which is a remarkably similar issue
affecting column stats 15 years ago.

What can I say? The history repeats itself ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#11)
Re: extended stats on partitioned tables

On Wed, Nov 03, 2021 at 11:48:44PM +0100, Tomas Vondra wrote:

On 10/8/21 12:45 AM, Justin Pryzby wrote:

On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

I tried adding that - I mostly kept my patches separate.
Hopefully this is more helpful than a complication.
I added at: https://commitfest.postgresql.org/35/3332/

Actually, this is confusing. Which patch is the one we should be
reviewing?

It is confusing, but not as much as I first thought. Please check the commit
messages.

The first two patches are meant to be applied to master *and* backpatched. The
first one intends to fixes the bug that non-inherited stats are being used for
queries of inheritance trees. The 2nd one fixes the regression that stats are
not collected for inheritence trees of partitioned tables (which is the only
type of stats they could ever possibly have).

I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
the (!rte->inh) check in the rel->statlist loops. AFAICS both places
could do that right at the beginning, because it does not depend on the
statistics object at all, just the RelOptInfo.

I probably did this to make the code change small, to avoid indentin the whole
block.

And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
inherited and non-inherited stats, only in master, since it requires a catalog
change. It's a bit confusing that patch #4 removes most what I added in
patches 1 and 2. But that's exactly what's needed to collect and apply both
inherited and non-inherited stats: the first two patches avoid applying stats
collected with the wrong inheritence. That's also what's needed for the
patchset to follow the normal "apply to master and backpatch" process, rather
than 2 patches which are backpatched but not applied to master, and one which
is applied to master and not backpatched..

Yeah. Af first I was a bit confused because after applying 0003 there
are both the fixes and the "correct" way, but then I realized 0004
removes the unnecessary bits.

This was to leave your 0003 (mostly) unchanged, so you can see and/or apply my
changes. They should be squished together.

The one thing 0003 still needs is to rework the places that need to
touch both inh and !inh stats. The patch simply does

for (inh = 0; inh <= 1; inh++) { ... }

but that feels a bit too hackish. But if we don't know which of the two
stats exist, I'm not sure what to do about it.

There's also this:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

+       /* create only the "stxdinherit=false", because that always exists */
+       datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = ObjectIdGetDatum(false);

That'd be confusing for partitioned tables, no?
They'd always have an row with no data.
I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE).
(not ObjectIdGetDatum).
Then, that affects the loops which delete the tuples - neither inh nor !inh is
guaranteed, unless you check relkind there, too.

Maybe the for inh<=1 loop should instead be two calls to new functions factored
out of get_relation_statistics() and RemoveStatisticsById(), which take "bool
inh".

And I'm not sure we do the right thing after removing children, for example
(that should drop the inheritance stats, I guess).

Do you mean for inheritance only ? Or partitions too ?
I think for partitions, the stats should stay.
And for inheritence, they can stay, for consistency with partitions, and since
it does no harm.

#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#12)
Re: extended stats on partitioned tables

On 11/4/21 12:19 AM, Justin Pryzby wrote:

On Wed, Nov 03, 2021 at 11:48:44PM +0100, Tomas Vondra wrote:

On 10/8/21 12:45 AM, Justin Pryzby wrote:

On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

I tried adding that - I mostly kept my patches separate.
Hopefully this is more helpful than a complication.
I added at: https://commitfest.postgresql.org/35/3332/

Actually, this is confusing. Which patch is the one we should be
reviewing?

It is confusing, but not as much as I first thought. Please check the commit
messages.

The first two patches are meant to be applied to master *and* backpatched. The
first one intends to fixes the bug that non-inherited stats are being used for
queries of inheritance trees. The 2nd one fixes the regression that stats are
not collected for inheritence trees of partitioned tables (which is the only
type of stats they could ever possibly have).

I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
the (!rte->inh) check in the rel->statlist loops. AFAICS both places
could do that right at the beginning, because it does not depend on the
statistics object at all, just the RelOptInfo.

I probably did this to make the code change small, to avoid indentin the whole
block.

But indenting the block is not necessary. It's possible to do something
like this:

if (!rel->inh)
return 1.0;

or whatever is the "default" result for that function.

And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
inherited and non-inherited stats, only in master, since it requires a catalog
change. It's a bit confusing that patch #4 removes most what I added in
patches 1 and 2. But that's exactly what's needed to collect and apply both
inherited and non-inherited stats: the first two patches avoid applying stats
collected with the wrong inheritence. That's also what's needed for the
patchset to follow the normal "apply to master and backpatch" process, rather
than 2 patches which are backpatched but not applied to master, and one which
is applied to master and not backpatched..

Yeah. Af first I was a bit confused because after applying 0003 there
are both the fixes and the "correct" way, but then I realized 0004
removes the unnecessary bits.

This was to leave your 0003 (mostly) unchanged, so you can see and/or apply my
changes. They should be squished together.

Yep.

The one thing 0003 still needs is to rework the places that need to
touch both inh and !inh stats. The patch simply does

for (inh = 0; inh <= 1; inh++) { ... }

but that feels a bit too hackish. But if we don't know which of the two
stats exist, I'm not sure what to do about it.

There's also this:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:

+       /* create only the "stxdinherit=false", because that always exists */
+       datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = ObjectIdGetDatum(false);

That'd be confusing for partitioned tables, no?
They'd always have an row with no data.
I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE).
(not ObjectIdGetDatum).
Then, that affects the loops which delete the tuples - neither inh nor !inh is
guaranteed, unless you check relkind there, too.

Maybe the for inh<=1 loop should instead be two calls to new functions factored
out of get_relation_statistics() and RemoveStatisticsById(), which take "bool
inh".

Well, yeah. That's part of the strange 1:2 mapping between the stats
definition and data. Although, even with regular stats we have such
mapping, except the "definition" is the pg_attribute row.

And I'm not sure we do the right thing after removing children, for example
(that should drop the inheritance stats, I guess).

Do you mean for inheritance only ? Or partitions too ?
I think for partitions, the stats should stay.
And for inheritence, they can stay, for consistency with partitions, and since
it does no harm.

I think the behavior should be the same as for data in pg_statistic,
i.e. if we keep/remove those, we should do the same thing for extended
statistics.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#13)
Re: extended stats on partitioned tables

On Thu, Nov 04, 2021 at 12:44:45AM +0100, Tomas Vondra wrote:

I probably did this to make the code change small, to avoid indentin the whole
block.

But indenting the block is not necessary. It's possible to do something
like this:

if (!rel->inh)
return 1.0;

or whatever is the "default" result for that function.

You're right. I did like that, Except in examine_variable, which already does
it with "break".

Maybe the for inh<=1 loop should instead be two calls to new functions factored
out of get_relation_statistics() and RemoveStatisticsById(), which take "bool
inh".

I did like that in a separate patch for now.
And I avoided making a !inh tuple for partitioned tables, since they're never
populated.

And I'm not sure we do the right thing after removing children, for example
(that should drop the inheritance stats, I guess).

Do you mean for inheritance only ? Or partitions too ?
I think for partitions, the stats should stay.
And for inheritence, they can stay, for consistency with partitions, and since
it does no harm.

I think the behavior should be the same as for data in pg_statistic,
i.e. if we keep/remove those, we should do the same thing for extended
statistics.

This works for column stats the way I proposed for extended stats: child stats
are never removed, neither when the only child is dropped, nor when re-running
ANALYZE (actually, that part is odd).

I can stop sending patches if it makes it hard to reconcile, but I wanted to
put it "on paper" to see/show what the patch series would look like, for v15
and back branches.

--
Justin

Attachments:

v3-0001-Do-not-use-extended-statistics-on-inheritence-tre.patchtext/x-diff; charset=us-asciiDownload+56-1
v3-0002-Build-inherited-extended-stats-on-partitioned-tab.patchtext/x-diff; charset=us-asciiDownload+41-7
v3-0003-Add-stxdinherit-build-inherited-extended-stats-on.patchtext/x-diff; charset=us-asciiDownload+215-151
v3-0004-f-check-inh.patchtext/x-diff; charset=us-asciiDownload+24-28
v3-0005-Maybe-better-than-looping-twice.-For-partitioned-.patchtext/x-diff; charset=us-asciiDownload+151-140
v3-0005-Maybe-better-way-than-looping-twice.-Change-parti.patchtext/x-diff; charset=us-asciiDownload+151-140
v3-0006-Refactor-parent-ACL-check.patchtext/x-diff; charset=us-asciiDownload+52-89
#15Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#13)
Re: extended stats on partitioned tables

On Thu, Nov 04, 2021 at 12:44:45AM +0100, Tomas Vondra wrote:

And I'm not sure we do the right thing after removing children, for example
(that should drop the inheritance stats, I guess).

Do you mean for inheritance only ? Or partitions too ?
I think for partitions, the stats should stay.
And for inheritence, they can stay, for consistency with partitions, and since
it does no harm.

I think the behavior should be the same as for data in pg_statistic,
i.e. if we keep/remove those, we should do the same thing for extended
statistics.

That works for column stats the way I proposed for extended stats: child stats
are never removed, neither when the only child is dropped, nor when re-running
analyze (that part is actually a bit odd).

Rebased, fixing an intermediate compile error, and typos in the commit message.

--
Justin

Attachments:

0001-Do-not-use-extended-statistics-on-inheritance-trees.patchtext/x-diff; charset=us-asciiDownload+58-1
0002-Build-inherited-extended-stats-on-partitioned-tables.patchtext/x-diff; charset=us-asciiDownload+41-7
0003-Add-stxdinherit-build-inherited-extended-stats-on-in.patchtext/x-diff; charset=us-asciiDownload+215-150
0004-f-check-inh.patchtext/x-diff; charset=us-asciiDownload+21-28
0005-Maybe-better-than-looping-twice.-For-partitioned-tab.patchtext/x-diff; charset=us-asciiDownload+151-140
0006-Refactor-parent-ACL-check.patchtext/x-diff; charset=us-asciiDownload+52-89
#16Zhihong Yu
zyu@yugabyte.com
In reply to: Justin Pryzby (#15)
Re: extended stats on partitioned tables

On Thu, Dec 2, 2021 at 9:24 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Nov 04, 2021 at 12:44:45AM +0100, Tomas Vondra wrote:

And I'm not sure we do the right thing after removing children, for

example

(that should drop the inheritance stats, I guess).

Do you mean for inheritance only ? Or partitions too ?
I think for partitions, the stats should stay.
And for inheritence, they can stay, for consistency with partitions,

and since

it does no harm.

I think the behavior should be the same as for data in pg_statistic,
i.e. if we keep/remove those, we should do the same thing for extended
statistics.

That works for column stats the way I proposed for extended stats: child
stats
are never removed, neither when the only child is dropped, nor when
re-running
analyze (that part is actually a bit odd).

Rebased, fixing an intermediate compile error, and typos in the commit
message.

--
Justin

Hi,

+       if (!HeapTupleIsValid(tup)) /* should not happen */
+           // elog(ERROR, "cache lookup failed for statistics data %u",
statsOid);

You may want to remove commented out code.

+           for (i = 0; i < staForm->stxkeys.dim1; i++)
+               keys = bms_add_member(keys, staForm->stxkeys.values[i]);

Since the above code is in a loop now, should keys be cleared across the
outer loop iterations ?

Cheers

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Yu (#16)
Re: extended stats on partitioned tables

Hi,

Attached is a rebased and cleaned-up version of these patches, with more
comments, refactorings etc. Justin and Zhihong, can you take a look?

0001 - Ignore extended statistics for inheritance trees

0002 - Build inherited extended stats on partitioned tables

Those are mostly just Justin's patches, with more detailed comments and
updated commit message. I've considered moving the rel->inh check to
statext_clauselist_selectivity, and then removing the check from
dependencies and MCV. But I decided no to do that, because someone might
be calling those functions directly (even if that's very unlikely).

The one thing bugging me a bit is that the regression test checks only a
GROUP BY query. It'd be nice to add queries testing MCV/dependencies
too, but that seems tricky because most queries will use per-partitions
stats.

0003 - Add stxdinherit flag to pg_statistic_ext_data

This is the patch for master, allowing to build stats for both inherits
flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
how we deal with iterating both flags etc. I've adopted most of the
Justin's fixup patches, except that in plancat.c I've refactored how we
load the stats to process keys/expressions just once.

It has the same issue with regression test using just a GROUP BY query,
but if we add a test to 0001/0002, that'll fix this too.

0004 - Refactor parent ACL check

Not sure about this - I doubt saving 30 rows in an 8kB file is really
worth it. Maybe it is, but then maybe we should try cleaning up the
other ACL checks in this file too? Seems mostly orthogonal to this
thread, though.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0004-Refactor-parent-ACL-check-20211212.patchtext/x-patch; charset=UTF-8; name=0004-Refactor-parent-ACL-check-20211212.patchDownload+52-89
0003-Add-stxdinherit-flag-to-pg_statistic_ext_da-20211212.patchtext/x-patch; charset=UTF-8; name=0003-Add-stxdinherit-flag-to-pg_statistic_ext_da-20211212.patchDownload+251-213
0002-Build-inherited-extended-stats-on-partition-20211212.patchtext/x-patch; charset=UTF-8; name=0002-Build-inherited-extended-stats-on-partition-20211212.patchDownload+70-16
0001-Ignore-extended-statistics-for-inheritance--20211212.patchtext/x-patch; charset=UTF-8; name=0001-Ignore-extended-statistics-for-inheritance--20211212.patchDownload+73-1
#18Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#17)
Re: extended stats on partitioned tables

On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

Hi,

Attached is a rebased and cleaned-up version of these patches, with more
comments, refactorings etc. Justin and Zhihong, can you take a look?

0001 - Ignore extended statistics for inheritance trees

0002 - Build inherited extended stats on partitioned tables

Those are mostly just Justin's patches, with more detailed comments and
updated commit message. I've considered moving the rel->inh check to
statext_clauselist_selectivity, and then removing the check from
dependencies and MCV. But I decided no to do that, because someone might
be calling those functions directly (even if that's very unlikely).

The one thing bugging me a bit is that the regression test checks only a
GROUP BY query. It'd be nice to add queries testing MCV/dependencies
too, but that seems tricky because most queries will use per-partitions
stats.

0003 - Add stxdinherit flag to pg_statistic_ext_data

This is the patch for master, allowing to build stats for both inherits
flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
how we deal with iterating both flags etc. I've adopted most of the
Justin's fixup patches, except that in plancat.c I've refactored how we
load the stats to process keys/expressions just once.

It has the same issue with regression test using just a GROUP BY query,
but if we add a test to 0001/0002, that'll fix this too.

0004 - Refactor parent ACL check

Not sure about this - I doubt saving 30 rows in an 8kB file is really
worth it. Maybe it is, but then maybe we should try cleaning up the
other ACL checks in this file too? Seems mostly orthogonal to this
thread, though.

Hi,

For patch 3, in commit message:

and there no clear winner. -> and there is no clear winner.

and it seem wasteful -> and it seems wasteful

The there may be -> There may be

+       /* skip statistics with mismatching stxdinherit value */
+       if (stat->inherit != rte->inh)

Should a log be added for the above case ?

+ * Determine if we'redealing with inheritance tree.

There should be a space between re and dealing.

Cheers

regards

Show quoted text

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Yu (#18)
Re: extended stats on partitioned tables

On 12/12/21 05:38, Zhihong Yu wrote:

On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
wrote:

Hi,

Attached is a rebased and cleaned-up version of these patches, with more
comments, refactorings etc. Justin and Zhihong, can you take a look?

0001 - Ignore extended statistics for inheritance trees

0002 - Build inherited extended stats on partitioned tables

Those are mostly just Justin's patches, with more detailed comments and
updated commit message. I've considered moving the rel->inh check to
statext_clauselist_selectivity, and then removing the check from
dependencies and MCV. But I decided no to do that, because someone might
be calling those functions directly (even if that's very unlikely).

The one thing bugging me a bit is that the regression test checks only a
GROUP BY query. It'd be nice to add queries testing MCV/dependencies
too, but that seems tricky because most queries will use per-partitions
stats.

0003 - Add stxdinherit flag to pg_statistic_ext_data

This is the patch for master, allowing to build stats for both inherits
flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
how we deal with iterating both flags etc. I've adopted most of the
Justin's fixup patches, except that in plancat.c I've refactored how we
load the stats to process keys/expressions just once.

It has the same issue with regression test using just a GROUP BY query,
but if we add a test to 0001/0002, that'll fix this too.

0004 - Refactor parent ACL check

Not sure about this - I doubt saving 30 rows in an 8kB file is really
worth it. Maybe it is, but then maybe we should try cleaning up the
other ACL checks in this file too? Seems mostly orthogonal to this
thread, though.

Hi,
For patch 3, in commit message:

and there no clear winner. -> and there is no clear winner. 

and it seem wasteful -> and it seems wasteful

The there may be -> There may be

Thanks, will fix.

+       /* skip statistics with mismatching stxdinherit value */
+       if (stat->inherit != rte->inh)

Should a log be added for the above case ?

Why should we log this? It's an entirely expected case - there's a
mismatch between inheritance for the relation and statistics, simply
skipping it is the right thing to do.

+                    * Determine if we'redealing with inheritance tree.

There should be a space between re and dealing.

Thanks, will fix.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#19)
Re: extended stats on partitioned tables

On Sat, Dec 11, 2021 at 9:14 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 12/12/21 05:38, Zhihong Yu wrote:

On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
wrote:

Hi,

Attached is a rebased and cleaned-up version of these patches, with

more

comments, refactorings etc. Justin and Zhihong, can you take a look?

0001 - Ignore extended statistics for inheritance trees

0002 - Build inherited extended stats on partitioned tables

Those are mostly just Justin's patches, with more detailed comments

and

updated commit message. I've considered moving the rel->inh check to
statext_clauselist_selectivity, and then removing the check from
dependencies and MCV. But I decided no to do that, because someone

might

be calling those functions directly (even if that's very unlikely).

The one thing bugging me a bit is that the regression test checks

only a

GROUP BY query. It'd be nice to add queries testing MCV/dependencies
too, but that seems tricky because most queries will use

per-partitions

stats.

0003 - Add stxdinherit flag to pg_statistic_ext_data

This is the patch for master, allowing to build stats for both

inherits

flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
how we deal with iterating both flags etc. I've adopted most of the
Justin's fixup patches, except that in plancat.c I've refactored how

we

load the stats to process keys/expressions just once.

It has the same issue with regression test using just a GROUP BY

query,

but if we add a test to 0001/0002, that'll fix this too.

0004 - Refactor parent ACL check

Not sure about this - I doubt saving 30 rows in an 8kB file is really
worth it. Maybe it is, but then maybe we should try cleaning up the
other ACL checks in this file too? Seems mostly orthogonal to this
thread, though.

Hi,
For patch 3, in commit message:

and there no clear winner. -> and there is no clear winner.

and it seem wasteful -> and it seems wasteful

The there may be -> There may be

Thanks, will fix.

+       /* skip statistics with mismatching stxdinherit value */
+       if (stat->inherit != rte->inh)

Should a log be added for the above case ?

Why should we log this? It's an entirely expected case - there's a
mismatch between inheritance for the relation and statistics, simply
skipping it is the right thing to do.

Hi,
I agree that skipping should be fine (to avoid too much logging).

Thanks

Show quoted text

+ * Determine if we'redealing with inheritance tree.

There should be a space between re and dealing.

Thanks, will fix.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#17)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Yu (#20)
#23Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Zhihong Yu (#21)
#24Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#17)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#23)
#26Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#24)
#27Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#17)
#28Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#26)
#29Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#27)
#30Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#26)
#31Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#29)
#32Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#30)
#33Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#31)
#34Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#33)
#35Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#34)
#36Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#35)
#37Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#36)