Statistics import and export: difference in statistics of materialized view dumped

Started by Ashutosh Bapatabout 1 year ago10 messageshackers
Jump to latest
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com

Hi Jeff, Corey,
After fixing the statistics difference in dumps of tables with
indexes, I now see difference in statistics of materialized view dump
in the test I am developing at [1]/messages/by-id/CAExHW5sBbMki6Xs4XxFQQF3C4Wx3wxkLAcySrtuW3vrnOxXDNQ@mail.gmail.com -- Best Wishes, Ashutosh Bapat (see the latest patches there).

I see following difference in the dump from the original regression
database and the dump taken from the database where the dump is
restored

@@ -441198,8 +441198,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_bb'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -441218,8 +441218,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tm'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '3'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -441238,8 +441238,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tvmm'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -448468,9 +448468,9 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.tableam_tblmv_heap2'::regclass,
- 'relpages', '1'::integer,
- 'reltuples', '1'::real,
- 'relallvisible', '1'::integer
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
+ 'relallvisible', '0'::integer
);

These are materialised views created in the test matview.sql and create_am.sql.

When I tried to reproduce the issue outside the test using the
attached scripts. The SQL is just copied from matview.sql. But both
the dumps (from original and restored databases) do not show any
difference. But if I run "make installcheck", take dump of regression
database, restore it, take dump of restored database, I am able to see
the following difference
*** 458089,458097 ****
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.tableam_tblmv_heap2'::regclass,
! 'relpages', '1'::integer,
! 'reltuples', '1'::real,
! 'relallvisible', '1'::integer
);

--- 458089,458097 ----
  SELECT * FROM pg_catalog.pg_restore_relation_stats(
        'version', '180000'::integer,
        'relation', 'public.tableam_tblmv_heap2'::regclass,
!       'relpages', '0'::integer,
!       'reltuples', '-1'::real,
!       'relallvisible', '0'::integer
  );

This seems to be a real problem since the statistics is going back
i.e. useful statistics is being reset.

[1]: /messages/by-id/CAExHW5sBbMki6Xs4XxFQQF3C4Wx3wxkLAcySrtuW3vrnOxXDNQ@mail.gmail.com -- Best Wishes, Ashutosh Bapat
--
Best Wishes,
Ashutosh Bapat

Attachments:

mtv_test.sqlapplication/sql; name=mtv_test.sqlDownload
mtv_test.shapplication/x-shellscript; name=mtv_test.shDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#1)
Re: Statistics import and export: difference in statistics of materialized view dumped

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

After fixing the statistics difference in dumps of tables with
indexes, I now see difference in statistics of materialized view dump
in the test I am developing at [1] (see the latest patches there).

Are you doing the restore in parallel by any chance? I had a todo
item to revisit the dependencies that pg_dump is creating for stats
items, because they looked wrong to me, ie inadequate to guarantee
correct restore order.

regards, tom lane

#3Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#2)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Tue, 2025-03-11 at 10:17 -0400, Tom Lane wrote:

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:

After fixing the statistics difference in dumps of tables with
indexes, I now see difference in statistics of materialized view
dump
in the test I am developing at [1] (see the latest patches there).

Are you doing the restore in parallel by any chance?  I had a todo
item to revisit the dependencies that pg_dump is creating for stats
items, because they looked wrong to me, ie inadequate to guarantee
correct restore order.

It's creating a dependency on the relation and a boundary dependency on
the postDataBound (unless it's an index, or an MV that got pushed to
SECTION_POST_DATA).

I suspect what we need here is a dependency on the MV *data*, because
that's doing a heap swap, which resets the stats. Looking into it.

Regards,
Jeff Davis

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#3)
Re: Statistics import and export: difference in statistics of materialized view dumped

Jeff Davis <pgsql@j-davis.com> writes:

On Tue, 2025-03-11 at 10:17 -0400, Tom Lane wrote:

Are you doing the restore in parallel by any chance? I had a todo
item to revisit the dependencies that pg_dump is creating for stats
items, because they looked wrong to me, ie inadequate to guarantee
correct restore order.

It's creating a dependency on the relation and a boundary dependency on
the postDataBound (unless it's an index, or an MV that got pushed to
SECTION_POST_DATA).
I suspect what we need here is a dependency on the MV *data*, because
that's doing a heap swap, which resets the stats. Looking into it.

Right, that was what I was thinking, but hadn't had time to look in
detail. The postDataBound dependency isn't real helpful here, we
could lose that if we had the data dependency.

regards, tom lane

#5Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#4)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Tue, 2025-03-11 at 11:26 -0400, Tom Lane wrote:

Right, that was what I was thinking, but hadn't had time to look in
detail.  The postDataBound dependency isn't real helpful here, we
could lose that if we had the data dependency.

Attached a patch.

It's a bit messier than I expected, so I'm open to other suggestions.
The reason is because materialized view data is also pushed to
RESTORE_PASS_POST_ACL, so we need to do the same for the statistics
(otherwise the dependency is just ignored).

Regards,
Jeff Davis

Attachments:

v1-0001-Make-MV-statistics-depend-on-MV-data.patchtext/x-patch; charset=UTF-8; name=v1-0001-Make-MV-statistics-depend-on-MV-data.patchDownload+95-61
#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Jeff Davis (#5)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Wed, Mar 12, 2025 at 4:08 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2025-03-11 at 11:26 -0400, Tom Lane wrote:

Right, that was what I was thinking, but hadn't had time to look in
detail. The postDataBound dependency isn't real helpful here, we
could lose that if we had the data dependency.

Attached a patch.

It's a bit messier than I expected, so I'm open to other suggestions.
The reason is because materialized view data is also pushed to
RESTORE_PASS_POST_ACL, so we need to do the same for the statistics
(otherwise the dependency is just ignored).

I ran my test with this patch (we have to remove 0003 patch in my test
which uses --no-statistics option). It failed with following
differences
@@ -452068,8 +452068,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_aa'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -452097,8 +452097,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tm_type'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '3'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -452111,8 +452111,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tvmm_expr'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--=== stderr ===
=== EOF ===

The previous differences have disappeared but new differences have appeared.

--
Best Wishes,
Ashutosh Bapat

#7Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#6)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Wed, Mar 12, 2025 at 4:29 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

I ran my test with this patch (we have to remove 0003 patch in my test
which uses --no-statistics option). It failed with following
differences
@@ -452068,8 +452068,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_aa'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -452097,8 +452097,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tm_type'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '3'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--
@@ -452111,8 +452111,8 @@
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.mvtest_tvmm_expr'::regclass,
- 'relpages', '2'::integer,
- 'reltuples', '1'::real,
+ 'relpages', '0'::integer,
+ 'reltuples', '-1'::real,
'relallvisible', '0'::integer
);
--=== stderr ===
=== EOF ===

The previous differences have disappeared but new differences have appeared.

Pulled the latest sources but the test is still failing with the same
differences.

--
Best Wishes,
Ashutosh Bapat

#8Jeff Davis
pgsql@j-davis.com
In reply to: Ashutosh Bapat (#7)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Thu, 2025-03-27 at 17:07 +0530, Ashutosh Bapat wrote:

Pulled the latest sources but the test is still failing with the same
differences.

The attached set of patches (your 0001 and 0002, combined with my patch
v2j-0003) applied on master (058b5152f0) are passing the pg_upgrade
test suite for me.

Are you saying that the tests don't work for you even when v2j-0003 is
applied? Or are you saying that your tests are failing on master, and
that v2j-0002 should be committed?

Regards,
Jeff Davis

Attachments:

v2j-0001-Test-pg_dump-restore-of-regression-objects.patchtext/x-patch; charset=UTF-8; name=v2j-0001-Test-pg_dump-restore-of-regression-objects.patchDownload+324-3
v2j-0002-Use-only-one-format-and-make-the-test-run-defaul.patchtext/x-patch; charset=UTF-8; name=v2j-0002-Use-only-one-format-and-make-the-test-run-defaul.patchDownload+25-64
v2j-0003-Make-MV-statistics-depend-on-MV-data.patchtext/x-patch; charset=UTF-8; name=v2j-0003-Make-MV-statistics-depend-on-MV-data.patchDownload+95-61
#9Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Jeff Davis (#8)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Fri, Mar 28, 2025 at 10:41 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2025-03-27 at 17:07 +0530, Ashutosh Bapat wrote:

Pulled the latest sources but the test is still failing with the same
differences.

The attached set of patches (your 0001 and 0002, combined with my patch
v2j-0003) applied on master (058b5152f0) are passing the pg_upgrade
test suite for me.

Are you saying that the tests don't work for you even when v2j-0003 is
applied? Or are you saying that your tests are failing on master, and
that v2j-0002 should be committed?

When I applied v1 it didn't pass.

But applying v2j-0003, the test passes.

I don't think I understand the patch fully. But I have a comment.

+ * However, the section may be updated later for materialized views.
+ * Matview stats depend on the matview data, because REFRESH
+ * MATERIALIZED VIEW replaces the storage and resets the stats, and
+ * the matview data is in SECTION_POST_DATA. Also, the materialized
+ * SECTION_POST_DATA to resolve some kinds of dependency problems (see
+ * repairMatViewBoundaryMultiLoop()).
+ */

It feels like we can simplify this as: REFRESH MATERIALIZED VIEW
replaces storage and resets the stats hence the Matview stats should
be printed after printing REFRESH MATERIALIZED VIEW command in
SECTION_POST_DATA. Also the materialized view ... . Hence the section
may be updated later for materialized views.

--
Best Wishes,
Ashutosh Bapat

#10Jeff Davis
pgsql@j-davis.com
In reply to: Ashutosh Bapat (#9)
Re: Statistics import and export: difference in statistics of materialized view dumped

On Fri, 2025-03-28 at 14:53 +0530, Ashutosh Bapat wrote:

When I applied v1 it didn't pass.

I applied v1 on top of master as of March 15 (771ba90298), and then
took your two changes adding the tests, and it passed.

Version v2j is just rebased forward, which involved a trivial merge
conflict.

But applying v2j-0003, the test passes.

Great.

It feels like we can simplify this as: REFRESH MATERIALIZED VIEW
replaces storage and resets the stats hence the Matview stats should
be printed after printing REFRESH MATERIALIZED VIEW command in
SECTION_POST_DATA. Also the materialized view ... . Hence the section
may be updated later for materialized views.

Done, with a few modifications. Hopefully it's simpler.

Tom suggested that there may be other dependency problems lurking, but
I believe the patch is an improvement, so I committed it now to unblock
your work.

Regards,
Jeff Davis