Support for dumping extended statistics

Started by Hari krishna Maddiletiover 3 years ago10 messageshackers
Jump to latest
#1Hari krishna Maddileti
hmaddileti@vmware.com

Hi Team,

In order to restore dumped extended statistics (stxdndistinct, stxddependencies, stxdmcv) we need to provide input functions to parse pg_distinct/pg_dependency/pg_mcv_list strings.

Today we get the ERROR "cannot accept a value of type pg_ndistinct/pg_dependencies/pg_mcv_list" when we try to do an insert of any type.

Approch tried:
- Using yacc grammar file (statistics_gram.y) to parse the input string to its internal format for the types pg_distinct and pg_dependencies
- We are just calling byteain() for serialized input text of type pg_mcv_list.

Currently the changes are working locally, I would like to push the commit changes to upstream if there any usecase for postgres. Would like to know if there any interest from postgres side.

Regards,
Hari Krishna

#2Bruce Momjian
bruce@momjian.us
In reply to: Hari krishna Maddileti (#1)
Re: Support for dumping extended statistics

On Thu, Jan 5, 2023 at 06:29:03PM +0000, Hari krishna Maddileti wrote:

Hi Team,
In order to restore dumped extended statistics (stxdndistinct,
stxddependencies, stxdmcv) we need to provide input functions to parse
pg_distinct/pg_dependency/pg_mcv_list strings.

Today we get the ERROR "cannot accept a value of type pg_ndistinct/
pg_dependencies/pg_mcv_list" when we try to do an insert of any type.

Approch tried:

- Using yacc grammar file (statistics_gram.y) to parse the input string to its
internal format for the types pg_distinct and pg_dependencies

- We are just calling byteain() for serialized input text of type pg_mcv_list.

Currently the changes are working locally, I would like to push the commit
changes to upstream if there any usecase for postgres. Would like to know if
there any interest from postgres side.

There is certainly interest in allowing the optimizer statistics to be
dumped and reloaded. This could be used by pg_restore and pg_upgrade.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

#3Hari krishna Maddileti
hmaddileti@vmware.com
In reply to: Bruce Momjian (#2)
Re: Support for dumping extended statistics

Thanks Team for showing interest.

Please find the attached patch, which uses the same approach as mentioned in previous email to implement input functions to parse pg_distinct, pg_dependency and pg_mcv_list strings.

Regards,
Hari
From: Bruce Momjian <bruce@momjian.us>
Date: Saturday, 7 January 2023 at 8:10 AM
To: Hari krishna Maddileti <hmaddileti@vmware.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Support for dumping extended statistics
!! External Email

On Thu, Jan 5, 2023 at 06:29:03PM +0000, Hari krishna Maddileti wrote:

Hi Team,
In order to restore dumped extended statistics (stxdndistinct,
stxddependencies, stxdmcv) we need to provide input functions to parse
pg_distinct/pg_dependency/pg_mcv_list strings.

Today we get the ERROR "cannot accept a value of type pg_ndistinct/
pg_dependencies/pg_mcv_list" when we try to do an insert of any type.

Approch tried:

- Using yacc grammar file (statistics_gram.y) to parse the input string to its
internal format for the types pg_distinct and pg_dependencies

- We are just calling byteain() for serialized input text of type pg_mcv_list.

Currently the changes are working locally, I would like to push the commit
changes to upstream if there any usecase for postgres. Would like to know if
there any interest from postgres side.

There is certainly interest in allowing the optimizer statistics to be
dumped and reloaded. This could be used by pg_restore and pg_upgrade.

--
Bruce Momjian <bruce@momjian.us> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmomjian.us%2F&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C3eec45fa323646114b1b08daf0587937%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638086560027653219%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=NGidyq8AYdqqAAjirIud%2FE2SD%2Bw4MWmdyFwIu2Bos4A%3D&amp;reserved=0
EDB https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fenterprisedb.com%2F&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C3eec45fa323646114b1b08daf0587937%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638086560027653219%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=XUv87gO4KT3W%2FJh17szMBUryZF5kB2hhkY8DD8HeAjE%3D&amp;reserved=0

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

Attachments:

v13-0001-Implement-input-functions-for-extended-statistic.patchapplication/octet-stream; name=v13-0001-Implement-input-functions-for-extended-statistic.patchDownload+771-34
#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Hari krishna Maddileti (#3)
Re: Support for dumping extended statistics

On Tue, Jan 10, 2023 at 11:28:36AM +0000, Hari krishna Maddileti wrote:

Thanks Team for showing interest.

Please find the attached patch, which uses the same approach as mentioned in previous email to implement input functions to parse pg_distinct, pg_dependency and pg_mcv_list strings.

The patch is failing ; you need to make the corresponding update to
meson as you did for make.

http://cfbot.cputube.org/david-kimura.html
https://wiki.postgresql.org/wiki/Meson_for_patch_authors
https://wiki.postgresql.org/wiki/Meson

But actually, it also fails to compile with "make".

--
Justin

#5Hari krishna Maddileti
hmaddileti@vmware.com
In reply to: Justin Pryzby (#4)
Re: Support for dumping extended statistics

Hi Justin,
Thanks for the update, I have attached the updated patch with meson compatible and addressed warnings from make file too.

On 15/01/23, 2:27 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

!! External Email

On Tue, Jan 10, 2023 at 11:28:36AM +0000, Hari krishna Maddileti wrote:

Thanks Team for showing interest.

Please find the attached patch, which uses the same approach as mentioned in previous email to implement input functions to parse pg_distinct, pg_dependency and pg_mcv_list strings.

The patch is failing ; you need to make the corresponding update to
meson as you did for make.

https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcfbot.cputube.org%2Fdavid-kimura.html&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C299f368fff494a8eddc508daf671e768%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638093266355001101%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=ijYtKFzkEiruO9ZyzqEhsDakZG6G9IjJQgY3DiN4eUQ%3D&amp;reserved=0
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.postgresql.org%2Fwiki%2FMeson_for_patch_authors&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C299f368fff494a8eddc508daf671e768%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638093266355001101%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=udY5fPdSMhi1wlcNiR0EHwvdiV5ozoQL8gDhNfJCcUI%3D&amp;reserved=0
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.postgresql.org%2Fwiki%2FMeson&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C299f368fff494a8eddc508daf671e768%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638093266355001101%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=kHDvMHWoGXyk67%2FM9Kkct%2Bl4t2554XyJCoy53Eqx1xo%3D&amp;reserved=0

But actually, it also fails to compile with "make".

--
Justin

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

Attachments:

v2-0001-Implement-input-functions-for-extended-statistics.patchapplication/octet-stream; name=v2-0001-Implement-input-functions-for-extended-statistics.patchDownload+746-35
#6Hari krishna Maddileti
hmaddileti@vmware.com
In reply to: Hari krishna Maddileti (#5)
Re: Support for dumping extended statistics

+ pgsql-hackers

Hi Justin,
Thanks for the update, I have attached the updated patch with meson compatible and addressed warnings from make file too.

On 15/01/23, 2:27 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

!! External Email

On Tue, Jan 10, 2023 at 11:28:36AM +0000, Hari krishna Maddileti wrote:

Thanks Team for showing interest.

Please find the attached patch, which uses the same approach as mentioned in previous email to implement input functions to parse pg_distinct, pg_dependency and pg_mcv_list strings.

The patch is failing ; you need to make the corresponding update to
meson as you did for make.

https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcfbot.cputube.org%2Fdavid-kimura.html&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C299f368fff494a8eddc508daf671e768%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638093266355001101%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=ijYtKFzkEiruO9ZyzqEhsDakZG6G9IjJQgY3DiN4eUQ%3D&amp;reserved=0
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.postgresql.org%2Fwiki%2FMeson_for_patch_authors&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C299f368fff494a8eddc508daf671e768%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638093266355001101%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=udY5fPdSMhi1wlcNiR0EHwvdiV5ozoQL8gDhNfJCcUI%3D&amp;reserved=0
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.postgresql.org%2Fwiki%2FMeson&amp;data=05%7C01%7Chmaddileti%40vmware.com%7C299f368fff494a8eddc508daf671e768%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638093266355001101%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=kHDvMHWoGXyk67%2FM9Kkct%2Bl4t2554XyJCoy53Eqx1xo%3D&amp;reserved=0

But actually, it also fails to compile with "make".

--
Justin

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.

#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Hari krishna Maddileti (#5)
Re: Support for dumping extended statistics

On Wed, Feb 01, 2023 at 04:38:17AM +0000, Hari krishna Maddileti wrote:

Hi Justin,
Thanks for the update, I have attached the updated patch with meson compatible and addressed warnings from make file too.

Thanks - I see it compiles now under both build systems.

But there's build warnings, and it fails regression tests.

http://cfbot.cputube.org/david-kimura.html

Show quoted text

On 15/01/23, 2:27 AM, "Justin Pryzby" <pryzby@telsasoft.com> wrote:

On Tue, Jan 10, 2023 at 11:28:36AM +0000, Hari krishna Maddileti wrote:

Thanks Team for showing interest.

Please find the attached patch, which uses the same approach as mentioned in previous email to implement input functions to parse pg_distinct, pg_dependency and pg_mcv_list strings.

The patch is failing ; you need to make the corresponding update to
meson as you did for make.

But actually, it also fails to compile with "make".

#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bruce Momjian (#2)
Re: Support for dumping extended statistics

On 1/7/23 03:39, Bruce Momjian wrote:

On Thu, Jan 5, 2023 at 06:29:03PM +0000, Hari krishna Maddileti wrote:

Hi Team,
In order to restore dumped extended statistics (stxdndistinct,
stxddependencies, stxdmcv) we need to provide input functions to parse
pg_distinct/pg_dependency/pg_mcv_list strings.

Today we get the ERROR "cannot accept a value of type pg_ndistinct/
pg_dependencies/pg_mcv_list" when we try to do an insert of any type.

Approch tried:

- Using yacc grammar file (statistics_gram.y) to parse the input string to its
internal format for the types pg_distinct and pg_dependencies

- We are just calling byteain() for serialized input text of type pg_mcv_list.

Currently the changes are working locally, I would like to push the commit
changes to upstream if there any usecase for postgres. Would like to know if
there any interest from postgres side.

There is certainly interest in allowing the optimizer statistics to be
dumped and reloaded. This could be used by pg_restore and pg_upgrade.

Indeed, although I think it'd be better to deal with regular statistics
(which is what 99% of systems use). Furthermore, we should probably
think about differences between major versions - until now we could
change on-disk format of the statistics, because we have reset them.
It'd be silly to do dump on version X, and then fail to restore it on
(X+1) just because the statistics changed a bit.

So we need to be able to determine is the statistics has the correct
format/version, or what. And we need to do that for pg_upgrade.

At the very least we need an option to skip restoring statistics, or
something like that.

regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#8)
Re: Support for dumping extended statistics

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 1/7/23 03:39, Bruce Momjian wrote:

There is certainly interest in allowing the optimizer statistics to be
dumped and reloaded. This could be used by pg_restore and pg_upgrade.

Indeed, although I think it'd be better to deal with regular statistics
(which is what 99% of systems use). Furthermore, we should probably
think about differences between major versions - until now we could
change on-disk format of the statistics, because we have reset them.

Yeah, it's extremely odd to be proposing dump/reload for extended
stats when we don't yet have it for plain stats. And yes, the main
stumbling block is that you need to have a plan for stats changing
across versions, or even just environmental issues. For example,
what if the target DB doesn't use the same collation as the source?
That would affect string sorting and therefore at least partially
invalidate histograms for text columns.

I actually did some work on this, probably close to ten years ago
now, and came up with some hacks that didn't pass community review.
It'd be a good idea to dig up those old discussions if you want to
re-open the topic.

regards, tom lane

#10Andres Freund
andres@anarazel.de
In reply to: Hari krishna Maddileti (#5)
Re: Support for dumping extended statistics

Hi,

On 2023-02-01 04:38:17 +0000, Hari krishna Maddileti wrote:

Thanks for the update, I have attached the updated patch with meson compatible and addressed warnings from make file too.

This patch consistently crashes in CI:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F4114

Example crash:
https://api.cirrus-ci.com/v1/task/4910781754507264/logs/cores.log

Greetings,

Andres Freund