A few new options for vacuumdb
Hi hackers,
I've attached a few patches to add some options to vacuumdb that might
be useful. Specifically, these patches add the --skip-locked,
--min-xid-age, --min-mxid-age, and --min-relation-size options.
If an option is specified for a server version that is not supported,
the option is silently ignored. For example, SKIP_LOCKED was only
added to VACUUM and ANALYZE for v12. Alternatively, I think we could
fail in vacuum_one_database() if an unsupported option is specified.
Some of these options will work on all currently supported versions,
so I am curious what others think about skipping some of these version
checks altogether.
In this set of patches, I've disallowed using --min-xid-age,
--min-mxid-age, and --min-relation-size in conjunction with the
--table option. IMO this combination of options is prone to
confusion. For example:
vacuumdb mydb --table mytable --min-relation-size 1024
It does not seem clear whether the user wants us to process mytable
only if it is at least 1 GB, or if we should process mytable in
addition to any other relations over 1 GB. Either way, I think trying
to support these combinations of options adds more complexity than it
is worth.
0001 is a minor fix that is somewhat separate from these new options,
although the new options will make the edge case it aims to fix much
easier to reach. When the catalogs are queried in parallel mode to
get the list of tables to process, we currently assume that at least
one table will be returned. If no tables are found, the tables
variable will stay as NULL, which leads to database-wide VACUUM or
ANALYZE commands. Since there are currently no user-configurable
options available for this catalog query, this case is likely
exceptionally rare. However, with the new options, it is much easier
to inadvertently filter out all relations.
I will be adding this work to the next commitfest.
Nathan
Attachments:
v1-0002-Add-skip-locked-option-to-vacuumdb.patchapplication/octet-stream; name=v1-0002-Add-skip-locked-option-to-vacuumdb.patchDownload+38-2
v1-0003-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patchapplication/octet-stream; name=v1-0003-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patchDownload+130-16
v1-0004-Add-min-relation-size-option-to-vacuumdb.patchapplication/octet-stream; name=v1-0004-Add-min-relation-size-option-to-vacuumdb.patchDownload+59-5
v1-0001-Do-not-process-any-relations-if-the-catalog-query.patchapplication/octet-stream; name=v1-0001-Do-not-process-any-relations-if-the-catalog-query.patchDownload+10-5
On Wed, Dec 19, 2018 at 08:50:10PM +0000, Bossart, Nathan wrote:
If an option is specified for a server version that is not supported,
the option is silently ignored. For example, SKIP_LOCKED was only
added to VACUUM and ANALYZE for v12. Alternatively, I think we could
fail in vacuum_one_database() if an unsupported option is specified.
Some of these options will work on all currently supported versions,
so I am curious what others think about skipping some of these version
checks altogether.
prepare_vacuum_command() already handles that by ignoring silently
unsupported options (see the case of SKIP_LOCKED). So why not doing the
same?
It does not seem clear whether the user wants us to process mytable
only if it is at least 1 GB, or if we should process mytable in
addition to any other relations over 1 GB. Either way, I think trying
to support these combinations of options adds more complexity than it
is worth.
It seems to me that a combination of both options means that the listed
table should be processed only if its minimum size is 1GB. If multiple
tables are specified with --table, then only those reaching 1GB would be
processed. So this restriction can go away. The same applies for the
proposed --min-xid-age and --min-mxid-age.
+ <para>
+ Only execute the vacuum or analyze commands on tables with a multixact
+ ID age of at least <replaceable
class="parameter">mxid_age</replaceable>.
+ </para>
Adding a link to explain the multixact business may be helpful, say
vacuum-for-multixact-wraparound. Same comment for XID.
0001 is a minor fix that is somewhat separate from these new options,
although the new options will make the edge case it aims to fix much
easier to reach. When the catalogs are queried in parallel mode to
get the list of tables to process, we currently assume that at least
one table will be returned. If no tables are found, the tables
variable will stay as NULL, which leads to database-wide VACUUM or
ANALYZE commands. Since there are currently no user-configurable
options available for this catalog query, this case is likely
exceptionally rare. However, with the new options, it is much easier
to inadvertently filter out all relations.
Agreed. No need to visibly bother about that in back-branches.
There is an argument about also adding DISABLE_PAGE_SKIPPING.
--
Michael
Hi Michael,
Thanks for taking a look.
On 12/19/18, 8:05 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Wed, Dec 19, 2018 at 08:50:10PM +0000, Bossart, Nathan wrote:
If an option is specified for a server version that is not supported,
the option is silently ignored. For example, SKIP_LOCKED was only
added to VACUUM and ANALYZE for v12. Alternatively, I think we could
fail in vacuum_one_database() if an unsupported option is specified.
Some of these options will work on all currently supported versions,
so I am curious what others think about skipping some of these version
checks altogether.prepare_vacuum_command() already handles that by ignoring silently
unsupported options (see the case of SKIP_LOCKED). So why not doing the
same?
The --skip-locked option in vacuumdb is part of 0002, so I don't think
there's much precedent here. We do currently fall back to the
unparenthesized syntax for VACUUM for versions before 9.0, so there is
some version handling already. What do you think about removing
version checks for unsupported major versions? If we went that route,
these new patches could add version checks only for options that were
added in a supported major version (e.g. mxid_age() was added in 9.5).
Either way, we'll still have to decide whether to fail or to silently
skip the option if you do something like specify --min-mxid-age for a
9.4 server.
It does not seem clear whether the user wants us to process mytable
only if it is at least 1 GB, or if we should process mytable in
addition to any other relations over 1 GB. Either way, I think trying
to support these combinations of options adds more complexity than it
is worth.It seems to me that a combination of both options means that the listed
table should be processed only if its minimum size is 1GB. If multiple
tables are specified with --table, then only those reaching 1GB would be
processed. So this restriction can go away. The same applies for the
proposed --min-xid-age and --min-mxid-age.
Okay. This should be pretty easy to do using individual catalog
lookups before we call prepare_vacuum_command(). Alternatively, I
could add a clause for each specified table in the existing query in
vacuum_one_database(). At that point, it may be cleaner to just use
that catalog query in all cases instead of leaving paths where tables
can remain NULL.
+ <para> + Only execute the vacuum or analyze commands on tables with a multixact + ID age of at least <replaceable class="parameter">mxid_age</replaceable>. + </para> Adding a link to explain the multixact business may be helpful, say vacuum-for-multixact-wraparound. Same comment for XID.
Good idea.
There is an argument about also adding DISABLE_PAGE_SKIPPING.
I'll add this in the next set of patches. I need to add the
parenthesized syntax and --skip-locked for ANALYZE in
prepare_vacuum_command(), too.
Nathan
On Thu, Dec 20, 2018 at 04:48:11PM +0000, Bossart, Nathan wrote:
The --skip-locked option in vacuumdb is part of 0002, so I don't think
there's much precedent here.
It looks like I was not looking at the master branch here ;)
We do currently fall back to the
unparenthesized syntax for VACUUM for versions before 9.0, so there is
some version handling already. What do you think about removing
version checks for unsupported major versions?
I am not exactly sure down to which version this needs to be supported
and what's the policy about that (if others have an opinion to share,
please feel free) but surely it does not hurt to keep those code paths
either from a maintenance point of view.
If we went that route, these new patches could add version checks only
for options that were added in a supported major version (e.g.
mxid_age() was added in 9.5). Either way, we'll still have to decide
whether to fail or to silently skip the option if you do something
like specify --min-mxid-age for a 9.4 server.
There are downsides and upsides for each approach. Reporting a failure
makes it clear that an option is not available with a clear error
message, however it complicates a bit the error handling for parallel
runs. And vacuum_one_database should be the code path doing as that's
where all the connections are taken even when all databases are
processed.
Ignoring the option, as your patch set does, has the disadvantage to
potentially confuse users, say we may see complains like "why is VACUUM
locking even if I specified --skip-locked?". Still this keeps the code
minimalistic and simple.
Just passing down the options and seeing a failure in the query
generated is also a confusing experience I guess for not-so-advanced
users even if it may simplify the error handling.
Issuing a proper error feels more natural to me I think, as users can
react on that properly. Opinions from others are welcome.
--
Michael
On Wed, Dec 19, 2018 at 9:05 PM Michael Paquier <michael@paquier.xyz> wrote:
It does not seem clear whether the user wants us to process mytable
only if it is at least 1 GB, or if we should process mytable in
addition to any other relations over 1 GB. Either way, I think trying
to support these combinations of options adds more complexity than it
is worth.It seems to me that a combination of both options means that the listed
table should be processed only if its minimum size is 1GB. If multiple
tables are specified with --table, then only those reaching 1GB would be
processed.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
Either way, we'll still have to decide whether to fail or to silently
skip the option if you do something like specify --min-mxid-age for a
9.4 server.
+1 for fail.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
Either way, we'll still have to decide whether to fail or to silently
skip the option if you do something like specify --min-mxid-age for a
9.4 server.+1 for fail.
Sounds good. I'll keep all the version checks and will fail for
unsupported options in the next patch set.
Nathan
On 12/21/18, 11:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
Either way, we'll still have to decide whether to fail or to silently
skip the option if you do something like specify --min-mxid-age for a
9.4 server.+1 for fail.
Sounds good. I'll keep all the version checks and will fail for
unsupported options in the next patch set.
Here's an updated set of patches with the following changes:
- 0002 adds the parenthesized syntax for ANALYZE.
- 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
- 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
- 0004 alters vacuumdb to always use the catalog query to discover
the list of tables to process.
- 0003, 0005, and 0006 now fail in vacuum_one_database() if a
specified option is not available on the server version.
- If tables are listed along with --min-xid-age, --min-mxid-age, or
--min-relation-size, each table is processed only if it satisfies
the provided options.
0004 introduces a slight change to existing behavior. Currently, if
you specify a missing table, vacuumdb will process each table until
it reaches the nonexistent one, at which point it will fail. After
0004 is applied, vacuumdb will fail during the catalog query, and no
tables will be processed. I considered avoiding this change in
behavior by doing an additional catalog lookup for each specified
table to see whether it satisfies --min-xid-age, etc. However, this
introduced quite a bit more complexity and increased the number of
catalog queries needed.
Nathan
Attachments:
v2-0006-Add-min-relation-size-option-to-vacuumdb.patchapplication/octet-stream; name=v2-0006-Add-min-relation-size-option-to-vacuumdb.patchDownload+45-2
v2-0001-Do-not-process-any-relations-if-the-catalog-query.patchapplication/octet-stream; name=v2-0001-Do-not-process-any-relations-if-the-catalog-query.patchDownload+10-5
v2-0002-Support-parenthesized-syntax-for-ANALYZE-in-vacuu.patchapplication/octet-stream; name=v2-0002-Support-parenthesized-syntax-for-ANALYZE-in-vacuu.patchDownload+37-42
v2-0003-Add-disable-page-skipping-and-skip-locked-to-vacu.patchapplication/octet-stream; name=v2-0003-Add-disable-page-skipping-and-skip-locked-to-vacu.patchDownload+84-2
v2-0004-Always-use-a-catalog-query-to-discover-tables-to-.patchapplication/octet-stream; name=v2-0004-Always-use-a-catalog-query-to-discover-tables-to-.patchDownload+90-72
v2-0005-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patchapplication/octet-stream; name=v2-0005-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patchDownload+92-2
On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote:
0004 introduces a slight change to existing behavior. Currently, if
you specify a missing table, vacuumdb will process each table until
it reaches the nonexistent one, at which point it will fail. After
0004 is applied, vacuumdb will fail during the catalog query, and no
tables will be processed.
I have not looked at the patch set in details, but that would make
vacuumdb more consistent with the way VACUUM works with multiple
relations, which sounds like a good thing.
--
Michael
On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote:
Here's an updated set of patches with the following changes:
- 0002 adds the parenthesized syntax for ANALYZE.
- 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
- 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
- 0004 alters vacuumdb to always use the catalog query to discover
the list of tables to process.
- 0003, 0005, and 0006 now fail in vacuum_one_database() if a
specified option is not available on the server version.
- If tables are listed along with --min-xid-age, --min-mxid-age, or
--min-relation-size, each table is processed only if it satisfies
the provided options.
I have been looking at the patch set, and 0001 can actually happen
only once 0005 is applied because this modifies the query doing on
HEAD a full scan of pg_class which would include at least catalog
tables so it can never be empty. For this reason it seems to me that
0001 and 0004 should be merged together as common refactoring, making
sure on the way that all relations exist before processing anything.
As 0005 and 0006 change similar portions of the code, we could also
merge these together in a second patch introducing the new options.
Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I
would likely merge things when they make sense together to reduce
diff chunks.
0002 removes some carefully-written query generation, so as it is
never possible to generate something like ANALYZE FULL. HEAD splits
ANALYZE and VACUUM clearly, but 0002 merges them together so as
careless coding in vacuumdb.c makes it easier to generate command
strings which would fail in the backend relying on the option checks
to make sure that for example combining --full and --analyze-only
never happens. Introducing 0002 is mainly here for 0003, so let's
merge them together.
0004 introduces a slight change to existing behavior. Currently, if
you specify a missing table, vacuumdb will process each table until
it reaches the nonexistent one, at which point it will fail. After
0004 is applied, vacuumdb will fail during the catalog query, and no
tables will be processed. I considered avoiding this change in
behavior by doing an additional catalog lookup for each specified
table to see whether it satisfies --min-xid-age, etc. However, this
introduced quite a bit more complexity and increased the number of
catalog queries needed.
Simplicity is always welcome, knowing that tables are missing before
doing any operations is more consistent with plain VACUUM/ANALYZE.
From patch 0004:
+ executeCommand(conn, "RESET search_path;", progname, echo);
+ res = executeQuery(conn, catalog_query.data, progname, echo);
+ termPQExpBuffer(&catalog_query);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL,
+ progname, echo));
We should really avoid that. There are other things like casts which
tend to be forgotten, and if the catalog lookup query gets more
complicated, I feel that this would again be forgotten, reintroducing
the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix.
I have put my hands into what you sent, and worked on putting
0002/0003 first into shape, finishing with the attached. I have
simplified the query construction a bit, making it IMO easier to read
and to add new options, with comments documenting what's supported
across versions. I have also added a regression test for
--analyze-only --skip-locked. Then I have done some edit of the docs.
What do you think for this portion?
--
Michael
Attachments:
vacuumdb-skip-disable.patchtext/x-diff; charset=us-asciiDownload+124-7
On Sat, Jan 5, 2019 at 8:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:
On 12/21/18, 11:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
Either way, we'll still have to decide whether to fail or to silently
skip the option if you do something like specify --min-mxid-age for a
9.4 server.+1 for fail.
Sounds good. I'll keep all the version checks and will fail for
unsupported options in the next patch set.Here's an updated set of patches with the following changes:
- 0002 adds the parenthesized syntax for ANALYZE.
- 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
- 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
- 0004 alters vacuumdb to always use the catalog query to discover
the list of tables to process.
- 0003, 0005, and 0006 now fail in vacuum_one_database() if a
specified option is not available on the server version.
- If tables are listed along with --min-xid-age, --min-mxid-age, or
--min-relation-size, each table is processed only if it satisfies
the provided options.0004 introduces a slight change to existing behavior. Currently, if
you specify a missing table, vacuumdb will process each table until
it reaches the nonexistent one, at which point it will fail. After
0004 is applied, vacuumdb will fail during the catalog query, and no
tables will be processed. I considered avoiding this change in
behavior by doing an additional catalog lookup for each specified
table to see whether it satisfies --min-xid-age, etc. However, this
introduced quite a bit more complexity and increased the number of
catalog queries needed.Nathan
0002 and 0003 are merged and posted by Michael-san and it looks good
to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
is a few review comments.
-----
Even if all tables are filtered by --min-relation-size, min-mxid-age
or min-xid-age the following message appeared.
$ vacuumdb --verbose --min-relation-size 1000000 postgres
vacuumdb: vacuuming database "postgres"
Since no tables are processed in this case isn't is better not to
output this message by moving the message after checking if ntup == 0?
-----
You use pg_total_relation_size() to check the relation size but it
might be better to use pg_relation_size() instead. The
pg_total_relation_size() includes the all index size but I think it's
common based on my experience that we check only the table size
(including toast table) to do vacuum it or not.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote:
0002 and 0003 are merged and posted by Michael-san and it looks good
to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
is a few review comments.
I have done another round on 0002/0003 (PQfinish was lacking after
error-ing on incompatible options) and committed the thing.
Even if all tables are filtered by --min-relation-size, min-mxid-age
or min-xid-age the following message appeared.$ vacuumdb --verbose --min-relation-size 1000000 postgres
vacuumdb: vacuuming database "postgres"Since no tables are processed in this case isn't is better not to
output this message by moving the message after checking if ntup ==
0?
Hmm. My take on this one is that we attempt to vacuum relations on
this database, but this may finish by doing nothing. Printing this
message is still important to let the user know that an attempt was
done so I would keep the message.
You use pg_total_relation_size() to check the relation size but it
might be better to use pg_relation_size() instead. The
pg_total_relation_size() includes the all index size but I think it's
common based on my experience that we check only the table size
(including toast table) to do vacuum it or not.
Yes, I am also not completely sure if the way things are defined in
the patch are completely what we are looking for. Still, I have seen
as well people complain about the total amount of space a table is
physically taking on disk, including its indexes. So from the user
experience the direction taken by the patch makes sense, as much as
does the argument you do.
--
Michael
On 1/7/19, 1:12 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
I have been looking at the patch set, and 0001 can actually happen
only once 0005 is applied because this modifies the query doing on
HEAD a full scan of pg_class which would include at least catalog
tables so it can never be empty. For this reason it seems to me that
0001 and 0004 should be merged together as common refactoring, making
sure on the way that all relations exist before processing anything.
As 0005 and 0006 change similar portions of the code, we could also
merge these together in a second patch introducing the new options.
Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I
would likely merge things when they make sense together to reduce
diff chunks.
Thanks for reviewing. Sure, I can merge these together so that it's
just 2 patches.
0002 removes some carefully-written query generation, so as it is
never possible to generate something like ANALYZE FULL. HEAD splits
ANALYZE and VACUUM clearly, but 0002 merges them together so as
careless coding in vacuumdb.c makes it easier to generate command
strings which would fail in the backend relying on the option checks
to make sure that for example combining --full and --analyze-only
never happens. Introducing 0002 is mainly here for 0003, so let's
merge them together.
Makes sense. I was trying to simplify this code a bit, but I agree
with your point about relying on the option checks.
From patch 0004: + executeCommand(conn, "RESET search_path;", progname, echo); + res = executeQuery(conn, catalog_query.data, progname, echo); + termPQExpBuffer(&catalog_query); + PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, + progname, echo)); We should really avoid that. There are other things like casts which tend to be forgotten, and if the catalog lookup query gets more complicated, I feel that this would again be forgotten, reintroducing the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix.
This was done in order to maintain the current behavior that
appendQualifiedRelation() gives us. I found that skipping the
search_path handling here forced us to specify the schema in the
argument for --table in most cases. At the very least, I could add a
comment here to highlight the importance of fully qualifying
everything in the catalog query. What do you think?
I have put my hands into what you sent, and worked on putting
0002/0003 first into shape, finishing with the attached. I have
simplified the query construction a bit, making it IMO easier to read
and to add new options, with comments documenting what's supported
across versions. I have also added a regression test for
--analyze-only --skip-locked. Then I have done some edit of the docs.
What do you think for this portion?
Looks good to me, except for one small thing in the documentation:
+ <para>
+ Disable all page-skipping behavior during processing based on
+ the visibility map, similarly to the option
+ <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>.
+ </para>
I think the "similarly to the option" part is slightly misleading.
It's not just similar, it _is_ using that option in the generated
commands. Perhaps we could point to the VACUUM documentation for more
information about this one.
On 1/7/19, 8:03 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote:
0002 and 0003 are merged and posted by Michael-san and it looks good
to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
is a few review comments.I have done another round on 0002/0003 (PQfinish was lacking after
error-ing on incompatible options) and committed the thing.
Thanks!
Even if all tables are filtered by --min-relation-size, min-mxid-age
or min-xid-age the following message appeared.$ vacuumdb --verbose --min-relation-size 1000000 postgres
vacuumdb: vacuuming database "postgres"Since no tables are processed in this case isn't is better not to
output this message by moving the message after checking if ntup ==
0?Hmm. My take on this one is that we attempt to vacuum relations on
this database, but this may finish by doing nothing. Printing this
message is still important to let the user know that an attempt was
done so I would keep the message.
+1 for keeping the message.
You use pg_total_relation_size() to check the relation size but it
might be better to use pg_relation_size() instead. The
pg_total_relation_size() includes the all index size but I think it's
common based on my experience that we check only the table size
(including toast table) to do vacuum it or not.Yes, I am also not completely sure if the way things are defined in
the patch are completely what we are looking for. Still, I have seen
as well people complain about the total amount of space a table is
physically taking on disk, including its indexes. So from the user
experience the direction taken by the patch makes sense, as much as
does the argument you do.
Good point. I think allowing multiple different relation size options
here would be confusing, too (e.g. --min-relation-size versus
--min-total-relation-size). IMO pg_total_relation_size() is the way
to go here, as we'll most likely need to process the indexes and TOAST
tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for
VACUUM, we'd then want to use pg_table_size() when --min-relation-size
and --disable-index-cleanup are used together in vacuumdb. Thoughts?
I've realized that I forgot to add links to the XID/MXID wraparound
documentation like you suggested a while back. I'll make sure that
gets included in the next patch set, too.
Nathan
On Tue, Jan 08, 2019 at 06:46:11PM +0000, Bossart, Nathan wrote:
This was done in order to maintain the current behavior that
appendQualifiedRelation() gives us. I found that skipping the
search_path handling here forced us to specify the schema in the
argument for --table in most cases. At the very least, I could add a
comment here to highlight the importance of fully qualifying
everything in the catalog query. What do you think?
A comment sounds like a good thing. And we really shouldn't hijack
search_path even for one query...
Looks good to me, except for one small thing in the documentation:
+ <para> + Disable all page-skipping behavior during processing based on + the visibility map, similarly to the option + <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>. + </para>I think the "similarly to the option" part is slightly misleading.
It's not just similar, it _is_ using that option in the generated
commands. Perhaps we could point to the VACUUM documentation for more
information about this one.
Sure. If you have any suggestions, please feel free. Adding a link
to VACUUM documentation sounds good to me, as long as we avoid
duplicating the description related to DISABLE_PAGE_SKIPPING on the
VACUUM page.
Good point. I think allowing multiple different relation size options
here would be confusing, too (e.g. --min-relation-size versus
--min-total-relation-size). IMO pg_total_relation_size() is the way
to go here, as we'll most likely need to process the indexes and TOAST
tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for
VACUUM, we'd then want to use pg_table_size() when --min-relation-size
and --disable-index-cleanup are used together in vacuumdb.
Thoughts?
Yes, using pg_total_relation_size() looks like the best option to me
here as well, still this does not make me 100% comfortable from the
user perspective.
I've realized that I forgot to add links to the XID/MXID wraparound
documentation like you suggested a while back. I'll make sure that
gets included in the next patch set, too.
Nice, thanks.
--
Michael
On Wed, Jan 9, 2019 at 10:06 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Jan 08, 2019 at 06:46:11PM +0000, Bossart, Nathan wrote:
This was done in order to maintain the current behavior that
appendQualifiedRelation() gives us. I found that skipping the
search_path handling here forced us to specify the schema in the
argument for --table in most cases. At the very least, I could add a
comment here to highlight the importance of fully qualifying
everything in the catalog query. What do you think?A comment sounds like a good thing. And we really shouldn't hijack
search_path even for one query...Looks good to me, except for one small thing in the documentation:
+ <para> + Disable all page-skipping behavior during processing based on + the visibility map, similarly to the option + <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>. + </para>I think the "similarly to the option" part is slightly misleading.
It's not just similar, it _is_ using that option in the generated
commands. Perhaps we could point to the VACUUM documentation for more
information about this one.Sure. If you have any suggestions, please feel free. Adding a link
to VACUUM documentation sounds good to me, as long as we avoid
duplicating the description related to DISABLE_PAGE_SKIPPING on the
VACUUM page.Good point. I think allowing multiple different relation size options
here would be confusing, too (e.g. --min-relation-size versus
--min-total-relation-size). IMO pg_total_relation_size() is the way
to go here, as we'll most likely need to process the indexes and TOAST
tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for
VACUUM, we'd then want to use pg_table_size() when --min-relation-size
and --disable-index-cleanup are used together in vacuumdb.
Thoughts?Yes, using pg_total_relation_size() looks like the best option to me
here as well, still this does not make me 100% comfortable from the
user perspective.
Agreed.
Since pg_(total)_realtion_size() returns 0 for parent table the
specifying the parent table to vacuumdb with --min-relation-size
always does nothing. Maybe we will need to deal with this case when a
function returning whole partitoned table size is introduced.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
Since pg_(total)_relation_size() returns 0 for parent table the
specifying the parent table to vacuumdb with --min-relation-size
always does nothing. Maybe we will need to deal with this case when a
function returning whole partitoned table size is introduced.
Good point. I am not sure if we want to go down to having a size
function dedicated to partitions especially as this would just now be
a wrapper around pg_partition_tree(), but the size argument with
partitioned tables is something to think about. If we cannot sort out
this part cleanly, perhaps we could just focus on the age-ing
parameters and the other ones first? It seems to me that what is
proposed on this thread has value, so we could shave things and keep
the essential, and focus on what we are sure about for simplicity.
--
Michael
On 1/8/19, 10:34 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
Since pg_(total)_relation_size() returns 0 for parent table the
specifying the parent table to vacuumdb with --min-relation-size
always does nothing. Maybe we will need to deal with this case when a
function returning whole partitoned table size is introduced.Good point. I am not sure if we want to go down to having a size
function dedicated to partitions especially as this would just now be
a wrapper around pg_partition_tree(), but the size argument with
partitioned tables is something to think about. If we cannot sort out
this part cleanly, perhaps we could just focus on the age-ing
parameters and the other ones first? It seems to me that what is
proposed on this thread has value, so we could shave things and keep
the essential, and focus on what we are sure about for simplicity.
Sounds good. I'll leave out --min-relation-size for now.
Nathan
On Wed, Jan 9, 2019 at 1:33 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
Since pg_(total)_relation_size() returns 0 for parent table the
specifying the parent table to vacuumdb with --min-relation-size
always does nothing. Maybe we will need to deal with this case when a
function returning whole partitoned table size is introduced.Good point. I am not sure if we want to go down to having a size
function dedicated to partitions especially as this would just now be
a wrapper around pg_partition_tree(), but the size argument with
partitioned tables is something to think about. If we cannot sort out
this part cleanly, perhaps we could just focus on the age-ing
parameters and the other ones first? It seems to me that what is
proposed on this thread has value, so we could shave things and keep
the essential, and focus on what we are sure about for simplicity.
Agreed.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Here's a new patch set that should address the feedback in this
thread. The changes in this version include:
- 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
documentation. My suggestion is to keep it short and simple like
--full, --freeze, --skip-locked, --verbose, and --analyze. The
DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
documentation, and IMO it is reasonably obvious that such vacuumdb
options correspond to the VACUUM options.
- v3-0002 is essentially v2-0001 and v2-0004 combined. I've also
added a comment explaining the importance of fully qualifying the
catalog query used to discover tables to process.
- 0003 includes additional documentation changes explaining the main
uses of --min-xid-age and --min-mxid-age and linking to the
existing wraparound documentation.
As discussed, I've left out the patch for --min-relation-size for now.
Nathan
Attachments:
v3-0001-Adjust-documentation-for-vacuumdb-disable-page-sk.patchapplication/octet-stream; name=v3-0001-Adjust-documentation-for-vacuumdb-disable-page-sk.patchDownload+1-4
v3-0002-Always-use-a-catalog-query-to-discover-tables-to-.patchapplication/octet-stream; name=v3-0002-Always-use-a-catalog-query-to-discover-tables-to-.patchDownload+99-69
v3-0003-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patchapplication/octet-stream; name=v3-0003-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patchDownload+97-2
On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote:
Here's a new patch set that should address the feedback in this
thread. The changes in this version include:- 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
documentation. My suggestion is to keep it short and simple like
--full, --freeze, --skip-locked, --verbose, and --analyze. The
DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
documentation, and IMO it is reasonably obvious that such vacuumdb
options correspond to the VACUUM options.
Okay, committed this one.
- v3-0002 is essentially v2-0001 and v2-0004 combined. I've also
added a comment explaining the importance of fully qualifying the
catalog query used to discover tables to process.
Regarding the search_path business, there is actually similar business
in expand_table_name_patterns() for pg_dump if you look close by, so
as users calling --table don't have to specify the schema, and just
stick with patterns.
+ /*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path
+ * (which could be unsafe), it is important to fully qualify everything.
+ */
It is not only important, but also absolutely mandatory, so let's make
the comment insisting harder on that point. From this point of view,
the query that you are building is visibly correct.
+ appendStringLiteralConn(&catalog_query, just_table, conn);
+ appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n");
+
+ pg_free(just_table);
+
+ cell = cell->next;
+ if (cell == NULL)
+ appendPQExpBuffer(&catalog_query, " )\n");
Hm. It seems to me that you are duplicating what
processSQLNamePattern() does, so we ought to use it. And it is
possible to control the generation of the different WHERE clauses with
a single query based on the number of elements in the list. Perhaps I
am missing something? It looks unnecessary to export
split_table_columns_spec() as well.
- qr/statement: ANALYZE;/,
+ qr/statement: ANALYZE pg_catalog\./,
Or we could just use "ANALYZE \.;/", perhaps patching it first.
Perhaps my regexp here is incorrect, but I just mean to check for an
ANALYZE query which begins by "ANALYZE " and finishes with ";",
without caring about what is in-between. This would make the tests
more portable.
- 0003 includes additional documentation changes explaining the main
uses of --min-xid-age and --min-mxid-age and linking to the
existing wraparound documentation.
+$node->issues_sql_like(
+ [ 'vacuumdb', '--table=pg_class', '--min-mxid-age=123456789',
'postgres'],
+ qr/AND GREATEST\(pg_catalog.mxid_age\(c.relminmxid\),
pg_catalog.mxid_age\(t.relminmxid\)\) OPERATOR\(pg_catalog.>=\) 123456789/,
+ 'vacuumdb --table --min-mxid-age');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--min-xid-age=987654321', 'postgres' ],
+ qr/AND GREATEST\(pg_catalog.age\(c.relfrozenxid\),
pg_catalog.age\(t.relfrozenxid\)\) OPERATOR\(pg_catalog.>=\) 987654321/,
+ 'vacuumdb --min-xid-age');
It may be better to use numbers which make sure that no relations are
actually fetched, so as if the surrounding tests are messed up we
never make them longer than necessary just to check the shape of the
query.
--
Michael