pg_amcheck contrib application
New thread, was "Re: new heapcheck contrib module"
On Mar 2, 2021, at 10:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 2, 2021 at 12:10 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On further reflection, I decided to implement these changes and not worry about the behavioral change.
Thanks.
I skipped this part. The initcmd argument is only handed to ParallelSlotsGetIdle(). Doing as you suggest would not really be simpler, it would just move that argument to ParallelSlotsSetup(). But I don't feel strongly about it, so I can move this, too, if you like.
I didn't do this either, and for the same reason. It's just a parameter to ParallelSlotsGetIdle(), so nothing is really gained by moving it to ParallelSlotsSetup().
OK. I thought it was more natural to pass a bunch of arguments at
setup time rather than passing a bunch of arguments at get-idle time,
but I don't feel strongly enough about it to insist, and somebody else
can always change it later if they decide I had the right idea.
When you originally proposed the idea, I thought that it would work out as a simpler interface to have it your way, but in terms of the interface it came out about the same. Internally it is still simpler to do it your way, so since you seem to still like your way better, this next version has it that way.
Rather than having the slots user tweak the slot's ConnParams, ParallelSlotsGetIdle() now takes a dbname argument and uses it as ConnParams->override_dbname.
OK, but you forgot to update the comments. ParallelSlotsGetIdle()
still talks about a cparams argument that it no longer has.
Fixed.
The usual idiom for sizing a memory allocation involving
FLEXIBLE_ARRAY_MEMBER is something like offsetof(ParallelSlotArray,
slots) + numslots * sizeof(ParallelSlot). Your version uses sizeof();
don't.
Fixed.
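The sizing idiom Robert describes can be sketched like this; the structs here are illustrative stand-ins, not the actual ParallelSlot/ParallelSlotArray definitions:

```c
#include <stddef.h>

/* Illustrative stand-ins for the real slot types. */
typedef struct HypotheticalSlot
{
    int sock;
    int inUse;
} HypotheticalSlot;

typedef struct HypotheticalSlotArray
{
    int              numslots;
    HypotheticalSlot slots[];   /* FLEXIBLE_ARRAY_MEMBER */
} HypotheticalSlotArray;

/*
 * Correct allocation size: offsetof() measures the fixed header up to
 * the flexible array member, so any padding before "slots" is counted
 * exactly once; sizeof() on the struct would get this wrong.
 */
static size_t
slot_array_size(int numslots)
{
    return offsetof(HypotheticalSlotArray, slots)
        + numslots * sizeof(HypotheticalSlot);
}
```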
Other than that 0001 looks to me to be in pretty good shape now.
And your other review email, also moved to this new thread....
On Mar 2, 2021, at 12:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 2, 2021 at 1:24 PM Robert Haas <robertmhaas@gmail.com> wrote:
Other than that 0001 looks to me to be in pretty good shape now.
Incidentally, we might want to move this to a new thread with a better
subject line, since the current subject line really doesn't describe
the uncommitted portion of the work. And create a new CF entry, too.
Moved here.
Moving onto 0002:
The index checking options should really be called btree index
checking options. I think I'd put the table options first, and the
btree options second. Other kinds of indexes could follow some day. I
would personally omit the short forms of --heapallindexed and
--parent-check; I think we'll run out of option names too quickly if
people add more kinds of checks.
Done.
While doing this, I also renamed some of the variables to more closely match the option name. I think the code is clearer now.
Perhaps VerifyBtreeSlotHandler should emit a warning of some kind if
PQntuples(res) != 0.
The functions bt_index_check and bt_index_parent_check are defined to return VOID, which results in PQntuples(res) == 1. I added code to verify this condition, but it only serves to alert the user if the amcheck version is behaving in an unexpected way, perhaps due to an amcheck/pg_amcheck version mismatch.
+ /*
+  * Test that this function works, but for now we're not using the list
+  * 'relations' that it builds.
+  */
+ conn = connectDatabase(&cparams, progname, opts.echo, false, true);

This comment appears to have nothing to do with the code, since connectDatabase() does not build a list of 'relations'.
True. Removed.
amcheck_sql seems to include paranoia, but do we need that if we're
using a secure search path? Similarly for other SQL queries, e.g. in
prepare_table_command.
I removed the OPERATOR(pg_catalog.=) paranoia.
It might not be strictly necessary for pg_amcheck.c to use_three completelyDifferent NamingConventions for its static functions.
The idea is that the functions that interoperate with parallel slots would follow its NamingConvention; those interoperating with patternToSQLRegex and PQExpBuffers would follow their namingConvention; and those not so interoperating would follow a less obnoxious naming_convention. To my eye, that color codes the function names in a useful way. To your eye, it just looks awful. I've changed it to use just one naming_convention.
should_processing_continue() is one semicolon over budget.
That's not the first time I've done that recently. Removed.
The initializer for opts puts a comma even after the last member
initializer. Is that going to be portable to all compilers?
I don't know. I learned to put commas at the end of lists back when I did mostly perl programming, as you get cleaner diffs when you add more stuff to the list later. Whether I can get away with that in C using initializers I don't know. I don't have a multiplicity of compilers to check.
I have removed the extra comma.
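For what it's worth, a trailing comma after the last member initializer has been accepted in braced initializer lists since C89 (it is only C89 enum declarations that disallow a trailing comma), so both forms below are portable; the struct is illustrative:

```c
/* Illustrative options struct, not pg_amcheck's actual one. */
typedef struct HypotheticalOptions
{
    int jobs;
    int verbose;
} HypotheticalOptions;

/* Trailing comma after the last initializer: legal since C89. */
static HypotheticalOptions opts_with_comma = {
    1,                          /* jobs */
    0,                          /* verbose */
};

/* Equivalent initializer without the trailing comma. */
static HypotheticalOptions opts_without_comma = {
    1,                          /* jobs */
    0                           /* verbose */
};
```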
+ for (failed = false, cell = opts.include.head; cell; cell = cell->next)
I think failed has to be false here, because it gets initialized at
the top of the function. If we need to reinitialize it for some
reason, I would prefer you do that on the previous line, separate from
the for loop stuff.
It does have to be false there. There is no need to reinitialize it.
+ char *dbrgx;   /* Database regexp parsed from pattern, or NULL */
+ char *nsprgx;  /* Schema regexp parsed from pattern, or NULL */
+ char *relrgx;  /* Relation regexp parsed from pattern, or NULL */
+ bool tblonly;  /* true if relrgx should only match tables */
+ bool idxonly;  /* true if relrgx should only match indexes */

Maybe: db_regex, nsp_regex, rel_regex, table_only, index_only?
Just because it seems theoretically possible that someone will see
nsprgx and not immediately understand what it's supposed to mean, even
if they know that nsp is a common abbreviation for namespace in
PostgreSQL code, and even if they also know what a regular expression
is.
Changed. Along the way, I noticed that "tbl" and "idx" were being used in C/SQL both to mean ("table_only", "index_only") in some contexts and ("is_table", "is_index") in others, so I replaced all instances of "tbl" and "idx" with the unambiguous labels.
Your four messages about there being nothing to check seem like they
could be consolidated down to one: "nothing to check for pattern
\"%s\"".
I anticipated your review comment, but I'm worried about the case that somebody runs
pg_amcheck -t "foo" -i "foo"
and one of those matches and the other does not. The message 'nothing to check for pattern "foo"' will be wrong (because there was something to check for it) and unhelpful (because it doesn't say which failed to match.)
I would favor changing things so that once argument parsing is
complete, we switch to reporting all errors that way. So in other
words here, and everything that follows:

+ fprintf(stderr, "%s: no databases to check\n", progname);
Same concern about the output for
pg_amcheck -t "foo" -i "foo" -d "foo"
You might think I'm being silly here, as database names, table names, and index names should in normal usage not be hard for the user to distinguish. But consider
pg_amcheck "mydb.myschema.mytable"
If it says, 'nothing to check for pattern "mydb.myschema.mytable"', you don't know if the database doesn't exist or if the table doesn't exist.
+ * ParallelSlots based event loop follows.
"Main event loop."
Changed.
To me it would read slightly better to change each reference to
"relations list" to "list of relations", but perhaps that is too
nitpicky.
No harm picking those nits. Changed.
I think the two instances of goto finish could be avoided with not
much work. At most a few things need to happen only if !failed, and
maybe not even that, if you just said "break;" instead.
Good point. Changed.
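The break-based structure can be sketched as follows; the loop body and error condition are illustrative only, not pg_amcheck's actual logic:

```c
#include <stdbool.h>

/*
 * Sketch: exiting a processing loop with break instead of "goto finish".
 * Returns the number of items processed, or -1 on failure.
 */
static int
process_items(const int *items, int nitems)
{
    bool failed = false;
    int  processed = 0;

    for (int i = 0; i < nitems; i++)
    {
        if (items[i] < 0)       /* pretend a negative item is an error */
        {
            failed = true;
            break;              /* previously: goto finish */
        }
        processed++;
    }

    /*
     * Cleanup that used to live under the "finish:" label runs here,
     * guarded by !failed where necessary.
     */
    return failed ? -1 : processed;
}
```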
+ * Note: Heap relation corruption is returned by verify_heapam() without the
+ * use of raising errors, but running verify_heapam() on a corrupted table may

How about "Heap relation corruption() is reported by verify_heapam() via the result set, rather than an ERROR, ..."
Changed, though I assumed your parens for corruption() were not intended.
Ok, so now you've moved on to reviewing the regression tests....
It seems mighty inefficient to have a whole bunch of consecutive calls
to remove_relation_file() or corrupt_first_page() when every such call
stops and restarts the database. I would guess these tests will run
noticeably faster if you don't do that. Either the functions need to
take a list of arguments, or the stop/start needs to be pulled up and
done in the caller.
Changed.
corrupt_first_page() could use a comment explaining what exactly we're
overwriting, and in particular noting that we don't want to just
clobber the LSN, but rather something where we can detect a wrong
value.
Added comments that we're skipping past the PageHeader and overwriting garbage starting in the line pointers.
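What corrupt_first_page() targets can be sketched like this; the struct only loosely mimics PostgreSQL's PageHeaderData (24 bytes on common builds), and 8192 is the default BLCKSZ, not a guarantee:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYPOTHETICAL_BLCKSZ 8192    /* assumption: default BLCKSZ */

/* Loose stand-in for PageHeaderData; 24 bytes with these fixed-width types. */
typedef struct HypotheticalPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;
    uint16_t pd_pagesize_version;
    uint32_t pd_prune_xid;
    /* line pointer array begins immediately after this header */
} HypotheticalPageHeader;

/*
 * Leave the page header (notably pd_lsn) intact so the corruption is
 * detectable as a wrong value, and scribble a recognizable garbage
 * pattern over the line pointer array that follows it.
 */
static void
corrupt_line_pointers(char *page)
{
    memset(page + sizeof(HypotheticalPageHeader), 0x55, 64);
}
```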
There's a long list of calls to command_checks_all() in 003_check.pl
that don't actually check anything but that the command failed, but
run it with a bunch of different options. I don't understand the value
of that, and suggest reducing the number of cases tested. If you want,
you can have tests elsewhere that focus -- perhaps by using verbose
mode -- on checking that the right tables are being checked.
This should be better in this next patch series.
This is not yet a full review of everything in this patch -- I haven't
sorted through all of the tests yet, or all of the new query
construction logic -- but to me this looks pretty close to
committable.
Thanks for the review!
Attachments:
v42-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch
v42-0002-Adding-contrib-module-pg_amcheck.patch
v42-0003-Extending-PostgresNode-to-test-corruption.patch
On Wed, Mar 3, 2021 at 10:22 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Your four messages about there being nothing to check seem like they
could be consolidated down to one: "nothing to check for pattern
\"%s\"".I anticipated your review comment, but I'm worried about the case that somebody runs
pg_amcheck -t "foo" -i "foo"
and one of those matches and the other does not. The message 'nothing to check for pattern "foo"' will be wrong (because there was something to check for it) and unhelpful (because it doesn't say which failed to match.)
Fair point.
Changed, though I assumed your parens for corruption() were not intended.
Uh, yeah.
Thanks for the review!
+ fprintf(stderr, "%s: no relations to check", progname);
Missing newline.
Generally, I would favor using pg_log_whatever as a way of reporting
messages starting when option parsing is complete. In other words,
starting here:
+ fprintf(stderr, "%s: no databases to check\n", progname);
I see no real advantage in having a bunch of these using
fprintf(stderr, ...), which to me seems most appropriate only for very
early failures.
Perhaps amcheck_sql could be spread across fewer lines, now that it
doesn't have so many decorations?
pg_basebackup uses -P as a short form for --progress, so maybe we
should match that here.
When I do "pg_amcheck --progress", it just says "259/259 (100%)" which
I don't find too clear. The corresponding pg_basebackup output is
"32332/32332 kB (100%), 1/1 tablespace" which has the advantage of
including units. I think if you just add the word "relations" to your
message it will be nicer.
When I do "pg_amcheck -s public" it tells me that there are no
relations to check in schemas for "public". I think "schemas matching"
would read better than "schemas for." Similar with the other messages.
When I try "pg_amcheck -t nessie" it tells me that there are no tables
to check for "nessie" but saying that there are no tables to check
matching "nessie" to me sounds more natural.
The code doesn't seem real clear on the difference between a database
name and a pattern. Consider:
createdb rhaas
createdb 'rh*s'
PGDATABASE='rh*s' pg_amcheck
It checks the rhaas database, which I venture to say is just plain wrong.
The error message when I exclude the only checkable database is not
very clear. "pg_amcheck -D rhaas" says pg_amcheck: no checkable
database: "rhaas". Well, I get that there's no checkable database. But
as a user I have no idea what "rhaas" is. I can even get it to issue
this complaint more than once:
createdb q
createdb qq
pg_amcheck -D 'q*' q qq
Now it issues the "no checkable database" complaint twice, once for q
and once for qq. But if there's no checkable database, I only need to
know that once. Either the message is wrongly-worded, or it should
only be issued once and doesn't need to include the pattern. I think
it's the second one, but I could be wrong.
Using a pattern as the only or first argument doesn't work; i.e.
"pg_amcheck rhaas" works but "pg_amcheck rhaa?" fails because there is
no database with that exact literal name. This seems like another
instance of confusion between a literal database name and a database
name pattern. I'm not quite sure what the right solution is here. We
could give up on having database patterns altogether -- the comparable
issue does not arise for database and schema name patterns -- or the
maintenance database could default to something that's not going to be
a pattern, like "postgres," rather than being taken from a
command-line argument that is intended to be a pattern. Or some hybrid
approach e.g. -d options are patterns, but don't set the maintenance
database, while extra command line arguments are literal database
names, and thus are presumably OK to use as the maintenance DB. But
it's too weird IMHO to support patterns here and then have supplying
one inevitably fail unless you also specify --maintenance-db.
It's sorta annoying that there doesn't seem to be an easy way to find
out exactly what relations got checked as a result of whatever I did.
Perhaps pg_amcheck -v should print a line for each relation saying
that it's checking that relation; it's not actually that verbose as
things stand. If we thought that was overdoing it, we could set things
up so that multiple -v options keep increasing the verbosity level, so
that you can get this via pg_amcheck -vv. I submit that pg_amcheck -e
is not useful for this purpose because the queries, besides being
long, use the relation OIDs rather than the names, so it's not easy to
see what happened.
I think that something's not working in terms of schema exclusion. If
I create a brand-new database and then run "pg_amcheck -S pg_catalog
-S information_schema -S pg_toast" it still checks stuff. In fact it
seems to check the exact same amount of stuff that it checks if I run
it with no command-line options at all. In fact, if I run "pg_amcheck
-S '*'" that still checks everything. Unless I'm misunderstanding what
this option is supposed to do, the fact that a version of this patch
where this seemingly doesn't work at all escaped to the list suggests
that your testing has got some gaps.
I like the semantics of --no-toast-expansion and --no-index-expansion
as you now have them, but I find I don't really like the names. Could
I suggest --no-dependent-indexes and --no-dependent-toast?
I tried pg_amcheck --startblock=tsgsdg and got an error message
without a trailing newline. I tried --startblock=-525523 and got no
error. I tried --startblock=99999999999999999999999999 and got a
complaint that the value was out of bounds, but without a trailing
newline. Maybe there's an argument that the bounds don't need to be
checked, but surely there's no argument for checking one and not the
other. I haven't tried the corresponding cases with --endblock but you
should. I tried --startblock=2 --endblock=1 and got a complaint that
the ending block precedes the starting block, which is totally
reasonable (though I might say "start block" and "end block" rather
than using the -ing forms) but this message is prefixed with
"pg_amcheck: " whereas the messages about an altogether invalid
starting block were not so prefixed. Is there a reason not to make
this consistent?
I also tried using a random positive integer for startblock, and for
every relation I am told "ERROR: starting block number must be
between 0 and <whatever>". That makes sense, because I used a big
number for the start block and I don't have any big relations, but it
makes for an absolute ton of output, because every verify_heapam query
is 11 lines long. This suggests a couple of possible improvements.
First, maybe we should only display the query that produced the error
in verbose mode. Second, maybe the verify_heapam() query should be
tightened up so that it doesn't stretch across quite so many lines. I
think the call to verify_heapam() could be spread across like 2 lines
rather than 7, which would improve readability. On a related note, I
wonder why we need every verify_heapam() call to join to pg_class and
pg_namespace just to fetch the schema and table name which,
presumably, we should or at least could already have. This kinda
relates to my comment earlier about making -v print a message per
relation so that we can see, in human-readable format, which relations
are getting checked. Right now, if you got an error checking just one
relation, how would you know which relation you got it from? Unless
the server happens to report that information in the message, you're
just in the dark, because pg_amcheck won't tell you.
The line "Read the description of the amcheck contrib module for
details" seems like it could be omitted. Perhaps the first line of the
help message could be changed to read "pg_amcheck uses amcheck to find
corruption in a PostgreSQL database." or something like that, instead.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mar 3, 2021, at 9:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 3, 2021 at 10:22 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Your four messages about there being nothing to check seem like they
could be consolidated down to one: "nothing to check for pattern
\"%s\"".I anticipated your review comment, but I'm worried about the case that somebody runs
pg_amcheck -t "foo" -i "foo"
and one of those matches and the other does not. The message 'nothing to check for pattern "foo"' will be wrong (because there was something to check for it) and unhelpful (because it doesn't say which failed to match.)
Fair point.
Changed, though I assumed your parens for corruption() were not intended.
Uh, yeah.
Thanks for the review!
+ fprintf(stderr, "%s: no relations to check", progname);
Missing newline.
Generally, I would favor using pg_log_whatever as a way of reporting
messages starting when option parsing is complete. In other words,
starting here:

+ fprintf(stderr, "%s: no databases to check\n", progname);
I see no real advantage in having a bunch of these using
fprintf(stderr, ...), which to me seems most appropriate only for very
early failures.
Ok, the newline issues should be fixed, and pg_log_{error,warning,info} is now used more consistently.
Perhaps amcheck_sql could be spread across fewer lines, now that it
doesn't have so many decorations?
Done.
pg_basebackup uses -P as a short form for --progress, so maybe we
should match that here.
Done.
When I do "pg_amcheck --progress", it just says "259/259 (100%)" which
I don't find too clear. The corresponding pg_basebackup output is
"32332/32332 kB (100%), 1/1 tablespace" which has the advantage of
including units. I think if you just add the word "relations" to your
message it will be nicer.
Done. It now shows:
% pg_amcheck -P
259/259 relations (100%) 870/870 pages (100%)
As you go along, the percent of relations processed may not be equal to the percent of pages, though at the end they are both 100%. The value of printing both can only be seen while things are underway.
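A sketch of how such a two-part progress line might be formatted; this is illustrative, not pg_amcheck's actual progress code:

```c
#include <stdio.h>

/*
 * Format a progress line like "259/259 relations (100%) 870/870 pages (100%)".
 * Returns the length snprintf would have written.  Empty totals are
 * reported as 100% so a zero-page run still finishes cleanly.
 */
static int
format_progress(char *buf, size_t buflen,
                long reldone, long reltotal,
                long pagedone, long pagetotal)
{
    int relpct = reltotal ? (int) (100 * reldone / reltotal) : 100;
    int pagepct = pagetotal ? (int) (100 * pagedone / pagetotal) : 100;

    return snprintf(buf, buflen,
                    "%ld/%ld relations (%d%%) %ld/%ld pages (%d%%)",
                    reldone, reltotal, relpct, pagedone, pagetotal, pagepct);
}
```

As the thread notes, the two percentages usually disagree mid-run and only converge at 100% when the run completes.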
When I do "pg_amcheck -s public" it tells me that there are no
relations to check in schemas for "public". I think "schemas matching"
would read better than "schemas for." Similar with the other messages.
When I try "pg_amcheck -t nessie" it tells me that there are no tables
to check for "nessie" but saying that there are no tables to check
matching "nessie" to me sounds more natural.
Done.
% pg_amcheck -s public
pg_amcheck: error: no relations to check in schemas matching "public"
The code doesn't seem real clear on the difference between a database
name and a pattern. Consider:
createdb rhaas
createdb 'rh*s'
PGDATABASE='rh*s' pg_amcheck
It checks the rhaas database, which I venture to say is just plain wrong.
This next version treats any arguments supplied with -d and -D as database patterns, and all others as database names. Exclusion patterns (-D) only override inclusion patterns, not names.
The error message when I exclude the only checkable database is not
very clear. "pg_amcheck -D rhaas" says pg_amcheck: no checkable
database: "rhaas". Well, I get that there's no checkable database. But
as a user I have no idea what "rhaas" is. I can even get it to issue
this complaint more than once:
createdb q
createdb qq
pg_amcheck -D 'q*' q qq
Now it issues the "no checkable database" complaint twice, once for q
and once for qq. But if there's no checkable database, I only need to
know that once. Either the message is wrongly-worded, or it should
only be issued once and doesn't need to include the pattern. I think
it's the second one, but I could be wrong.
I think this whole problem goes away with the change to how -D/-d work and don't interact with database names. At least, I don't get any problems like the one you mention:
% PGDATABASE=postgres pg_amcheck -D postgres
pg_amcheck: warning: skipping database "postgres": amcheck is not installed
pg_amcheck: error: no relations to check
% PGDATABASE=mark.dilger pg_amcheck -D mark.dilger --progress
259/259 relations (100%) 870/870 pages (100%)
Using a pattern as the only or first argument doesn't work; i.e.
"pg_amcheck rhaas" works but "pg_amcheck rhaa?" fails because there is
no database with that exact literal name. This seems like another
instance of confusion between a literal database name and a database
name pattern. I'm not quite sure what the right solution is here. We
could give up on having database patterns altogether -- the comparable
issue does not arise for database and schema name patterns -- or the
maintenance database could default to something that's not going to be
a pattern, like "postgres," rather than being taken from a
command-line argument that is intended to be a pattern. Or some hybrid
approach e.g. -d options are patterns, but don't set the maintenance
database, while extra command line arguments are literal database
names, and thus are presumably OK to use as the maintenance DB. But
it's too weird IMHO to support patterns here and then have supplying
one inevitably fail unless you also specify --maintenance-db.
Right. I think the changes in this next version address all your concerns as stated, but here are some examples:
% pg_amcheck "mark.d*" --progress
pg_amcheck: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "mark.d*" does not exist
% PGDATABASE=postgres pg_amcheck "mark.d*" --progress
pg_amcheck: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "mark.d*" does not exist
% PGDATABASE=postgres pg_amcheck -d "mark.d*" --progress
520/520 relations (100%) 1815/1815 pages (100%)
% pg_amcheck --all --maintenance-db="mark.d*" --progress
pg_amcheck: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "mark.d*" does not exist
% pg_amcheck --all -D="mark.d*" --progress
pg_amcheck: warning: skipping database "template1": amcheck is not installed
520/520 relations (100%) 1815/1815 pages (100%)
It's sorta annoying that there doesn't seem to be an easy way to find
out exactly what relations got checked as a result of whatever I did.
Perhaps pg_amcheck -v should print a line for each relation saying
that it's checking that relation; it's not actually that verbose as
things stand. If we thought that was overdoing it, we could set things
up so that multiple -v options keep increasing the verbosity level, so
that you can get this via pg_amcheck -vv. I submit that pg_amcheck -e
is not useful for this purpose because the queries, besides being
long, use the relation OIDs rather than the names, so it's not easy to
see what happened.
I added that, as shown here:
% pg_amcheck mark.dilger --table=pg_subscription --table=pg_publication -v
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
pg_amcheck: checking btree index "mark.dilger"."pg_toast"."pg_toast_6100_index" (oid 4184) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_publication_oid_index" (oid 6110) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_publication_pubname_index" (oid 6111) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_subscription_oid_index" (oid 6114) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_subscription_subname_index" (oid 6115) (1/1 page)
pg_amcheck: checking table "mark.dilger"."pg_toast"."pg_toast_6100" (oid 4183) (0/0 pages)
pg_amcheck: checking table "mark.dilger"."pg_catalog"."pg_subscription" (oid 6100) (0/0 pages)
pg_amcheck: checking table "mark.dilger"."pg_catalog"."pg_publication" (oid 6104) (0/0 pages)
I think that something's not working in terms of schema exclusion. If
I create a brand-new database and then run "pg_amcheck -S pg_catalog
-S information_schema -S pg_toast" it still checks stuff. In fact it
seems to check the exact same amount of stuff that it checks if I run
it with no command-line options at all. In fact, if I run "pg_amcheck
-S '*'" that still checks everything. Unless I'm misunderstanding what
this option is supposed to do, the fact that a version of this patch
where this seemingly doesn't work at all escaped to the list suggests
that your testing has got some gaps.
Good catch. That works now, but beware that -S doesn't apply to excluding things brought in by toast or index expansion, so:
% pg_amcheck mark.dilger -S pg_catalog -S pg_toast --progress -v
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
0/14 relations (0%) 0/90 pages (0%)
pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (45/45 pages)
pg_amcheck: checking btree index "mark.dilger"."public"."foo_idx" (oid 16388) (30/30 pages)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_features" (oid 13051) (8/8 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_toast"."pg_toast_13051_index" (oid 13055) (1/1 page)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_implementation_info" (oid 13056) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_toast"."pg_toast_13056_index" (oid 13060) (1/1 page)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_parts" (oid 13061) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_toast"."pg_toast_13061_index" (oid 13065) (1/1 page)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_sizing" (oid 13066) (1/1 page)
pg_amcheck: checking btree index "mark.dilger"."pg_toast"."pg_toast_13066_index" (oid 13070) (1/1 page)
pg_amcheck: checking table "mark.dilger"."pg_toast"."pg_toast_13051" (oid 13054) (0/0 pages)
pg_amcheck: checking table "mark.dilger"."pg_toast"."pg_toast_13056" (oid 13059) (0/0 pages)
pg_amcheck: checking table "mark.dilger"."pg_toast"."pg_toast_13061" (oid 13064) (0/0 pages)
pg_amcheck: checking table "mark.dilger"."pg_toast"."pg_toast_13066" (oid 13069) (0/0 pages)
14/14 relations (100%) 90/90 pages (100%)
but
% pg_amcheck mark.dilger -S pg_catalog -S pg_toast -S information_schema --progress -v
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
0/2 relations (0%) 0/75 pages (0%)
pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (45/45 pages)
pg_amcheck: checking btree index "mark.dilger"."public"."foo_idx" (oid 16388) (30/30 pages)
2/2 relations (100%) 75/75 pages (100%)
The first one checks so much because the toast and indexes for tables in the "information_schema" are not excluded by -S, but:
% pg_amcheck mark.dilger -S pg_catalog -S pg_toast --progress --no-dependent-indexes --no-dependent-toast -v
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
0/5 relations (0%) 0/56 pages (0%)
pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (45/45 pages)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_features" (oid 13051) (8/8 pages)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_implementation_info" (oid 13056) (1/1 page)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_parts" (oid 13061) (1/1 page)
pg_amcheck: checking table "mark.dilger"."information_schema"."sql_sizing" (oid 13066) (1/1 page)
5/5 relations (100%) 56/56 pages (100%)
works as you might expect.
I like the semantics of --no-toast-expansion and --no-index-expansion
as you now have them, but I find I don't really like the names. Could
I suggest --no-dependent-indexes and --no-dependent-toast?
Changed.
I tried pg_amcheck --startblock=tsgsdg and got an error message
without a trailing newline.
Fixed.
I tried --startblock=-525523 and got no
error.
Fixed.
I tried --startblock=99999999999999999999999999 and got a
complaint that the value was out of bounds, but without a trailing
newline.
Fixed.
Maybe there's an argument that the bounds don't need to be
checked, but surely there's no argument for checking one and not the
other.
It checks both bounds now, and likewise for --endblock.
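The kind of bounds checking under discussion might be sketched as follows, with the trailing newlines the review asked for; the function and message wording are illustrative, not pg_amcheck's actual code:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a --startblock/--endblock style argument.  Sets *ok to 0 and
 * prints a newline-terminated complaint on garbage, negative values,
 * or values out of range for strtol.
 */
static long
parse_block_number(const char *arg, int *ok)
{
    char *endptr;
    long  val;

    errno = 0;
    val = strtol(arg, &endptr, 10);
    if (endptr == arg || *endptr != '\0')
    {
        fprintf(stderr, "invalid block number \"%s\"\n", arg);
        *ok = 0;
        return -1;
    }
    if (errno == ERANGE || val < 0)
    {
        fprintf(stderr, "block number \"%s\" is out of bounds\n", arg);
        *ok = 0;
        return -1;
    }
    *ok = 1;
    return val;
}
```

This rejects all three cases from the review: a non-numeric string, a negative value, and a value too large to represent.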
I haven't tried the corresponding cases with --endblock but you
should. I tried --startblock=2 --endblock=1 and got a complaint that
the ending block precedes the starting block, which is totally
reasonable (though I might say "start block" and "end block" rather
than using the -ing forms)
I think this is fixed up now. There is an interaction with amcheck's verify_heapam(), which raises an error if the startblock or endblock arguments are out of bounds for the relation in question. Rather than aborting the entire pg_amcheck run, pg_amcheck now avoids passing inappropriate block ranges to verify_heapam() and outputs a warning instead, so:
% pg_amcheck mark.dilger -t foo -t pg_class --progress -v --startblock=35 --endblock=77
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
0/6 relations (0%) 0/55 pages (0%)
pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (10/45 pages)
pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."public"."foo"
pg_amcheck: checking btree index "mark.dilger"."public"."foo_idx" (oid 16388) (30/30 pages)
pg_amcheck: checking table "mark.dilger"."pg_catalog"."pg_class" (oid 1259) (0/13 pages)
pg_amcheck: warning: ignoring startblock option 35 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_relname_nsp_index" (oid 2663) (6/6 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_tblspc_relfilenode_index" (oid 3455) (5/5 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_oid_index" (oid 2662) (4/4 pages)
6/6 relations (100%) 55/55 pages (100%)
The way the (x/y pages) is printed takes into account that the [startblock..endblock] range may reduce the number of pages to check (x) to something less than the number of pages in the relation (y). The reporting is a bit of a lie when the startblock is beyond the end of the table, though: the range doesn't get passed to verify_heapam in that case, so the number of blocks checked may be more than the zero blocks reported.

I think I might need to fix this up tomorrow, but I want to get what I have in this patch set posted tonight, so it's not fixed here. Also, there are multiple ways of addressing this, and I'm having trouble deciding which way is best. I can exclude the relation from being checked at all, or realize earlier that I'm not going to honor the startblock argument and compute the blocks to check correctly. Thoughts?
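One of the options mentioned, clamping the requested range against the relation's actual size before passing it down, might be sketched as follows; names are illustrative:

```c
#include <stdbool.h>

/*
 * Clamp a requested [startblock, endblock] range to a relation of
 * relpages blocks.  Returns false if the range lies entirely beyond
 * the relation, in which case the caller should skip the range (and
 * perhaps warn) rather than pass it to verify_heapam().
 */
static bool
clamp_block_range(long relpages, long *startblock, long *endblock)
{
    if (relpages == 0 || *startblock >= relpages)
        return false;           /* nothing in range */
    if (*endblock >= relpages)
        *endblock = relpages - 1;   /* e.g. endblock 77 -> last page */
    return true;
}
```

With the 13-page pg_class from the example above, a startblock of 35 is rejected outright; with the 45-page table foo, blocks 35..77 clamp to 35..44, matching the "10/45 pages" reported.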
but this message is prefixed with
"pg_amcheck: " whereas the messages about an altogether invalid
starting block were not so prefixed. Is there a reason not to make
this consistent?
That was a stray usage of pg_log_error where fprintf should have been used. Fixed.
I also tried using a random positive integer for startblock, and for
every relation I am told "ERROR: starting block number must be
between 0 and <whatever>". That makes sense, because I used a big
number for the start block and I don't have any big relations, but it
makes for an absolute ton of output, because every verify_heapam query
is 11 lines long.
This happens because the range was being passed down to verify_heapam. It won't do that now.
This suggests a couple of possible improvements.
First, maybe we should only display the query that produced the error
in verbose mode.
No longer relevant.
Second, maybe the verify_heapam() query should be
tightened up so that it doesn't stretch across quite so many lines.
Not a bad idea, but no longer relevant to the startblock/endblock issues. Done.
I
think the call to verify_heapam() could be spread across like 2 lines
rather than 7, which would improve readability.
Done.
On a related note, I
wonder why we need every verify_heapam() call to join to pg_class and
pg_namespace just to fetch the schema and table name which,
presumably, we should or at least could already have.
We didn't have it, but we do now, so that join is removed.
This kinda
relates to my comment earlier about making -v print a message per
relation so that we can see, in human-readable format, which relations
are getting checked.
Done.
Right now, if you got an error checking just one
relation, how would you know which relation you got it from? Unless
the server happens to report that information in the message, you're
just in the dark, because pg_amcheck won't tell you.
That information is now included in the query text, so you can see it in the error message along with the oid.
The line "Read the description of the amcheck contrib module for
details" seems like it could be omitted. Perhaps the first line of the
help message could be changed to read "pg_amcheck uses amcheck to find
corruption in a PostgreSQL database." or something like that, instead.
Done.
Attachments:
v43-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch
v43-0002-Adding-contrib-module-pg_amcheck.patch
v43-0003-Extending-PostgresNode-to-test-corruption.patch
Most of these changes sound good. I'll go through the whole patch
again today, or as much of it as I can. But before I do that, I want
to comment on this point specifically.
On Thu, Mar 4, 2021 at 1:25 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I think this is fixed up now. There is an interaction with amcheck's verify_heapam(), where that function raises an error if the startblock or endblock arguments are out of bounds for the relation in question. Rather than aborting the entire pg_amcheck run, it avoids passing inappropriate block ranges to verify_heapam() and outputs a warning, so:
% pg_amcheck mark.dilger -t foo -t pg_class --progress -v --startblock=35 --endblock=77
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
0/6 relations (0%) 0/55 pages (0%)
pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (10/45 pages)
pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."public"."foo"
pg_amcheck: checking btree index "mark.dilger"."public"."foo_idx" (oid 16388) (30/30 pages)
pg_amcheck: checking table "mark.dilger"."pg_catalog"."pg_class" (oid 1259) (0/13 pages)
pg_amcheck: warning: ignoring startblock option 35 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_relname_nsp_index" (oid 2663) (6/6 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_tblspc_relfilenode_index" (oid 3455) (5/5 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_oid_index" (oid 2662) (4/4 pages)
6/6 relations (100%) 55/55 pages (100%)
The way the (x/y pages) is printed takes into account that the [startblock..endblock] range may reduce the number of pages to check (x) to something less than the number of pages in the relation (y), but the reporting is a bit of a lie when the startblock is beyond the end of the table, as it doesn't get passed to verify_heapam and so the number of blocks checked may be more than the zero blocks reported. I think I might need to fix this up tomorrow, but I want to get what I have in this patch set posted tonight, so it's not fixed here. Also, there are multiple ways of addressing this, and I'm having trouble deciding which way is best. I can exclude the relation from being checked at all, or realize earlier that I'm not going to honor the startblock argument and compute the blocks to check correctly. Thoughts?
I think this whole approach is pretty suspect because the number of
blocks in the relation can increase (by relation extension) or
decrease (by VACUUM or TRUNCATE) between the time when we query for
the list of target relations and the time we get around to executing
any queries against them. I think it's OK to use the number of
relation pages for progress reporting because progress reporting is
only approximate anyway, but I wouldn't print them out in the progress
messages, and I wouldn't try to fix up the startblock and endblock
arguments on the basis of how long you think that relation is going to
be. You seem to view the fact that the server reported the error as
the reason for the problem, but I don't agree. I think having the
server report the error here is right, and the problem is that the
error reporting sucked because it was long-winded and didn't
necessarily tell you which table had the problem.
There are a LOT of things that can go wrong when we go try to run
verify_heapam on a table. The table might have been dropped; in fact,
on a busy production system, such cases are likely to occur routinely
if DDL is common, which for many users it is. The system catalog
entries might be screwed up, so that the relation can't be opened.
There might be an unreadable page in the relation, either because the
OS reports an I/O error or something like that, or because checksum
verification fails. There are various other possibilities. We
shouldn't view such errors as low-level things that occur only in
fringe cases; this is a corruption-checking tool, and we should expect
that running it against messed-up databases will be common. We
shouldn't try to interpret the errors we get or make any big decisions
about them, but we should have a clear way of reporting them so that
the user can decide what to do.
Just as an experiment, I suggest creating a database with 100 tables
in it, each with 1 index, and then deleting a single pg_attribute
entry for 10 of the tables, and then running pg_amcheck. I think you
will get 20 errors - one for each messed-up table and one for the
corresponding index. Maybe you'll get errors for the TOAST table
checks too, if the tables have TOAST tables, although that seems like
it should be avoidable. Now, no matter what you do, the tool is going
to produce a lot of output here, because you have a lot of problems,
and that's OK. But how understandable is that output, and how concise
is it? If it says something like:
pg_amcheck: could not check "SCHEMA_NAME"."TABLE_NAME": ERROR: some
attributes are missing or something
...and that line is repeated 20 times, maybe with a context or detail
line for each one or something like that, then you have got a good UI.
If it's not clear which tables have the problem, you have got a bad
UI. If it dumps out 300 lines of output instead of 20 or 40, you have
a UI that is so verbose that usability is going to be somewhat
impaired, which is why I suggested only showing the query in verbose
mode.
BTW, another thing that might be interesting is to call
PQsetErrorVerbosity(conn, PQERRORS_VERBOSE) in verbose mode. It's
probably possible to contrive a case where the server error message is
something generic like "cache lookup failed for relation %u" which
occurs in a whole bunch of places in the source code, and being able
get the file and line number information can be really useful when
trying to track such things down.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 4, 2021 at 10:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
Most of these changes sound good. I'll go through the whole patch
again today, or as much of it as I can. But before I do that, I want
to comment on this point specifically.
Just a thought - I don't feel strongly about this - but you may want
to consider storing your list of patterns in an array that gets
resized as necessary rather than a list. Then the pattern ID would
just be pattern_ptr - pattern_array, and finding the pattern by ID
would just be pattern_ptr = &pattern_array[pattern_id]. I don't think
there's a real efficiency issue here because the list of patterns is
almost always going to be short, and even if somebody decides to
provide a very long list of patterns (e.g. by using xargs) it's
probably still not that big a deal. A sufficiently obstinate user
running an operating system where argument lists can be extremely long
could probably make this the dominant cost by providing a gigantic
number of patterns that don't match anything, but such a person is
trying to prove a point, rather than accomplish anything useful, so I
don't care. But, the code might be more elegant the other way.
This patch increases the number of cases where we use ^ to assert that
exactly one of two things is true from 4 to 5. I think it might be
better to just write out (a && !b) || (b && !a), but there is some
precedent for the way you did it so perhaps it's fine.
The name prepare_table_command() is oddly non-parallel with
verify_heapam_slot_handler(). Seems better to call it either a table
throughout, or a heapam throughout. Actually I think I would prefer
"heap" to either of those, but I definitely think we shouldn't switch
terminology. Note that prepare_btree_command() doesn't have this
issue, since it matches verify_btree_slot_handler(). On a related
note, "c.relam = 2" is really a test for is_heap, not is_table. We
might have other table AMs in the future, but only one of those AMs
will be called heap, and only one will have OID 2.
You've got some weird round-tripping stuff where you sent literal
values to the server so that you can turn around and get them back
from the server. For example, you've got prepare_table_command()
select rel->nspname and rel->relname back from the server as literals,
which seems silly because we have to already have that information or
we couldn't ask the server to give it to us ... and if we already have
it, then why do we need to get it again? The reason it's like this
seems to be that after calling prepare_table_command(), we use
ParallelSlotSetHandler() to set verify_heapam_slot_handler() as the
callback, and we set sql.data as the callback, so we don't have access
to the RelationInfo object when we're handling the slot result. But
that's easy to fix: just store the sql as a field inside the
RelationInfo, and then pass a pointer to the whole RelationInfo to the
slot handler. Then you don't need to round-trip the table and schema
names; and you have the values available even if an error happens.
On a somewhat related note, I think it might make sense to have the
slot handlers try to free memory. It seems hard to make pg_amcheck
leak enough memory to matter, but I guess it's not entirely
implausible that someone could be checking let's say 10 million
relations. Freeing the query strings could probably prevent a half a
GB or so of accumulated memory usage under those circumstances. I
suppose freeing nspname and relname would save a bit more, but it's
hardly worth doing since they are a lot shorter and you've got to have
all that information in memory at once at some point anyway; similarly
with the RelationInfo structures, which have the further complexity of
being part of a linked list you might not want to corrupt. But you
don't need to have every query string in memory at the same time, just
as many as are running at any one time.
Also, maybe compile_relation_list_one_db() should keep the result set
around so that you don't need to pstrdup() the nspname and relname in
the first place. Right now, just before compile_relation_list_one_db()
calls PQclear() you have two copies of every nspname and relname
allocated. If you just kept the result sets around forever, the peak
memory usage would be lower than it is currently. If you really wanted
to get fancy you could arrange to free each result set when you've
finished that database, but that seems annoying to code and I'm pretty
sure it doesn't matter.
The CTEs called "include_raw" and "exclude_raw" are used as part
of the query to construct a list of tables. The regexes are fished
through there, and the pattern IDs, which makes sense, but the raw
patterns are also fished through, and I don't see a reason for that.
We don't seem to need that for anything. The same seems to apply to
the query used to resolve database patterns.
I see that most of the queries have now been adjusted to be spread
across fewer lines, which is good, but please make sure to do that
everywhere. In particular, I notice that the bt_index_check calls are
still too spread out.
More in a bit, need to grab some lunch.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 4, 2021 at 12:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
More in a bit, need to grab some lunch.
Moving on to the tests, in 003_check.pl, I think it would be slightly
better if relation_toast were to select ct.oid::regclass and then just
have the caller use that value directly. We'd certainly want to do
that if the name could contain any characters that might require
quoting. Here that's not possible, but I think we might as well use
the same technique anyway.
I'm not sure how far to go with it, but I think that you might want to
try to enhance the logging in some of the cases where the TAP tests
might fail. In particular, if either of these trip in the buildfarm,
it doesn't seem like it will be too easy to figure out why they
failed:
+ fail('Xid thresholds not as expected');
+ fail('Page layout differs from our expectations');
You might want to rephrase the message to incorporate the values that
triggered the failure, e.g. "datfrozenxid $datfrozenxid is not between
3 and $relfrozenxid", "expected (a,b) = (12345678,abcdefg) but got
($x,$y)", so that if the buildfarm happens to fail there's a shred of
hope that we might be able to guess the reason from the message. You
could also give some thought to whether there are any tests that can
be improved in similar ways. Test::More is nice in that when you run a
test with eq() or like() and it fails it will tell you about the input
values in the diagnostic, but if you do something like is($x < 4, ...)
instead of cmp_ok($x, '<', 4, ...) then you lose that. I'm not saying
you're doing that exact thing, just saying that looking through the
test code with an eye to finding things where you could output a
little more info about a potential failure might be a worthwhile
activity.
If it were me, I would get rid of ROWCOUNT and have a list of
closures, and then loop over the list and call each one e.g. my
@corruption = ( sub { ... }, sub { ... }, sub { ... }) or maybe
something like what I did with @scenario in
src/bin/pg_verifybackup/t/003_corruption.pl, but this is ultimately a
style preference and I think the way you actually did it is also
reasonable, and some people might find it more readable than the other
way.
The name int4_fickle_ops is positively delightful and I love having a
test case like this.
On the whole, I think these tests look quite solid. I am a little
concerned, as you may gather from the comment above, that they will
not survive contact with the buildfarm, because they will turn out to
be platform or OS-dependent in some way. However, I can see that
you've taken steps to avoid such dependencies, and maybe we'll be
lucky and those will work. Also, while I am suspicious something's
going to break, I don't know what it's going to be, so I can't suggest
any method to avoid it. I think we'll just have to keep an eye on the
buildfarm post-commit and see what crops up.
Turning to the documentation, I see that it is documented that a bare
command-line argument can be a connection string rather than a
database name. That sounds like a good plan, but when I try
'pg_amcheck sslmode=require' it does not work: FATAL: database
"sslmode=require" does not exist. The argument to -e is also
documented to be a connection string, but that also seems not to work.
Some thought might need to be given to what exactly these connection
options are supposed to mean. Like, do the connection options I set via
-e apply to all the connections I make, or just the one to the
maintenance database? How do I set connection options for connections
to databases whose names aren't specified explicitly but are
discovered by querying pg_database? Maybe instead of allowing these to
be a connection string, we should have a separate option that can be
used just for the purpose of setting connection options that then
apply to all connections. That seems a little bit oddly unlike other
tools, but if I want sslmode=verify-ca or something on all my
connections, there should be an easy way to get it.
The documentation makes many references to patterns, but does not
explain what a pattern is. I see that psql's documentation contains an
explanation, and pg_dump's documentation links to psql's
documentation. pg_amcheck should probably link to psql's
documentation, too.
In the documentation for -d, you say that "If -a --all is also
specified, -d --database does not additionally affect which databases
are checked." I suggest replacing "does not additionally affect which
databases are checked" with "has no effect."
In two places you say "without regard for" but I think it should be
"without regard to".
In the documentation for --no-strict-names you use "nor" where I think
it should say "or".
I kind of wonder whether we need --quiet. It seems like right now it
only does two things. One is to control complaints about ignoring the
startblock and endblock options, but I don't agree with that behavior
anyway. The other is control whether we complain about unmatched
patterns, but I think that could just be controlled by --no-strict-names,
i.e. normally an unmatched pattern results in a complaint and a
failure, but with --no-strict-names there is neither a complaint nor a
failure. Having a flag to control whether we get the message
separately from whether we get the failure doesn't seem helpful.
I don't think it's good to say "This is an alias for" in the
documentation of -i -I -t -T. I suggest instead saying "This is
similar to".
Instead of "Option BLAH takes precedence over..." I suggest "The BLAH
option takes precedence over..."
OK, that's it from me for this review pass.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Mar 4, 2021 at 7:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
I think this whole approach is pretty suspect because the number of
blocks in the relation can increase (by relation extension) or
decrease (by VACUUM or TRUNCATE) between the time when we query for
the list of target relations and the time we get around to executing
any queries against them. I think it's OK to use the number of
relation pages for progress reporting because progress reporting is
only approximate anyway, but I wouldn't print them out in the progress
messages, and I wouldn't try to fix up the startblock and endblock
arguments on the basis of how long you think that relation is going to
be.
I don't think that the struct AmcheckOptions block fields (e.g.,
startblock) should be of type 'long' -- that doesn't work well on
Windows, where 'long' is only 32-bit. To be fair we already do the
same thing elsewhere, but there is no reason to repeat those mistakes.
(I'm rather suspicious of 'long' in general.)
I think that you could use BlockNumber + strtoul() without breaking Windows.
There are a LOT of things that can go wrong when we go try to run
verify_heapam on a table. The table might have been dropped; in fact,
on a busy production system, such cases are likely to occur routinely
if DDL is common, which for many users it is. The system catalog
entries might be screwed up, so that the relation can't be opened.
There might be an unreadable page in the relation, either because the
OS reports an I/O error or something like that, or because checksum
verification fails. There are various other possibilities. We
shouldn't view such errors as low-level things that occur only in
fringe cases; this is a corruption-checking tool, and we should expect
that running it against messed-up databases will be common. We
shouldn't try to interpret the errors we get or make any big decisions
about them, but we should have a clear way of reporting them so that
the user can decide what to do.
I agree.
Your database is not supposed to be corrupt. Once your database has
become corrupt, all bets are off -- something happened that was
supposed to be impossible -- which seems like a good reason to be
modest about what we think we know.
The user should always see the unvarnished truth. pg_amcheck should
not presume to suppress errors from lower level code, except perhaps
in well-scoped special cases.
--
Peter Geoghegan
On Mar 4, 2021, at 2:04 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Mar 4, 2021 at 7:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
I think this whole approach is pretty suspect because the number of
blocks in the relation can increase (by relation extension) or
decrease (by VACUUM or TRUNCATE) between the time when we query for
the list of target relations and the time we get around to executing
any queries against them. I think it's OK to use the number of
relation pages for progress reporting because progress reporting is
only approximate anyway, but I wouldn't print them out in the progress
messages, and I wouldn't try to fix up the startblock and endblock
arguments on the basis of how long you think that relation is going to
be.
I don't think that the struct AmcheckOptions block fields (e.g.,
startblock) should be of type 'long' -- that doesn't work well on
Windows, where 'long' is only 32-bit. To be fair we already do the
same thing elsewhere, but there is no reason to repeat those mistakes.
(I'm rather suspicious of 'long' in general.)
I think that you could use BlockNumber + strtoul() without breaking Windows.
Fair enough.
There are a LOT of things that can go wrong when we go try to run
verify_heapam on a table. The table might have been dropped; in fact,
on a busy production system, such cases are likely to occur routinely
if DDL is common, which for many users it is. The system catalog
entries might be screwed up, so that the relation can't be opened.
There might be an unreadable page in the relation, either because the
OS reports an I/O error or something like that, or because checksum
verification fails. There are various other possibilities. We
shouldn't view such errors as low-level things that occur only in
fringe cases; this is a corruption-checking tool, and we should expect
that running it against messed-up databases will be common. We
shouldn't try to interpret the errors we get or make any big decisions
about them, but we should have a clear way of reporting them so that
the user can decide what to do.
I agree.
Your database is not supposed to be corrupt. Once your database has
become corrupt, all bets are off -- something happened that was
supposed to be impossible -- which seems like a good reason to be
modest about what we think we know.
The user should always see the unvarnished truth. pg_amcheck should
not presume to suppress errors from lower level code, except perhaps
in well-scoped special cases.
I think Robert mistook why I was doing that. I was thinking about a different usage pattern. If somebody thinks a subset of relations have been badly corrupted, but doesn't know which relations those might be, they might try to find them with pg_amcheck, but wanting to just check the first few blocks per relation in order to sample the relations. So,
pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
or something like that. I don't think it's very fun to have it error out for each relation that doesn't have at least ten blocks, nor is it fun to have those relations skipped by error'ing out before checking any blocks, as they might be the corrupt relations you are looking for. But using --startblock and --endblock for this is not a natural fit, as evidenced by how I was trying to "fix things up" for the user, so I'll punt on this usage until some future version, when I might add a sampling option.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 4, 2021 at 5:39 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I think Robert mistook why I was doing that. I was thinking about a different usage pattern. If somebody thinks a subset of relations have been badly corrupted, but doesn't know which relations those might be, they might try to find them with pg_amcheck, but wanting to just check the first few blocks per relation in order to sample the relations. So,
pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
or something like that. I don't think it's very fun to have it error out for each relation that doesn't have at least ten blocks, nor is it fun to have those relations skipped by error'ing out before checking any blocks, as they might be the corrupt relations you are looking for. But using --startblock and --endblock for this is not a natural fit, as evidenced by how I was trying to "fix things up" for the user, so I'll punt on this usage until some future version, when I might add a sampling option.
I admit I hadn't thought of that use case. I guess somebody could want
to do that, but it doesn't seem all that useful. Checking the first
up-to-ten blocks of every relation is not a very representative
sample, and it's not clear to me that sampling is a good idea even if
it were representative. What good is it to know that 10% of my
database is probably not corrupted?
On the other hand, people want to do all kinds of things that seem
strange to me, and this might be another one. But, if that's so, then
I think the right place to implement it is in amcheck itself, not
pg_amcheck. I think pg_amcheck should be, now and in the future, a
thin wrapper around the functionality provided by amcheck, just
providing target selection and parallel execution. If you put
something into pg_amcheck that figures out how long the relation is
and runs it on some of the blocks, that functionality is only
accessible to people who are accessing amcheck via pg_amcheck. If you
put it in amcheck itself and just expose it through pg_amcheck, then
it's accessible either way. It's probably cleaner and more performant
to do it that way, too.
So if you did add a sampling option in the future, that's the way I
would recommend doing it, but I think it is probably best not to go
there right now.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mar 8, 2021, at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Mar 4, 2021 at 5:39 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I think Robert mistook why I was doing that. I was thinking about a different usage pattern. If somebody thinks a subset of relations have been badly corrupted, but doesn't know which relations those might be, they might try to find them with pg_amcheck, but wanting to just check the first few blocks per relation in order to sample the relations. So,
pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
or something like that. I don't think it's very fun to have it error out for each relation that doesn't have at least ten blocks, nor is it fun to have those relations skipped by error'ing out before checking any blocks, as they might be the corrupt relations you are looking for. But using --startblock and --endblock for this is not a natural fit, as evidenced by how I was trying to "fix things up" for the user, so I'll punt on this usage until some future version, when I might add a sampling option.
I admit I hadn't thought of that use case. I guess somebody could want
to do that, but it doesn't seem all that useful. Checking the first
up-to-ten blocks of every relation is not a very representative
sample, and it's not clear to me that sampling is a good idea even if
it were representative. What good is it to know that 10% of my
database is probably not corrupted?
`cd $PGDATA; tar xfz my_csv_data.tgz` ctrl-C ctrl-C ctrl-C
`rm -rf $PGDATA` ctrl-C ctrl-C ctrl-C
`/my/stupid/backup/and/restore/script.sh` ctrl-C ctrl-C ctrl-C
# oh wow, i wonder if any relations got overwritten with csv file data, or had their relation files unlinked, or ...?
`pg_amcheck --jobs=8 --startblock=0 --endblock=10`
# ah, darn, it's spewing lots of irrelevant errors because some relations are too short
`pg_amcheck --jobs=8 --startblock=0 --endblock=0`
# ah, darn, it's still spewing lots of irrelevant errors because I have lots of indexes with zero blocks of data
`pg_amcheck --jobs=8`
# ah, darn, it's taking forever, because it's processing huge tables in their entirety
I agree this can be left to later, and the --startblock and --endblock options are the wrong way to do it.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert, Peter, in response to your review comments spanning multiple emails:
On Mar 4, 2021, at 7:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Most of these changes sound good. I'll go through the whole patch
again today, or as much of it as I can. But before I do that, I want
to comment on this point specifically.
On Thu, Mar 4, 2021 at 1:25 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I think this is fixed up now. There is an interaction with amcheck's verify_heapam(), where that function raises an error if the startblock or endblock arguments are out of bounds for the relation in question. Rather than aborting the entire pg_amcheck run, it avoids passing inappropriate block ranges to verify_heapam() and outputs a warning, so:
% pg_amcheck mark.dilger -t foo -t pg_class --progress -v --startblock=35 --endblock=77
pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
0/6 relations (0%) 0/55 pages (0%)
pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (10/45 pages)
pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."public"."foo"
pg_amcheck: checking btree index "mark.dilger"."public"."foo_idx" (oid 16388) (30/30 pages)
pg_amcheck: checking table "mark.dilger"."pg_catalog"."pg_class" (oid 1259) (0/13 pages)
pg_amcheck: warning: ignoring startblock option 35 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_relname_nsp_index" (oid 2663) (6/6 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_tblspc_relfilenode_index" (oid 3455) (5/5 pages)
pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_oid_index" (oid 2662) (4/4 pages)
6/6 relations (100%) 55/55 pages (100%)

The way the (x/y pages) is printed takes into account that the [startblock..endblock] range may reduce the number of pages to check (x) to something less than the number of pages in the relation (y), but the reporting is a bit of a lie when the startblock is beyond the end of the table: the range doesn't get passed to verify_heapam, so the number of blocks checked may be more than the zero blocks reported. I think I might need to fix this up tomorrow, but I want to get what I have in this patch set posted tonight, so it's not fixed here. Also, there are multiple ways of addressing this, and I'm having trouble deciding which way is best. I can exclude the relation from being checked at all, or realize earlier that I'm not going to honor the startblock argument and compute the blocks to check correctly. Thoughts?
I think this whole approach is pretty suspect because the number of
blocks in the relation can increase (by relation extension) or
decrease (by VACUUM or TRUNCATE) between the time when we query for
the list of target relations and the time we get around to executing
any queries against them. I think it's OK to use the number of
relation pages for progress reporting because progress reporting is
only approximate anyway,
Fair point.
but I wouldn't print them out in the progress
messages,
Removed.
and I wouldn't try to fix up the startblock and endblock
arguments on the basis of how long you think that relation is going to
be.
Yeah, in light of a new day, that seems like a bad idea to me, too. Removed.
You seem to view the fact that the server reported the error as
the reason for the problem, but I don't agree. I think having the
server report the error here is right, and the problem is that the
error reporting sucked because it was long-winded and didn't
necessarily tell you which table had the problem.
No, I was thinking about a different usage pattern, but I've answered that already elsewhere on this thread.
There are a LOT of things that can go wrong when we go try to run
verify_heapam on a table. The table might have been dropped; in fact,
on a busy production system, such cases are likely to occur routinely
if DDL is common, which for many users it is. The system catalog
entries might be screwed up, so that the relation can't be opened.
There might be an unreadable page in the relation, either because the
OS reports an I/O error or something like that, or because checksum
verification fails. There are various other possibilities. We
shouldn't view such errors as low-level things that occur only in
fringe cases; this is a corruption-checking tool, and we should expect
that running it against messed-up databases will be common. We
shouldn't try to interpret the errors we get or make any big decisions
about them, but we should have a clear way of reporting them so that
the user can decide what to do.
Once again, I think you are right and have removed the objectionable behavior, but....
The --startblock and --endblock options make the most sense when the user is only checking one table, like
pg_amcheck --startblock=17 --endblock=19 --table=my_schema.my_corrupt_table
because the user likely has some knowledge about that table, perhaps from a prior run of pg_amcheck. The --startblock and --endblock arguments are a bit strange when used globally, as relations don't all have the same number of blocks, so
pg_amcheck --startblock=17 --endblock=19 mydb
will very likely emit lots of error messages for tables which don't have blocks in that range. That's not entirely pg_amcheck's fault, as it just did what the user asked, but it also doesn't seem super helpful. I'm not going to do anything about it in this release.
Just as an experiment, I suggest creating a database with 100 tables
in it, each with 1 index, and then deleting a single pg_attribute
entry for 10 of the tables, and then running pg_amcheck. I think you
will get 20 errors - one for each messed-up table and one for the
corresponding index. Maybe you'll get errors for the TOAST tables
checks too, if the tables have TOAST tables, although that seems like
it should be avoidable. Now, no matter what you do, the tool is going
to produce a lot of output here, because you have a lot of problems,
and that's OK. But how understandable is that output, and how concise
is it? If it says something like:

pg_amcheck: could not check "SCHEMA_NAME"."TABLE_NAME": ERROR: some
attributes are missing or something

...and that line is repeated 20 times, maybe with a context or detail
line for each one or something like that, then you have got a good UI.
If it's not clear which tables have the problem, you have got a bad
UI. If it dumps out 300 lines of output instead of 20 or 40, you have
a UI that is so verbose that usability is going to be somewhat
impaired, which is why I suggested only showing the query in verbose
mode.
After running 'make installcheck', if I delete all entries from pg_class where relnamespace = 'pg_toast'::regclass and then run 'pg_amcheck regression', I get lines that look like this:
heap relation "regression"."public"."quad_poly_tbl":
ERROR: could not open relation with OID 17177
heap relation "regression"."public"."gin_test_tbl":
ERROR: could not open relation with OID 24793
heap relation "regression"."pg_catalog"."pg_depend":
ERROR: could not open relation with OID 8888
heap relation "regression"."public"."spgist_text_tbl":
ERROR: could not open relation with OID 25624
which seems ok.
If instead I delete pg_attribute entries, as you suggest above, I get rows like this:
heap relation "regression"."regress_rls_schema"."rls_tbl":
ERROR: catalog is missing 1 attribute(s) for relid 26467
heap relation "regression"."regress_rls_schema"."rls_tbl_force":
ERROR: catalog is missing 1 attribute(s) for relid 26474
which also seems ok.
If instead, I manually corrupt relation files belonging to the regression database, I get lines that look like this for corrupt heap relations:
relation "regression"."public"."functional_dependencies", block 28, offset 54, attribute 0
attribute 0 with length 4294967295 ends at offset 50 beyond total tuple length 43
relation "regression"."public"."functional_dependencies", block 28, offset 55
multitransaction ID is invalid
relation "regression"."public"."functional_dependencies", block 28, offset 57
multitransaction ID is invalid
and for corrupt btree relations:
btree relation "regression"."public"."tenk1_unique1":
ERROR: high key invariant violated for index "tenk1_unique1"
DETAIL: Index tid=(1,38) points to heap tid=(70,26) page lsn=0/33A96D0.
btree relation "regression"."public"."tenk1_unique2":
ERROR: index tuple size does not equal lp_len in index "tenk1_unique2"
DETAIL: Index tid=(1,35) tuple size=4913 lp_len=16 page lsn=0/33DFD98.
HINT: This could be a torn page problem.
btree relation "regression"."public"."tenk1_thous_tenthous":
ERROR: index tuple size does not equal lp_len in index "tenk1_thous_tenthous"
DETAIL: Index tid=(1,36) tuple size=4402 lp_len=16 page lsn=0/34C0770.
HINT: This could be a torn page problem.
which likewise seems ok.
BTW, another thing that might be interesting is to call
PQsetErrorVerbosity(conn, PQERRORS_VERBOSE) in verbose mode. It's
probably possible to contrive a case where the server error message is
something generic like "cache lookup failed for relation %u" which
occurs in a whole bunch of places in the source code, and being able
get the file and line number information can be really useful when
trying to track such things down.
Good idea. I decided to also honor the --quiet flag
if (opts.verbose)
PQsetErrorVerbosity(free_slot->connection, PQERRORS_VERBOSE);
else if (opts.quiet)
PQsetErrorVerbosity(free_slot->connection, PQERRORS_TERSE);
On Mar 4, 2021, at 2:04 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I don't think that the struct AmcheckOptions block fields (e.g.,
startblock) should be of type 'long' -- that doesn't work well on
Windows, where 'long' is only 32-bit. To be fair we already do the
same thing elsewhere, but there is no reason to repeat those mistakes.
(I'm rather suspicious of 'long' in general.)

I think that you could use BlockNumber + strtoul() without breaking Windows.
Thanks for reviewing!
Good points. I decided to use int64 instead of BlockNumber. The option processing needs to give a sensible error message if the user gives a negative number for the argument, so unsigned types are a bad fit.
On Thu, Mar 4, 2021 at 10:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
Most of these changes sound good. I'll go through the whole patch
again today, or as much of it as I can. But before I do that, I want
to comment on this point specifically.

Just a thought - I don't feel strongly about this - but you might want
to consider storing your list of patterns in an array that gets
to consider storing your list of patterns in an array that gets
resized as necessary rather than a list. Then the pattern ID would
just be pattern_ptr - pattern_array, and finding the pattern by ID
would just be pattern_ptr = &pattern_array[pattern_id]. I don't think
there's a real efficiency issue here because the list of patterns is
almost always going to be short, and even if somebody decides to
provide a very long list of patterns (e.g. by using xargs) it's
probably still not that big a deal. A sufficiently obstinate user
running an operating system where argument lists can be extremely long
could probably make this the dominant cost by providing a gigantic
number of patterns that don't match anything, but such a person is
trying to prove a point, rather than accomplish anything useful, so I
don't care. But, the code might be more elegant the other way.
Done. I was not too motivated by the efficiency argument, but the code to look up patterns is cleaner when the pattern_id is an index into an array than when it is a field in a struct that has to be searched in a list.
This patch increases the number of cases where we use ^ to assert that
exactly one of two things is true from 4 to 5. I think it might be
better to just write out (a && !b) || (b && !a), but there is some
precedent for the way you did it so perhaps it's fine.
Your formulation takes longer for me to read and understand (by, perhaps, some milliseconds), but when checking what C compilers guarantee to store in
bool a = (i == j);
bool b = (k == l);
I found it hard to be sure that some compiler wouldn't do weird things with that. Two "true" values a and b could pass the (a ^ b) test if they represent "true" in two different bit patterns. I don't really think there is a risk here in practice, but looking up the relevant C standards isn't quick for future readers of this code, so I went with your formulation.
The name prepare_table_commmand() is oddly non-parallel with
verify_heapam_slot_handler(). Seems better to call it either a table
throughout, or a heapam throughout. Actually I think I would prefer
"heap" to either of those, but I definitely think we shouldn't switch
terminology. Note that prepare_btree_command() doesn't have this
issue, since it matches verify_btree_slot_handler(). On a related
note, "c.relam = 2" is really a test for is_heap, not is_table. We
might have other table AMs in the future, but only one of those AMs
will be called heap, and only one will have OID 2.
Changed to use "heap" in many places where "table" was used previously, and to use "btree" in many places where "index" was used previously. The term "heapam" now only occurs as part of "verify_heapam", a function defined in contrib/amcheck and not changed here.
You've got some weird round-tripping stuff where you sent literal
values to the server so that you can turn around and get them back
from the server. For example, you've got prepare_table_command()
select rel->nspname and rel->relname back from the server as literals,
which seems silly because we have to already have that information or
we couldn't ask the server to give it to us ... and if we already have
it, then why do we need to get it again? The reason it's like this
seems to be that after calling prepare_table_command(), we use
ParallelSlotSetHandler() to set verify_heapam_slot_handler() as the
callback, and we set sql.data as the callback, so we don't have access
to the RelationInfo object when we're handling the slot result. But
that's easy to fix: just store the sql as a field inside the
RelationInfo, and then pass a pointer to the whole RelationInfo to the
slot handler. Then you don't need to round-trip the table and schema
names; and you have the values available even if an error happens.
Changed. I was doing that mostly so that people examining the server logs would have something more than the oid in the sql to suggest which table or index is being checked.
On a somewhat related note, I think it might make sense to have the
slot handlers try to free memory. It seems hard to make pg_amcheck
leak enough memory to matter, but I guess it's not entirely
implausible that someone could be checking let's say 10 million
relations. Freeing the query strings could probably prevent a half a
GB or so of accumulated memory usage under those circumstances. I
suppose freeing nspname and relname would save a bit more, but it's
hardly worth doing since they are a lot shorter and you've got to have
all that information in memory at once at some point anyway; similarly
with the RelationInfo structures, which have the further complexity of
being part of a linked list you might not want to corrupt. But you
don't need to have every query string in memory at the same time, just
as many as are running at any one time.
Changed.
Also, maybe compile_relation_list_one_db() should keep the result set
around so that you don't need to pstrdup() the nspname and relname in
the first place. Right now, just before compile_relation_list_one_db()
calls PQclear() you have two copies of every nspname and relname
allocated. If you just kept the result sets around forever, the peak
memory usage would be lower than it is currently. If you really wanted
to get fancy you could arrange to free each result set when you've
finished that database, but that seems annoying to code and I'm pretty
sure it doesn't matter.
Hmm. When compile_relation_list_one_db() is processing the ith database out of N databases, all (nspname,relname) pairs are allocated for databases in [0..i], and additionally the result set for database i is in memory. The result sets for [0..i-1] have already been freed. Keeping around the result sets for all N databases seems more expensive, considering how much stuff is in struct pg_result, if N is large and the relations are spread across the databases rather than clumped together in the last one.
I think your proposal might be a win for some users and a loss for others. Given that it is not a clear win, I don't care to implement it that way, as it takes more effort to remember which object owns which bit of memory.
I have added pfree()s to the handlers to free the nspname and relname when finished. This does little to reduce the peak memory usage, though.
The CTEs called "include_raw" and "exclude_raw" which are used as part
of the query to construct a list of tables. The regexes are fished
through there, and the pattern IDs, which makes sense, but the raw
patterns are also fished through, and I don't see a reason for that.
We don't seem to need that for anything. The same seems to apply to
the query used to resolve database patterns.
Changed.
Both queries are changed to no longer have a "pat" column, and the "id" field (renamed as "pattern_id" for clarity) is used instead.
I see that most of the queries have now been adjusted to be spread
across fewer lines, which is good, but please make sure to do that
everywhere. In particular, I notice that the bt_index_check calls are
still too spread out.
When running `pg_amcheck --echo`, the queries for a table and index now print as:
SELECT blkno, offnum, attnum, msg FROM "public".verify_heapam(
relation := 33024, on_error_stop := false, check_toast := true, skip := 'none')
SELECT * FROM "public".bt_index_check(index := '33029'::regclass, heapallindexed := false)
That's two lines per heap table, and just one line per btree index.
On Thu, Mar 4, 2021 at 12:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
More in a bit, need to grab some lunch.
Moving on to the tests, in 003_check.pl, I think it would be slightly
better if relation_toast were to select ct.oid::regclass and then just
have the caller use that value directly. We'd certainly want to do
that if the name could contain any characters that might require
quoting. Here that's not possible, but I think we might as well use
the same technique anyway.
Using c.reltoastrelid::regclass, which is basically the same idea.
I'm not sure how far to go with it, but I think that you might want to
try to enhance the logging in some of the cases where the TAP tests
might fail. In particular, if either of these trip in the buildfarm,
it doesn't seem like it will be too easy to figure out why they
failed:

+ fail('Xid thresholds not as expected');
+ fail('Page layout differs from our expectations');
Ok, I've extended these messages with the extra debugging information. I have also changed them to use 'plan skip_all', since what we are really talking about here is an inability for the test to properly exercise pg_amcheck, not an actual failure of pg_amcheck to function correctly. This should save us some grief if the test isn't portable to all platforms in the build farm, though we'll have to check whether the skip messages are happening on any farm animals.
You might want to rephrase the message to incorporate the values that
triggered the failure, e.g. "datfrozenxid $datfrozenxid is not between
3 and $relfrozenxid", "expected (a,b) = (12345678,abcdefg) but got
($x,$y)", so that if the buildfarm happens to fail there's a shred of
hope that we might be able to guess the reason from the message.
Added to the skip_all message.
You
could also give some thought to whether there are any tests that can
be improved in similar ways. Test::More is nice in that when you run a
test with eq() or like() and it fails it will tell you about the input
values in the diagnostic, but if you do something like is($x < 4, ...)
instead of cmp_ok($x, '<', 4, ...) then you lose that. I'm not saying
you're doing that exact thing, just saying that looking through the
test code with an eye to finding things where you could output a
little more info about a potential failure might be a worthwhile
activity.
I'm mostly using command_checks_all and command_fails_like. The main annoyance is that when a pattern fails to match, you get a rather long error message. I'm not sure that it's lacking information, though.
If it were me, I would get rid of ROWCOUNT and have a list of
closures, and then loop over the list and call each one e.g. my
@corruption = ( sub { ... }, sub { ... }, sub { ... }) or maybe
something like what I did with @scenario in
src/bin/pg_verifybackup/t/003_corruption.pl, but this is ultimately a
style preference and I think the way you actually did it is also
reasonable, and some people might find it more readable than the other
way.
Unchanged. I think the closure idea is ok, but I am using the ROWCOUNT constant elsewhere (specifically, when inserting rows into the table) and using a constant for this helps keep the number of rows of data and the number of corruptions synchronized.
The name int4_fickle_ops is positively delightful and I love having a
test case like this.
I know you know this already, but for others reading this thread, the test using int4_fickle_ops is testing the kind of index corruption that might happen if you changed the sort order underlying an index, such as by updating collation definitions. It was simpler to not muck around with collations in the test itself, but to achieve the sort order breakage this way.
On the whole, I think these tests look quite solid. I am a little
concerned, as you may gather from the comment above, that they will
not survive contact with the buildfarm, because they will turn out to
be platform or OS-dependent in some way. However, I can see that
you've taken steps to avoid such dependencies, and maybe we'll be
lucky and those will work. Also, while I am suspicious something's
going to break, I don't know what it's going to be, so I can't suggest
any method to avoid it. I think we'll just have to keep an eye on the
buildfarm post-commit and see what crops up.
As I mentioned above, I've changed some failures to 'plan skip_all => reason', so that the build farm won't break if the tests aren't portable in ways I'm already thinking about. We'll just see if it breaks for additional ways that I'm not thinking about.
Turning to the documentation, I see that it is documented that a bare
command-line argument can be a connection string rather than a
database name. That sounds like a good plan, but when I try
'pg_amcheck sslmode=require' it does not work: FATAL: database
"sslmode=require" does not exist. The argument to -e is also
documented to be a connection string, but that also seems not to work.
Some thought might need to be given to what exactly these connection
options are supposed to mean. Like, do the connection options I set via
-e apply to all the connections I make, or just the one to the
maintenance database? How do I set connection options for connections
to databases whose names aren't specified explicitly but are
discovered by querying pg_database? Maybe instead of allowing these to
be a connection string, we should have a separate option that can be
used just for the purpose of setting connection options that then
apply to all connections. That seems a little bit oddly unlike other
tools, but if I want sslmode=verify-ca or something on all my
connections, there should be an easy way to get it.
I'm not sure where you are getting the '-e' from. That is the short form of --echo, and not what you are likely to want. However, your larger point is valid.
I don't like the idea that pg_amcheck would handle these options in a way that is incompatible with reindexdb or vacuumdb. I think pg_amcheck can have a superset of those tools' options, but it should not have options that are incompatible with those tools' options. That way, if the extra options that pg_amcheck offers become popular, we can add support for them in those other tools. But if the options are incompatible, we'd not be able to do that without breaking backward compatibility of those tools' interfaces, which we wouldn't want to do.
As such, I have solved the problem by reducing the number of dbname arguments you can provide on the command-line to just one. (This does not limit the number of database *patterns* that you can supply.) Those tools only allow one dbname on the command line, so this is not a regression of functionality from what those tools offer. Only the single dbname argument, or single maintenance-db argument, can be a connection string. The database patterns do not support that, nor would it make sense for them to do so.
All of the following should now work:
pg_amcheck --all "port=5555 sslmode=require"
pg_amcheck --maintenance-db="host=myhost port=5555 dbname=mydb sslmode=require" --all
pg_amcheck -d foo -d bar -d baz mydb
pg_amcheck -d foo -d bar -d baz "host=myhost dbname=mydb"
Note that using --all with a connection string is a pg_amcheck extension. It doesn't currently work in reindexdb, which complains.
There is a strange case, `pg_amcheck --maintenance-db="port=5555 dbname=postgres" "port=5432 dbname=regression"`, which doesn't complain, despite there being nothing listening on port 5555. This is because pg_amcheck completely ignores the maintenance-db argument in this instance, but I have not changed this behavior, because reindexdb does the same thing.
The documentation makes many references to patterns, but does not
explain what a pattern is. I see that psql's documentation contains an
explanation, and pg_dump's documentation links to psql's
documentation. pg_amcheck should probably link to psql's
documentation, too.
A prior version of this patch had a reference to that, but no more. Thanks for noticing. I've put it back in. There is some tension here between the desire to keep the docs concise and the desire to explain things better with examples, etc. I'm not sure I've got that balance right, but I'm too close to the project to be the right person to make that call. Does it seem ok?
In the documentation for -d, you say that "If -a --all is also
specified, -d --database does not additionally affect which databases
are checked." I suggest replacing "does not additionally affect which
databases are checked" with "has no effect."
Changed.
In two places you say "without regard for" but I think it should be
"without regard to".
Changed.
In the documentation for --no-strict-names you use "nor" where I think
it should say "or".
Changed.
I kind of wonder whether we need --quiet. It seems like right now it
only does two things. One is to control complaints about ignoring the
startblock and endblock options, but I don't agree with that behavior
anyway. The other is to control whether we complain about unmatched
patterns, but I think that could just be controlled by --no-strict-names,
i.e. normally an unmatched pattern results in a complaint and a
failure, but with --no-strict-names there is neither a complaint nor a
failure. Having a flag to control whether we get the message
separately from whether we get the failure doesn't seem helpful.
Hmm. I think that having --quiet plus --no-strict-names suppress the warnings about unmatched patterns has some value.
Also, as discussed above, I also now decrease the PGVerbosity to PQERRORS_TERSE, which has additional value, I think.
But I don't feel strongly about this, and if you'd rather --quiet be removed, that's fine, too. But I'll wait to hear back about that.
I don't think it's good to say "This is an alias for" in the
documentation of -i -I -t -T. I suggest instead saying "This is
similar to".
Changed.
Instead of "Option BLAH takes precedence over..." I suggest "The BLAH
option takes precedence over..."
Changed.
Attachments:
v44-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch
v44-0002-Adding-contrib-module-pg_amcheck.patch
v44-0003-Extending-PostgresNode-to-test-corruption.patch
On Wed, Mar 10, 2021 at 11:10 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Once again, I think you are right and have removed the objectionable behavior, but....
The --startblock and --endblock options make the most sense when the user is only checking one table, like
pg_amcheck --startblock=17 --endblock=19 --table=my_schema.my_corrupt_table
because the user likely has some knowledge about that table, perhaps from a prior run of pg_amcheck. The --startblock and --endblock arguments are a bit strange when used globally, as relations don't all have the same number of blocks, so
pg_amcheck --startblock=17 --endblock=19 mydb
will very likely emit lots of error messages for tables which don't have blocks in that range. That's not entirely pg_amcheck's fault, as it just did what the user asked, but it also doesn't seem super helpful. I'm not going to do anything about it in this release.
+1 to all that. I tend toward the opinion that trying to make
--startblock and --endblock do anything useful in the context of
checking multiple relations is not really possible, and therefore we
just shouldn't put any effort into it. But if user feedback shows
otherwise, we can always do something about it later.
After running 'make installcheck', if I delete all entries from pg_class where relnamespace = 'pg_toast'::regclass, by running 'pg_amcheck regression', I get lines that look like this:
heap relation "regression"."public"."quad_poly_tbl":
ERROR: could not open relation with OID 17177
In this here example, the first line ends in a colon.
relation "regression"."public"."functional_dependencies", block 28, offset 54, attribute 0
attribute 0 with length 4294967295 ends at offset 50 beyond total tuple length 43
But this here one does not. Seems like it should be consistent.
The QUALIFIED_NAME_FIELDS macro doesn't seem to be used anywhere,
which is good, because macros with unbalanced parentheses are usually
not a good plan; and a macro that expands to a comma-separated list of
things is suspect too.
"invalid skip options\n" seems too plural.
With regard to your use of strtol() for --{start,end}block, telling
the user that their input is garbage seems pejorative, even though it
may be accurate. Compare:
[rhaas EDBAS]$ pg_dump -jdsgdsgd
pg_dump: error: invalid number of parallel jobs
In the message "relation end block argument precedes start block
argument\n", I think you could lose both instances of the word
"argument" and probably the word "relation" as well. I actually don't
know why all of these messages about start and end block mention
"relation". It's not like there is some other kind of
non-relation-related start block with which it could be confused.
The comment for run_command() explains some things about the cparams
argument, but those things are false. In fact the argument is unused.
Usual PostgreSQL practice when freeing memory in e.g.
verify_heap_slot_handler is to set the pointers to NULL as well. The
performance cost of this is trivial, and it makes debugging a lot
easier should somebody accidentally write code to access one of those
things after it's been freed.
The documentation says that -D "does not exclude any database that was
listed explicitly as dbname on the command line, nor does it exclude
the database chosen in the absence of any dbname argument." The first
part of this makes complete sense to me, but I'm not sure about the
second part. If I type pg_amcheck --all -D 'r*', I think I'm expecting
that "rhaas" won't be checked. Likewise, if I say pg_amcheck -d
'bob*', I think I only want to check the bob-related databases and not
rhaas.
I suggest documenting --endblock as "Check table blocks up to and
including the specified ending block number. An error will occur if a
relation being checked has fewer than this number of blocks." And
similarly for --startblock: "Check table blocks beginning with the
specified block number. An error will occur, etc." Perhaps even
mention something like "This option is probably only useful when
checking a single table." Also, the documentation here isn't clear
that this affects only table checking, not index checking.
It appears that pg_amcheck sometimes makes dummy connections to the
database that don't do anything, e.g. pg_amcheck -t 'q*' resulted in:
2021-03-10 15:00:14.273 EST [95473] LOG: connection received: host=[local]
2021-03-10 15:00:14.274 EST [95473] LOG: connection authorized:
user=rhaas database=rhaas application_name=pg_amcheck
2021-03-10 15:00:14.286 EST [95473] LOG: statement: SELECT
pg_catalog.set_config('search_path', '', false);
2021-03-10 15:00:14.290 EST [95464] DEBUG: forked new backend,
pid=95474 socket=11
2021-03-10 15:00:14.291 EST [95464] DEBUG: server process (PID 95473)
exited with exit code 0
2021-03-10 15:00:14.291 EST [95474] LOG: connection received: host=[local]
2021-03-10 15:00:14.293 EST [95474] LOG: connection authorized:
user=rhaas database=rhaas application_name=pg_amcheck
2021-03-10 15:00:14.296 EST [95474] LOG: statement: SELECT
pg_catalog.set_config('search_path', '', false);
<...more queries from PID 95474...>
2021-03-10 15:00:14.321 EST [95464] DEBUG: server process (PID 95474)
exited with exit code 0
It doesn't seem to make sense to connect to a database, set the search
path, exit, and then immediately reconnect to the same database.
This is slightly inconsistent:
pg_amcheck: checking heap table "rhaas"."public"."foo"
heap relation "rhaas"."public"."foo":
ERROR: XX000: catalog is missing 144 attribute(s) for relid 16392
LOCATION: RelationBuildTupleDesc, relcache.c:652
query was: SELECT blkno, offnum, attnum, msg FROM "public".verify_heapam(
relation := 16392, on_error_stop := false, check_toast := true, skip := 'none')
In line 1 it's a heap table, but in line 2 it's a heap relation.
That's all I've got.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mar 10, 2021, at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 10, 2021 at 11:10 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Once again, I think you are right and have removed the objectionable behavior, but....
The --startblock and --endblock options make the most sense when the user is only checking one table, like
pg_amcheck --startblock=17 --endblock=19 --table=my_schema.my_corrupt_table
because the user likely has some knowledge about that table, perhaps from a prior run of pg_amcheck. The --startblock and --endblock arguments are a bit strange when used globally, as relations don't all have the same number of blocks, so
pg_amcheck --startblock=17 --endblock=19 mydb
will very likely emit lots of error messages for tables which don't have blocks in that range. That's not entirely pg_amcheck's fault, as it just did what the user asked, but it also doesn't seem super helpful. I'm not going to do anything about it in this release.
+1 to all that. I tend toward the opinion that trying to make --startblock and --endblock do anything useful in the context of checking multiple relations is not really possible, and therefore we just shouldn't put any effort into it. But if user feedback shows otherwise, we can always do something about it later.
After running 'make installcheck', if I delete all entries from pg_class where relnamespace = 'pg_toast'::regclass, then running 'pg_amcheck regression' gives me lines that look like this:
heap relation "regression"."public"."quad_poly_tbl":
ERROR: could not open relation with OID 17177
In this here example, the first line ends in a colon.
relation "regression"."public"."functional_dependencies", block 28, offset 54, attribute 0
attribute 0 with length 4294967295 ends at offset 50 beyond total tuple length 43
But this here one does not. Seems like it should be consistent.
Good point. It also seems inconsistent that in one it refers to a "relation" and in the other to a "heap relation", but they're both heap relations. Changed to use "heap relation" both places, and to both use colons.
The QUALIFIED_NAME_FIELDS macro doesn't seem to be used anywhere,
which is good, because macros with unbalanced parentheses are usually
not a good plan; and a macro that expands to a comma-separated list of
things is suspect too.
Yeah, that whole macro was supposed to be removed. Looks like I somehow only removed the end of it, plus some functions that were using it. Not sure how I fat fingered that in the editor, but I've removed the rest now.
"invalid skip options\n" seems too plural.
Changed to something less plural.
With regard to your use of strtol() for --{start,end}block, telling
the user that their input is garbage seems pejorative, even though it
may be accurate. Compare:
[rhaas EDBAS]$ pg_dump -jdsgdsgd
pg_dump: error: invalid number of parallel jobs
In the message "relation end block argument precedes start block
argument\n", I think you could lose both instances of the word
"argument" and probably the word "relation" as well. I actually don't
know why all of these messages about start and end block mention
"relation". It's not like there is some other kind of
non-relation-related start block with which it could be confused.
Changed.
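For what it's worth, the friendlier pg_dump-style message falls out of the usual strtol() error checks. Below is a minimal sketch; parse_block_number() is a hypothetical helper written for illustration, not the actual pg_amcheck code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a block number option, reporting a
 * friendly error in the style of pg_dump rather than echoing the
 * user's garbage back at them.  Returns true on success and stores
 * the parsed value in *result.
 */
bool
parse_block_number(const char *optname, const char *arg, long *result)
{
	char	   *endptr;

	errno = 0;
	*result = strtol(arg, &endptr, 10);

	/* Reject overflow, empty input, trailing junk, and negatives. */
	if (errno != 0 || endptr == arg || *endptr != '\0' || *result < 0)
	{
		fprintf(stderr, "pg_amcheck: error: invalid %s\n", optname);
		return false;
	}
	return true;
}
```

The combined errno/endptr test catches overflow, empty input, and trailing garbage in one place, so the user sees "invalid start block" rather than a dump of whatever they typed.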
The comment for run_command() explains some things about the cparams
argument, but those things are false. In fact the argument is unused.
Removed unused argument and associated comment.
Usual PostgreSQL practice when freeing memory in e.g.
verify_heap_slot_handler is to set the pointers to NULL as well. The
performance cost of this is trivial, and it makes debugging a lot
easier should somebody accidentally write code to access one of those
things after it's been freed.
I had been doing that and removed it, anticipating a complaint about useless code. Ok, I put it back.
The documentation says that -D "does exclude any database that was
listed explicitly as dbname on the command line, nor does it exclude
the database chosen in the absence of any dbname argument." The first
part of this makes complete sense to me, but I'm not sure about the
second part. If I type pg_amcheck --all -D 'r*', I think I'm expecting
that "rhaas" won't be checked. Likewise, if I say pg_amcheck -d
'bob*', I think I only want to check the bob-related databases and not
rhaas.
I think it's a tricky definitional problem. I'll argue the other side for the moment:
If you say `pg_amcheck bob`, I think it is fair to assume that "bob" gets checked. If you say `pg_amcheck bob -d="b*" -D="bo*"`, it is fair to expect all databases starting with /b/ to be checked, except those starting with /bo/, except that since you *explicitly* asked for "bob", that "bob" gets checked. We both agree on this point, I think.
If you say `pg_amcheck --maintenance-db=bob -d="b*" -D="bo*"`, you don't expect "bob" to get checked, even though it was explicitly stated.
If you are named "bob", and run `pg_amcheck`, you expect it to get your name "bob" from the environment, and check database "bob". It's implicit rather than explicit, but that doesn't change what you expect to happen. It's just a short-hand for saying `pg_amcheck bob`.
Saying that `pg_amcheck -d="b*" -D="bo*"` should not check "bob" implies that the database being retrieved from the environment is acting like a maintenance-db. But that's not how it is treated when you just say `pg_amcheck` with no arguments. I think treating it as a maintenance-db in some situations but not in others is strangely non-orthogonal.
On the other hand, I would expect some users to come back with precisely your complaint, so I don't know how best to solve this.
I suggest documenting --endblock as "Check table blocks up to and
including the specified ending block number. An error will occur if a
relation being checked has fewer than this number of blocks." And
similarly for --startblock: "Check table blocks beginning with the
specified block number. An error will occur, etc." Perhaps even
mention something like "This option is probably only useful when
checking a single table." Also, the documentation here isn't clear
that this affects only table checking, not index checking.
Changed.
It appears that pg_amcheck sometimes makes dummy connections to the
database that don't do anything, e.g. pg_amcheck -t 'q*' resulted in:
2021-03-10 15:00:14.273 EST [95473] LOG: connection received: host=[local]
2021-03-10 15:00:14.274 EST [95473] LOG: connection authorized:
user=rhaas database=rhaas application_name=pg_amcheck
2021-03-10 15:00:14.286 EST [95473] LOG: statement: SELECT
pg_catalog.set_config('search_path', '', false);
2021-03-10 15:00:14.290 EST [95464] DEBUG: forked new backend,
pid=95474 socket=11
2021-03-10 15:00:14.291 EST [95464] DEBUG: server process (PID 95473)
exited with exit code 0
2021-03-10 15:00:14.291 EST [95474] LOG: connection received: host=[local]
2021-03-10 15:00:14.293 EST [95474] LOG: connection authorized:
user=rhaas database=rhaas application_name=pg_amcheck
2021-03-10 15:00:14.296 EST [95474] LOG: statement: SELECT
pg_catalog.set_config('search_path', '', false);
<...more queries from PID 95474...>
2021-03-10 15:00:14.321 EST [95464] DEBUG: server process (PID 95474)
exited with exit code 0
It doesn't seem to make sense to connect to a database, set the search
path, exit, and then immediately reconnect to the same database.
I think I've cleaned that up now.
This is slightly inconsistent:
pg_amcheck: checking heap table "rhaas"."public"."foo"
heap relation "rhaas"."public"."foo":
ERROR: XX000: catalog is missing 144 attribute(s) for relid 16392
LOCATION: RelationBuildTupleDesc, relcache.c:652
query was: SELECT blkno, offnum, attnum, msg FROM "public".verify_heapam(
relation := 16392, on_error_stop := false, check_toast := true, skip := 'none')
In line 1 it's a heap table, but in line 2 it's a heap relation.
Changed to use "heap table" consistently, and along those lines, to use "btree index" rather than "btree relation".
Attachments:
v45-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch
v45-0002-Adding-contrib-module-pg_amcheck.patch
v45-0003-Extending-PostgresNode-to-test-corruption.patch
(I'm still not a fan of adding more client-side tools whose sole task is
to execute server-side functionality in a slightly filtered way, but it
seems people are really interested in this, so ...)
I want to register, if we are going to add this, it ought to be in
src/bin/. If we think it's a useful tool, it should be there with all
the other useful tools.
I realize there is a dependency on a module in contrib, and it's
probably now not the time to re-debate reorganizing contrib. But if we
ever get to that, this program should be the prime example why the
current organization is problematic, and we should be prepared to make
the necessary moves then.
On Mar 11, 2021, at 13:12, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
client-side tools whose sole task is to execute server-side functionality in a slightly filtered way
By the way, can we teach pg_amcheck to verify a database without creating a local PGDATA, using only a bare minimum of file system quota?
We can implement a way for pg_amcheck to ask for a specific file, which would be downloaded by the backup tool and streamed to pg_amcheck.
E.g. pg_amcheck could have a restore_file_command = 'backup-tool bring-my-file %backup_id %file_name' and probably list_files_command='backup-tool list-files %backup_id'. And pg_amcheck could then fetch bare minimum of what is needed.
I see that this is a somewhat orthogonal idea, but from my POV an interesting one.
Best regards, Andrey Borodin.
On Mar 11, 2021, at 12:12 AM, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
(I'm still not a fan of adding more client-side tools whose sole task is to execute server-side functionality in a slightly filtered way, but it seems people are really interested in this, so ...)
I want to register, if we are going to add this, it ought to be in src/bin/. If we think it's a useful tool, it should be there with all the other useful tools.
I considered putting it in src/bin/scripts where reindexdb and vacuumdb also live. It seems most similar to those two tools.
I realize there is a dependency on a module in contrib, and it's probably now not the time to re-debate reorganizing contrib. But if we ever get to that, this program should be the prime example why the current organization is problematic, and we should be prepared to make the necessary moves then.
Before settling on contrib/pg_amcheck as the location, I checked whether any tools under src/bin had dependencies on a contrib module, and couldn't find any current examples. (There seems to have been one in the past, though I forget which it was at the moment.)
I have no argument with changing the location of this tool before it gets committed, but I wonder if we should do that now, or wait until some future time when contrib gets reorganized? I can't quite tell which you prefer from your comments above.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mar 11, 2021, at 3:36 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
On Mar 11, 2021, at 13:12, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
client-side tools whose sole task is to execute server-side functionality in a slightly filtered way
By the way, can we teach pg_amcheck to verify a database without creating a local PGDATA, using only a bare minimum of file system quota?
pg_amcheck does not need a local data directory to check a remote database server, though it does need to connect to that server. The local file system quota should not be a problem, as pg_amcheck does not download and save any data to disk. I am uncertain if this answers your question. If you are imagining pg_amcheck running on the same server as the database cluster, then of course running pg_amcheck puts a burden on the server to read all the relation files necessary, much as running queries over the same relations would do.
We can implement a way for pg_amcheck to ask for a specific file, which would be downloaded by the backup tool and streamed to pg_amcheck.
E.g. pg_amcheck could have a restore_file_command = 'backup-tool bring-my-file %backup_id %file_name' and probably a list_files_command = 'backup-tool list-files %backup_id'. pg_amcheck could then fetch the bare minimum of what is needed.
I see that this is a somewhat orthogonal idea, but from my POV an interesting one.
pg_amcheck is not designed to detect corruption directly, but rather to open one or more connections to the database and execute sql queries which employ the contrib/amcheck sql functions.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 11, 2021 at 3:12 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
(I'm still not a fan of adding more client-side tools whose sole task is
to execute server-side functionality in a slightly filtered way, but it
seems people are really interested in this, so ...)
I want to register, if we are going to add this, it ought to be in
src/bin/. If we think it's a useful tool, it should be there with all
the other useful tools.
I think this provides a pretty massive gain in usability. If you
wanted to check all of your tables and btree indexes without this, or
worse yet some subset of them that satisfied certain criteria, it
would be a real nuisance. You don't want to run all of the check
commands in a single transaction, because that keeps snapshots open,
and there's a good chance you do want to use parallelism. Even if you
ignore all that, the output you're going to get from running the
queries individually in psql is not going to be easy to sort through,
whereas the tool is going to distill that down to what you really need
to know.
Perhaps we should try to think of some way that some of these tools
could be unified, since it does seem a bit silly to have reindexdb,
vacuumdb, and pg_amcheck all as separate commands basically doing the
same kind of thing but for different maintenance operations, but I
don't think getting rid of them entirely is the way - and I don't
think that unifying them is a v14 project.
I also had the thought that maybe this should go in src/bin, because I
think this is going to be awfully handy for a lot of people. However,
I don't think there's a rule that binaries can't go in contrib --
oid2name and vacuumlo are existing precedents. But I guess that's only
2 out of quite a large number of binaries that we ship, so maybe it's
best not to add to it, especially for a tool which I at least suspect
is going to get a lot more use than either of those.
Anyone else want to vote for or against moving this to src/bin?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Mar 10, 2021 at 11:02 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The documentation says that -D "does exclude any database that was
listed explicitly as dbname on the command line, nor does it exclude
the database chosen in the absence of any dbname argument." The first
part of this makes complete sense to me, but I'm not sure about the
second part. If I type pg_amcheck --all -D 'r*', I think I'm expecting
that "rhaas" won't be checked. Likewise, if I say pg_amcheck -d
'bob*', I think I only want to check the bob-related databases and not
rhaas.
I think it's a tricky definitional problem. I'll argue the other side for the moment:
If you say `pg_amcheck bob`, I think it is fair to assume that "bob" gets checked. If you say `pg_amcheck bob -d="b*" -D="bo*"`, it is fair to expect all databases starting with /b/ to be checked, except those starting with /bo/, except that since you *explicitly* asked for "bob", that "bob" gets checked. We both agree on this point, I think.
+1.
If you say `pg_amcheck --maintenance-db=bob -d="b*" -D="bo*"`, you don't expect "bob" to get checked, even though it was explicitly stated.
I expect that specifying --maintenance-db has zero effect on what gets
checked. The only thing that should do is tell me which database to
use to get the list of databases that I am going to check, just in
case the default is unsuitable and will fail.
If you are named "bob", and run `pg_amcheck`, you expect it to get your name "bob" from the environment, and check database "bob". It's implicit rather than explicit, but that doesn't change what you expect to happen. It's just a short-hand for saying `pg_amcheck bob`.
+1.
Saying that `pg_amcheck -d="b*" -D="bo*"` should not check "bob" implies that the database being retrieved from the environment is acting like a maintenance-db. But that's not how it is treated when you just say `pg_amcheck` with no arguments. I think treating it as a maintenance-db in some situations but not in others is strangely non-orthogonal.
I don't think I agree with this. A maintenance DB in my mind doesn't
mean "a database we're not actually checking," but rather "a database
that we're using to get a list of other databases."
TBH, I guess I actually don't know why we ever treat a bare
command-line argument as a maintenance DB. I probably wouldn't do
that. We should only need a maintenance DB if we need to query for a
list of databases to check, and if the user has explicitly named the
database to check, then we do not need to do that... unless they've
also done something like -D or -d, but then the explicitly-specified
database name is playing a double role. It is both one of the
databases we will check, and also the database we will use to figure
out what other databases to check. I think that's why this seems
non-orthogonal.
Here's my proposal:
1. If there are options present which require querying for a list of
databases (e.g. --all, -d, -D) then use connectMaintenanceDatabase()
and go figure out what they mean. The cparams passed to that function
are only affected by the use of --maintenance-db, not by any bare
command line arguments. If there are no arguments present which
require querying for a list of databases, then --maintenance-db has no
effect.
2. If there is a bare command line argument, add the named database to
the list of databases to be checked. This might be empty if no
relevant options were specified in step 1, or if those options matched
nothing. It might be a noop if the named database was already selected
by the options mentioned in step 1.
3. If there were no options present which required querying for a list
of databases, and if there is also no bare command line argument, then
default to checking whatever database we connect to by default.
With this approach, --maintenance-db only ever affects how we get the
list of databases to check, and a bare command-line argument only ever
specifies a database to be checked. That seems cleaner.
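The three steps above could be sketched roughly as follows. All names here are hypothetical, and the simple prefix matcher stands in for the real pattern handling (the real code would query the maintenance DB for the list of databases rather than take a fixed array); this only illustrates the decision logic, not the actual implementation:

```c
#include <stdbool.h>
#include <string.h>

typedef struct
{
	bool		all;			/* --all */
	const char *include_pat;	/* -d pattern, or NULL */
	const char *exclude_pat;	/* -D pattern, or NULL */
	const char *bare_dbname;	/* bare command-line argument, or NULL */
	const char *default_db;		/* database we'd connect to by default */
} Options;

/* Stand-in for real pattern matching: "bo*" matches any "bo" prefix. */
bool
matches(const char *pat, const char *name)
{
	size_t		len = strlen(pat);

	if (len > 0 && pat[len - 1] == '*')
		return strncmp(pat, name, len - 1) == 0;
	return strcmp(pat, name) == 0;
}

int
select_databases(const Options *opts,
				 const char *known[], int nknown,
				 const char *out[])
{
	int			ndbs = 0;

	/* Step 1: options that require querying for a list of databases. */
	if (opts->all || opts->include_pat != NULL)
	{
		for (int i = 0; i < nknown; i++)
		{
			if (opts->include_pat && !matches(opts->include_pat, known[i]))
				continue;
			if (opts->exclude_pat && matches(opts->exclude_pat, known[i]))
				continue;
			out[ndbs++] = known[i];
		}
	}

	/* Step 2: a bare command-line argument always adds that database. */
	if (opts->bare_dbname != NULL)
	{
		bool		already = false;

		for (int i = 0; i < ndbs; i++)
			if (strcmp(out[i], opts->bare_dbname) == 0)
				already = true;
		if (!already)
			out[ndbs++] = opts->bare_dbname;
	}

	/* Step 3: nothing asked for at all, fall back to the default DB. */
	if (!opts->all && opts->include_pat == NULL && opts->bare_dbname == NULL)
		out[ndbs++] = opts->default_db;

	return ndbs;
}
```

With this shape, --maintenance-db would only ever influence where the known list comes from, never whether a bare dbname gets checked, which is the orthogonality being argued for.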
An alternate possibility would be to say that there should only ever
be EITHER a bare command-line argument OR options that require
querying for a list of databases OR neither BUT NOT both. Then it's
simple:
0. If you have both options which require querying for a list of
databases and also a bare database name, error and die.
1. As above.
2. As above except the only possibility is now increasing the list of
target databases from length 0 to length 1.
3. As above.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mar 11, 2021, at 20:30, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
pg_amcheck does not need a local data directory to check a remote database server, though it does need to connect to that server.
No, I mean it would be great if we did not need to materialise the whole DB anywhere. Let's say I have a backup of a 10TB cluster in S3, and don't have that cluster's hardware anymore. I want to spawn a tiny VM with a few GiB of RAM and storage no larger than the biggest index within the DB plus the WAL from start to end, and stream-check the whole backup, mark it safe, and sleep well. It would be perfect if we could do backup verification at the cost of corruption monitoring (and not vice versa, which is trivial).
Best regards, Andrey Borodin.