Extending USING [heap | mytam | yourtam] grammar and behavior
Hackers,
I have extended the grammar to allow "USING NOT method [, ...]" to exclude one or more TAMs in a CREATE TABLE statement. This may sound like a weird thing to do, but it is surprisingly useful when developing new Table Access Methods, particularly when you are developing two or more, not just one. To explain:
Developing a new TAM takes an awful lot of testing, and much of it is duplicative of the existing core regression test suite. Leveraging the existing tests saves an awful lot of test development.
When developing just one TAM, leveraging the existing tests isn't too hard. Without much work*, you can set default_table_access_method=mytam for the duration of the check-world. You'll get a few test failures this way. Some will be in tests that probe the catalogs to verify that /heap/ is stored there, and instead /mytam/ is found. Others will be tests that are sensitive to the number of rows that fit per page, etc. But a surprising number of tests just pass, at least after you get the TAM itself debugged.
When developing two or more TAMs, this falls apart. Some tests may be worth fixing up (perhaps with alternate output files) for "mytam", but not for "columnar_tam". That might be because the test is checking fundamentally row-store-ish properties of the table, which has no applicability to your column-store-ish TAM. In that case, "USING NOT columnar_tam" fixes the test failure when columnar is the default, without preventing the test from testing "mytam" when it happens to be the default.
Once you have enough TAMs developed and deployed, this USING NOT business becomes useful in production. You might have different defaults on different servers, or for different customers, etc., and for a given piece of DDL that you want to release you only want to say which TAMs not to use, not to nail down which TAM must be used.
Thoughts? I'll hold off posting a patch until the general idea is debated.
[*] It takes some extra work to get the TAP tests to play along.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jun 15, 2022 at 06:16:21PM -0700, Mark Dilger wrote:
When developing two or more TAMs, this falls apart. Some tests may
be worth fixing up (perhaps with alternate output files) for
"mytam", but not for "columnar_tam". That might be because the test
is checking fundamentally row-store-ish properties of the table,
which has no applicability to your column-store-ish TAM. In that
case, "USING NOT columnar_tam" fixes the test failure when columnar
is the default, without preventing the test from testing "mytam"
when it happens to be the default.
I think that it is very important for the in-core test suite to remain
transparent in terms of options used for table AMs (or compression),
and this has improved a lot over the last years with options like
HIDE_TABLEAM and HIDE_TOAST_COMPRESSION. Things could have actually
more ORDER BY clauses to ensure more ordering of the results, as long
as the tests don't want to stress a specific planning path. However,
your problem is basically that you develop multiple AMs, but you want
to have regression tests that do checks across more than one table AM
at the same time. Am I getting that right? Why is a grammar
extension necessary for what looks like a test structure problem when
there are interdependencies across multiple AMs developped?
Once you have enough TAMs developed and deployed, this USING NOT
business becomes useful in production. You might have different
defaults on different servers, or for different customers, etc., and
for a given piece of DDL that you want to release you only want to
say which TAMs not to use, not to nail down which TAM must be used.
I am not sure to see why this would be something users would actually
use in prod. That means to pick up something else than what the
server thinks is the best default AM but where somebody does not want
to trust the default, while generating an error if specifying the
default AM in the USING NOT clause. On top of that
default_table_access_method is user-settable.
--
Michael
On Wed, Jun 15, 2022 at 8:51 PM Michael Paquier <michael@paquier.xyz> wrote:
On top of that
default_table_access_method is user-settable.
FWIW this proposal acknowledges that and basically leverages it to the
hilt, turning it into something like search_path. I strongly dislike the
idea of any workflow that depends on a GUC in this manner. The fact that
it is user-settable is, IMO, a flaw, not a feature, at least as far as
production settings are concerned.
It is a novel API for PostgreSQL to rely upon setting a GUC then attaching
"unless" configurations to individual objects to ignore it. And what would
be chosen (ultimately fallback is heap?), or whether it would simply error,
is presently, as you say, undefined.
In production this general behavior becomes useful only under the condition
that among the various named access methods some of them don't even exist
on the server in question, but that a fallback option would be acceptable
in that case. But that suggests extending "USING" to accept
multiple names, not inventing a "NOT USING".
That all said, I can understand that testing presents its own special
needs. But testing is probably where GUCs shine. So why not implement
this capability as a GUC that is set just before the table is created
instead of extending the grammar for it? Add it to "developer options" and
call it a day. Dump/Restore no longer has to care about it, and its value
once the table exists is basically zero anyway.
David J.
On Jun 15, 2022, at 8:51 PM, Michael Paquier <michael@paquier.xyz> wrote:
However,
your problem is basically that you develop multiple AMs, but you want
to have regression tests that do checks across more than one table AM
at the same time.
It is true that I test multiple table AMs at the same time, but that's a somewhat different concern.
Am I getting that right?
Not exactly.
Why is a grammar
extension necessary for what looks like a test structure problem when
there are interdependencies across multiple AMs developped?
Ok, I didn't want to get into my exact process, because it involves other changes that I don't expect -hackers to want. But basically what I do is:
./configure --with-default-tam=chicago && make && make check-world
That fails for a few tests, and I manually change the create table statements in tests that are not chicago-compatible to "using not chicago". Then
./configure --with-default-tam=detroit && make && make check-world
That fails for some other set of tests, but note that the tests with "using not chicago" are still using detroit in this second run. That wouldn't be true if I'd fixed up the tests in the first run "using heap".
Then I can also add my own tests which might make some chicago backed tables plus some detroit backed tables and see how they interact. But that's superfluous to the issue of just trying to leverage the existing tests as much as I can without having to reinvent tests to cover "chicago", and then reinvent again to cover "detroit", and so forth.
If you develop enough TAMs in parallel, and go with the "using heap" solution, you eventually have zero coverage for any of the TAMs, because you'll eventually be "using heap" in all the tables of all the tests.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jun 15, 2022, at 8:51 PM, Michael Paquier <michael@paquier.xyz> wrote:
I am not sure to see why this would be something users would actually
use in prod. That means to pick up something else than what the
server thinks is the best default AM but where somebody does not want
to trust the default, while generating an error if specifying the
default AM in the USING NOT clause.
Sorry for the lack of clarity. I do not suggest raising an error. If you say "USING NOT foo", and foo is the default table access method, then you get the same behavior as a "USING heap" would have gotten you, otherwise, you get the same behavior as not providing any USING clause at all.
In future, we might want to create a list of fallback tams rather than just hardcoding "heap" as the one and only fallback, but I haven't run into an actual need for that. If you're wondering what "USING NOT heap" falls back to, I think that could error, or it could just use heap anyway. Whatever. That's why I'm still soliciting for comments at this phase rather than posting a patch.
On top of that
default_table_access_method is user-settable.
Yeah, but specifying a "USING foo" clause is also open to any user, so I don't see why this matters. "USING NOT foo" is just shorthand for checking the current default_table_access_method, and then either appending a "USING heap" clause or appending no clause. Since the user can do this anyway, what's the security implication in some syntactic sugar?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company