Modest proposal to extend TableAM API for controlling cluster commands

Started by Mark Dilgeralmost 4 years ago18 messageshackers
Jump to latest
#1Mark Dilger
mark.dilger@enterprisedb.com

Hackers,

While developing various Table Access Methods, I have wanted a callback for determining if CLUSTER (and VACUUM FULL) should be run against a table backed by a given TAM. The current API contains a callback for doing the guts of the cluster, but by that time, it's a bit too late to cleanly back out. For single relation cluster commands, raising an error from that callback is probably not too bad. For multi-relation cluster commands, that aborts the clustering of other yet to be processed relations, which doesn't seem acceptable. I tried fixing this with a ProcessUtility_hook, but that fires before the multi-relation cluster command has compiled the list of relations to cluster, meaning the ProcessUtility_hook doesn't have access to the necessary information. (It can be hacked to compile the list of relations itself, but that duplicates both code and effort, with the usual risks that the code will get out of sync.)

For my personal development, I have declared a new hook, bool (*relation_supports_cluster) (Relation rel). It gets called from commands/cluster.c in both the single-relation and multi-relation code paths, with warning or debug log messages output for relations that decline to be clustered, respectively.

Before posting a patch, do people think this sounds useful? Would you like the hook function signature to differ in some way? Is a simple "yes this relation may be clustered" vs. "no this relation may not be clustered" interface overly simplistic?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#1)
Re: Modest proposal to extend TableAM API for controlling cluster commands

Hi,

On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:

While developing various Table Access Methods, I have wanted a callback for
determining if CLUSTER (and VACUUM FULL) should be run against a table
backed by a given TAM. The current API contains a callback for doing the
guts of the cluster, but by that time, it's a bit too late to cleanly back
out. For single relation cluster commands, raising an error from that
callback is probably not too bad. For multi-relation cluster commands, that
aborts the clustering of other yet to be processed relations, which doesn't
seem acceptable.

Why not? What else do you want to do in that case? Silently ignoring
non-clusterable tables doesn't seem right either. What's the use-case for
swallowing the error?

I tried fixing this with a ProcessUtility_hook, but that
fires before the multi-relation cluster command has compiled the list of
relations to cluster, meaning the ProcessUtility_hook doesn't have access to
the necessary information. (It can be hacked to compile the list of
relations itself, but that duplicates both code and effort, with the usual
risks that the code will get out of sync.)

For my personal development, I have declared a new hook, bool
(*relation_supports_cluster) (Relation rel). It gets called from
commands/cluster.c in both the single-relation and multi-relation code
paths, with warning or debug log messages output for relations that decline
to be clustered, respectively.

Do you actually need to dynamically decide whether CLUSTER is supported?
Otherwise we could just make the existing cluster callback optional and error
out if a table is clustered that doesn't have the callback.

Before posting a patch, do people think this sounds useful? Would you like
the hook function signature to differ in some way? Is a simple "yes this
relation may be clustered" vs. "no this relation may not be clustered"
interface overly simplistic?

It seems overly complicated, if anything ;)

Greetings,

Andres Freund

#3Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#2)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 6:01 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:

While developing various Table Access Methods, I have wanted a callback for
determining if CLUSTER (and VACUUM FULL) should be run against a table
backed by a given TAM. The current API contains a callback for doing the
guts of the cluster, but by that time, it's a bit too late to cleanly back
out. For single relation cluster commands, raising an error from that
callback is probably not too bad. For multi-relation cluster commands, that
aborts the clustering of other yet to be processed relations, which doesn't
seem acceptable.

Why not? What else do you want to do in that case? Silently ignoring
non-clusterable tables doesn't seem right either. What's the use-case for
swallowing the error?

Imagine you develop a TAM for which the concept of "clustering" doesn't have any defined meaning. Perhaps you've arranged the data in a way that has no similarity to heap, and now somebody runs a CLUSTER command (with no arguments.) It's reasonable that they want all their heap tables to get the usual cluster_rel treatment, and that the existence of a table using your exotic TAM shouldn't interfere with that. Then what? You are forced to copy all the data from your OldHeap (badly named) to the NewHeap (also badly named), or to raise an error. That doesn't seem ok.

I tried fixing this with a ProcessUtility_hook, but that
fires before the multi-relation cluster command has compiled the list of
relations to cluster, meaning the ProcessUtility_hook doesn't have access to
the necessary information. (It can be hacked to compile the list of
relations itself, but that duplicates both code and effort, with the usual
risks that the code will get out of sync.)

For my personal development, I have declared a new hook, bool
(*relation_supports_cluster) (Relation rel). It gets called from
commands/cluster.c in both the single-relation and multi-relation code
paths, with warning or debug log messages output for relations that decline
to be clustered, respectively.

Do you actually need to dynamically decide whether CLUSTER is supported?
Otherwise we could just make the existing cluster callback optional and error
out if a table is clustered that doesn't have the callback.

Same as above, I don't know why erroring would be the right thing to do. As a comparison, consider that we don't attempt to cluster a partitioned table, but rather just silently skip it. Imagine if, when we introduced the concept of partitioned tables, we made unqualified CLUSTER commands always fail when they encountered a partitioned table.

Before posting a patch, do people think this sounds useful? Would you like
the hook function signature to differ in some way? Is a simple "yes this
relation may be clustered" vs. "no this relation may not be clustered"
interface overly simplistic?

It seems overly complicated, if anything ;)

For the TAMs I've developed thus far, I don't need the (Relation rel) parameter, and could just have easily used (void). But that seems to fence in what other TAM authors could do in future.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#3)
Re: Modest proposal to extend TableAM API for controlling cluster commands

Hi,

On 2022-06-15 18:24:45 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 6:01 PM, Andres Freund <andres@anarazel.de> wrote:
On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:

While developing various Table Access Methods, I have wanted a callback for
determining if CLUSTER (and VACUUM FULL) should be run against a table
backed by a given TAM. The current API contains a callback for doing the
guts of the cluster, but by that time, it's a bit too late to cleanly back
out. For single relation cluster commands, raising an error from that
callback is probably not too bad. For multi-relation cluster commands, that
aborts the clustering of other yet to be processed relations, which doesn't
seem acceptable.

Why not? What else do you want to do in that case? Silently ignoring
non-clusterable tables doesn't seem right either. What's the use-case for
swallowing the error?

Imagine you develop a TAM for which the concept of "clustering" doesn't have
any defined meaning. Perhaps you've arranged the data in a way that has no
similarity to heap, and now somebody runs a CLUSTER command (with no
arguments.)

I think nothing would happen in this case - only pre-clustered tables get
clustered in an argumentless CLUSTER. What am I missing?

Greetings,

Andres Freund

#5Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#4)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:

I think nothing would happen in this case - only pre-clustered tables get
clustered in an argumentless CLUSTER. What am I missing?

The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target is pre-clustered, and both will run against the table if the user has run an ALTER TABLE..CLUSTER ON. Now, we could try to catch that latter command with a utility hook, but since the VACUUM FULL is still problematic, it seems cleaner to just add the callback I am proposing.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#5)
Re: Modest proposal to extend TableAM API for controlling cluster commands

Hi,

On 2022-06-15 19:07:50 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:

I think nothing would happen in this case - only pre-clustered tables get
clustered in an argumentless CLUSTER. What am I missing?

The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target
is pre-clustered

VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation
is shared, VACUUM FULL doesn't order the table contents. I see now reason why
an AM shouldn't support VACUUM FULL?

, and both will run against the table if the user has run an ALTER
TABLE..CLUSTER ON.

If a user does that for a table that doesn't support clustering, well, I don't
see what's gained by not erroring out.

Greetings,

Andres Freund

#7Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#6)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 7:14 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-06-15 19:07:50 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:

I think nothing would happen in this case - only pre-clustered tables get
clustered in an argumentless CLUSTER. What am I missing?

The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target
is pre-clustered

VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation
is shared, VACUUM FULL doesn't order the table contents. I see now reason why
an AM shouldn't support VACUUM FULL?

It's effectively a synonym which determines whether the "bool use_sort" parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM plays along and sorts or not based on that. But it's up to the TAM what it wants to do with that boolean, if in fact it does anything at all based on that. A TAM could decide to do the exact opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird behavior here>. From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are synonyms. Or am I missing something?

, and both will run against the table if the user has run an ALTER
TABLE..CLUSTER ON.

If a user does that for a table that doesn't support clustering, well, I don't
see what's gained by not erroring out.

Perhaps they want to give the TAM information about which index to use for sorting, on those occasions when the TAM's logic dictates that sorting is appropriate, but not in response to a cluster command.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#7)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 7:21 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

If a user does that for a table that doesn't support clustering, well, I don't
see what's gained by not erroring out.

Perhaps they want to give the TAM information about which index to use for sorting, on those occasions when the TAM's logic dictates that sorting is appropriate, but not in response to a cluster command.

I should admit that this is a bit hack-ish, but TAM authors haven't been left a lot of options here. Index AMs allow for custom storage parameters, but Table AMs don't, so getting information to the TAM about how to behave takes more than a little slight of hand. Simon's proposal from a while back (don't have the link just now) to allow TAMs to define custom storage parameters would go some distance here.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#7)
Re: Modest proposal to extend TableAM API for controlling cluster commands

Hi,

On 2022-06-15 19:21:42 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 7:14 PM, Andres Freund <andres@anarazel.de> wrote:
On 2022-06-15 19:07:50 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:

I think nothing would happen in this case - only pre-clustered tables get
clustered in an argumentless CLUSTER. What am I missing?

The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target
is pre-clustered

VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation
is shared, VACUUM FULL doesn't order the table contents. I see now reason why
an AM shouldn't support VACUUM FULL?

It's effectively a synonym which determines whether the "bool use_sort"
parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM
plays along and sorts or not based on that.

Hardly a synonym if it behaves differently?

But it's up to the TAM what it wants to do with that boolean, if in fact it
does anything at all based on that. A TAM could decide to do the exact
opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
behavior here>.

That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
that's possible with all extension APIs.

From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
synonyms. Or am I missing something?

The callback gets passed use_sort. So just implement it use_sort = false and
error out if use_sort = true?

, and both will run against the table if the user has run an ALTER
TABLE..CLUSTER ON.

If a user does that for a table that doesn't support clustering, well, I don't
see what's gained by not erroring out.

Perhaps they want to give the TAM information about which index to use for
sorting, on those occasions when the TAM's logic dictates that sorting is
appropriate, but not in response to a cluster command.

I have little sympathy to randomly misusing catalog contents and then
complaining that those catalog contents have an effect.

Greetings,

Andres Freund

#10Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#9)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:

It's effectively a synonym which determines whether the "bool use_sort"
parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM
plays along and sorts or not based on that.

Hardly a synonym if it behaves differently?

I don't think this point is really worth arguing. We don't have to call it a synonym, and the rest of the discussion remains the same.

But it's up to the TAM what it wants to do with that boolean, if in fact it
does anything at all based on that. A TAM could decide to do the exact
opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
behavior here>.

That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
that's possible with all extension APIs.

I don't think it's "stupid stuff". A major motivation, perhaps the only useful motivation, for implementing a TAM is to get a non-trivial performance boost (relative to heap) for some target workload, almost certainly at the expense of worse performance for another workload. To optimize any particular workload sufficiently to make it worth the bother, you've pretty much got to do something meaningfully different than what heap does.

From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
synonyms. Or am I missing something?

The callback gets passed use_sort. So just implement it use_sort = false and
error out if use_sort = true?

I'm not going to say that your idea is unreasonable for a TAM that you might choose to implement, but I don't see why that should be required of all TAMs anybody might ever implement.

The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the

/*
* Open the relations we need.
*/
NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);

code that has already happened in cluster.c's copy_table_data() function, and unless I raise an error, after returning from my TAM's callback, the cluster code will replace the old table with the new one. I'm left no choices but to copy my data over, loose my data, or abort the command. None of those are OK options for me.

I'm open to different solutions. If a simple callback like relation_supports_cluster(Relation rel) is too simplistic, and more parameters with more context information is wanted, then fine, let's do that. But I can't really complete my work with the interface as it stands now.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#10)
Re: Modest proposal to extend TableAM API for controlling cluster commands

Hi,

On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:

But it's up to the TAM what it wants to do with that boolean, if in fact it
does anything at all based on that. A TAM could decide to do the exact
opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
behavior here>.

That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
that's possible with all extension APIs.

I don't think it's "stupid stuff". A major motivation, perhaps the only
useful motivation, for implementing a TAM is to get a non-trivial
performance boost (relative to heap) for some target workload, almost
certainly at the expense of worse performance for another workload. To
optimize any particular workload sufficiently to make it worth the bother,
you've pretty much got to do something meaningfully different than what heap
does.

Sure. I just don't see what that has to do with doing something widely
differing in VACUUM FULL. Which AM can't support that? I guess there's some
where implementing the full visibility semantics isn't feasible, but that's
afaics OK.

From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
synonyms. Or am I missing something?

The callback gets passed use_sort. So just implement it use_sort = false and
error out if use_sort = true?

I'm not going to say that your idea is unreasonable for a TAM that you might
choose to implement, but I don't see why that should be required of all TAMs
anybody might ever implement.

The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the

/*
* Open the relations we need.
*/
NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);

code that has already happened in cluster.c's copy_table_data() function,
and unless I raise an error, after returning from my TAM's callback, the
cluster code will replace the old table with the new one. I'm left no
choices but to copy my data over, loose my data, or abort the command. None
of those are OK options for me.

I think you need to do a bit more explaining of what you're actually trying to
achieve here. You're just saying "I don't want to", which doesn't really help
me to understand the set of useful options.

I'm open to different solutions. If a simple callback like
relation_supports_cluster(Relation rel) is too simplistic, and more
parameters with more context information is wanted, then fine, let's do
that.

FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
relation_copy_for_cluster optional.

I still think it's a terrible idea to silently ignore tables in CLUSTER or
VACUUM FULL.

But I can't really complete my work with the interface as it stands
now.

Since you've not described that work to a meaningful degree...

Greetings,

Andres Freund

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Andres Freund (#11)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de> wrote:

If a simple callback like
relation_supports_cluster(Relation rel) is too simplistic

Seems like it should be called:
relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Basically, if the user tells the table to make itself smaller on disk by
removing dead tuples, should we support the case where the Table AM says:
"Sorry, I cannot do that"?

If yes, then naming the table explicitly should elicit an error. Having
the table chosen implicitly should provoke a warning. For ALTER TABLE
CLUSTER there should be an error: which makes the implicit CLUSTER command
a non-factor.

However, given that should the table structure change it is imperative that
the Table AM be capable of producing a new physical relation with the
correct data, which will have been compacted as a side-effect, it seems
like, explicit or implicit, expecting any Table AM to do that when faced
with Vacuum Full is reasonable. Which leaves deciding how to allow a table
with a given TAM to prevent itself from being added to the CLUSTER roster.
And decide whether an opt-out feature for implicit VACUUM FULL is something
we should offer as well.

I'm doubtful that a TAM that is pluggable into the MVCC and WAL
architecture of PostgreSQL could avoid this basic contract between the
system and its users.

David J.

#13Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#11)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 8:18 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:

On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:

But it's up to the TAM what it wants to do with that boolean, if in fact it
does anything at all based on that. A TAM could decide to do the exact
opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
behavior here>.

That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
that's possible with all extension APIs.

I don't think it's "stupid stuff". A major motivation, perhaps the only
useful motivation, for implementing a TAM is to get a non-trivial
performance boost (relative to heap) for some target workload, almost
certainly at the expense of worse performance for another workload. To
optimize any particular workload sufficiently to make it worth the bother,
you've pretty much got to do something meaningfully different than what heap
does.

Sure. I just don't see what that has to do with doing something widely
differing in VACUUM FULL. Which AM can't support that? I guess there's some
where implementing the full visibility semantics isn't feasible, but that's
afaics OK.

The problem isn't that the TAM can't do it. Any TAM can probably copy its contents verbatim from the OldHeap to the NewHeap. But it's really stupid to have to do that if you're not going to change anything along the way, and I think if you divorce your thinking from how heap-AM works sufficiently, you might start to see how other TAMs might have nothing useful to do at this step. So there's a strong motivation to not be forced into a "move all the data uselessly" step.

I don't really want to discuss the proprietary details of any TAMs I'm developing, so I'll use some made up examples. Imagine you have decided to trade off the need to vacuum against the cost of storing 64bit xids. That doesn't mean that compaction isn't maybe still a good thing, but you don't need to vacuum for anti-wraparound purposes anymore. Now imagine you've also decided to trade off insert speed for select speed, and you sort the entire table on every insert, and to keep indexes happy, you keep a "externally visible TID" to "internal actual location" mapping in a separate fork. Let's say you also don't support UPDATE or DELETE at all. Those immediately draw an error, "not implemented by this tam".

At this point, you have a fully sorted and completely compacted table at all times. It's simply an invariant of the TAM. So, now what exactly is VACUUM FULL or CLUSTER supposed to do? Seems like the answer is "diddly squat", and yet you seem to propose requiring the TAM to do it. I don't like that.

From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
synonyms. Or am I missing something?

The callback gets passed use_sort. So just implement it use_sort = false and
error out if use_sort = true?

I'm not going to say that your idea is unreasonable for a TAM that you might
choose to implement, but I don't see why that should be required of all TAMs
anybody might ever implement.

The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the

/*
* Open the relations we need.
*/
NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);

code that has already happened in cluster.c's copy_table_data() function,
and unless I raise an error, after returning from my TAM's callback, the
cluster code will replace the old table with the new one. I'm left no
choices but to copy my data over, loose my data, or abort the command. None
of those are OK options for me.

I think you need to do a bit more explaining of what you're actually trying to
achieve here. You're just saying "I don't want to", which doesn't really help
me to understand the set of useful options.

I'm trying to opt out of cluster/vacfull.

I'm open to different solutions. If a simple callback like
relation_supports_cluster(Relation rel) is too simplistic, and more
parameters with more context information is wanted, then fine, let's do
that.

FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
relation_copy_for_cluster optional.

I still think it's a terrible idea to silently ignore tables in CLUSTER or
VACUUM FULL.

I'm not entirely against you on that, but it makes me cringe that we impose design decisions like that on any and all future TAMs. It seems better to me to let the TAM author decide to emit an error, warning, notice, or whatever, as they see fit.

But I can't really complete my work with the interface as it stands
now.

Since you've not described that work to a meaningful degree...

I don't think I should have to do so. It's like saying, "I think I should have freedom of speech", and you say, "well, I'm not sure about that; tell me what you want to say, and I'll decide if I'm going to let you say it". That's not freedom. I think TAM authors should have broad discretion over anything that the core system doesn't have a compelling interest in controlling. You've not yet said why a TAM should be prohibited from opting out of cluster/vacfull.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Mark Dilger
mark.dilger@enterprisedb.com
In reply to: David G. Johnston (#12)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 15, 2022, at 8:50 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de> wrote:

If a simple callback like
relation_supports_cluster(Relation rel) is too simplistic

Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Ok.

Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the case where the Table AM says: "Sorry, I cannot do that"?

I submit that's the only sane thing to do if the table AM already guarantees that the table will always be fully compacted. There is no justification for forcing the table contents to be copied without benefit.

If yes, then naming the table explicitly should elicit an error. Having the table chosen implicitly should provoke a warning. For ALTER TABLE CLUSTER there should be an error: which makes the implicit CLUSTER command a non-factor.

I'm basically fine with how you would design the TAM, but I'm going to argue again that the core project should not dictate these decisions. The TAM's relation_supports_compaction() function can return true/false, or raise an error. If raising an error is the right action, the TAM can do that. If the core code makes that decision, the TAM can't override, and that paints TAM authors into a corner.

However, given that should the table structure change it is imperative that the Table AM be capable of producing a new physical relation with the correct data, which will have been compacted as a side-effect, it seems like, explicit or implicit, expecting any Table AM to do that when faced with Vacuum Full is reasonable. Which leaves deciding how to allow a table with a given TAM to prevent itself from being added to the CLUSTER roster. And decide whether an opt-out feature for implicit VACUUM FULL is something we should offer as well.

I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture of PostgreSQL could avoid this basic contract between the system and its users.

How about a TAM that implements a write-once, read-many logic. You get one multi-insert, and forever after you can't modify it (other than to drop the table, or perhaps to truncate it). That's a completely made-up-on-the-spot example, but it's not entirely without merit. You could avoid a lot of locking overhead when using such a table, since you'd know a priori that nobody else is modifying it. It could also be implemented with a smaller tuple header, since a lot of the header bytes in heap tuples are dedicated to tracking updates. You wouldn't need a per-row inserting transaction-Id either, since you could just store one per table, knowing that all the rows were inserted in the same transaction.

In what sense does this made-up TAM conflict with mvcc and wal? It doesn't have all the features of heap, but that's not the same thing as violating mvcc or breaking wal.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#13)
Re: Modest proposal to extend TableAM API for controlling cluster commands

Hi,

On 2022-06-15 22:23:36 -0700, Mark Dilger wrote:

I'm not entirely against you on that, but it makes me cringe that we impose
design decisions like that on any and all future TAMs. It seems better to
me to let the TAM author decide to emit an error, warning, notice, or
whatever, as they see fit.

The tradeoff is that that pushes down complexity and makes the overall system
harder to understand. I'm not saying that there's no possible use for such
callbacks / configurability, I'm just not convinced it's worth the cost.

But I can't really complete my work with the interface as it stands
now.

Since you've not described that work to a meaningful degree...

I don't think I should have to do so. It's like saying, "I think I should
have freedom of speech", and you say, "well, I'm not sure about that; tell
me what you want to say, and I'll decide if I'm going to let you say it".'
That's not freedom. I think TAM authors should have broad discretion over
anything that the core system doesn't have a compelling interest in
controlling.

That's insultingly ridiculous. You can say, do whatever you want, but that
doesn't mean I have to be convinced by it (i.e. +1 adding an API) - that'd be
compelled speech, to go with your image...

It's utterly normal to be asked what the use case for a new API is when
proposing one.

You've not yet said why a TAM should be prohibited from opting
out of cluster/vacfull.

API / behavioural complexity. If we make ever nook and cranny configurable,
we'll have an ever harder to use / administer system (from a user's POV) and
have difficulty understanding the state of the system when writing patches
(from a core PG developer's POV). It might be the right thing in this case -
hence me asking for what the motivation is.

Greetings,

Andres Freund

#16David G. Johnston
david.g.johnston@gmail.com
In reply to: Mark Dilger (#14)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Wed, Jun 15, 2022 at 11:23 PM Mark Dilger <mark.dilger@enterprisedb.com>
wrote:

On Jun 15, 2022, at 8:50 PM, David G. Johnston <

david.g.johnston@gmail.com> wrote:

On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de>

wrote:

If a simple callback like
relation_supports_cluster(Relation rel) is too simplistic

Seems like it should be called:

relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Ok.

Basically, if the user tells the table to make itself smaller on disk by

removing dead tuples, should we support the case where the Table AM says:
"Sorry, I cannot do that"?

I submit that's the only sane thing to do if the table AM already
guarantees that the table will always be fully compacted. There is no
justification for forcing the table contents to be copied without benefit.

I accept that this is a valid outcome that should be accommodated for.

If yes, then naming the table explicitly should elicit an error. Having

the table chosen implicitly should provoke a warning. For ALTER TABLE
CLUSTER there should be an error: which makes the implicit CLUSTER command
a non-factor.

I'm basically fine with how you would design the TAM, but I'm going to
argue again that the core project should not dictate these decisions. The
TAM's relation_supports_compaction() function can return true/false, or
raise an error. If raising an error is the right action, the TAM can do
that.

If the core code makes that decision, the TAM can't override, and that
paints TAM authors into a corner.

The core code has to decide what the template pattern code looks like,
including what things it will provide and what it requires extensions to
provide. To a large extent, providing a consistent end-user experience is
the template's, and thus core code's, job.

How about a TAM that implements a write-once, read-many logic. You get
one multi-insert, and forever after you can't modify it (other than to drop
the table, or perhaps to truncate it).

So now the AM wants to ignore ALTER TABLE, INSERT, and DELETE commands.

That's a completely made-up-on-the-spot example, but it's not entirely

without merit.

In what sense does this made-up TAM conflict with mvcc and wal? It
doesn't have all the features of heap, but that's not the same thing as
violating mvcc or breaking wal.

I am nowhere near informed enough to speak to the implementation details
here, and my imagination is probably lacking too, but I'll accept that the
current system does indeed make assumptions in the template design that are
now being seen as incorrect in light of new algorithms.

But you are basically proposing a reworking of the existing system into one
that makes pretty much any SQL Command something that a TAM can treat as
being an optional request by the user; whereas today the system presumes
that the implementations will respond to these commands. And to make this
change without any core code having such a need. Or even a working
extension that can be incorporated during development. And, as per the
above, all of this requires coming to some kind of agreement on the desired
user experience (I don't immediately accept the "let the AM decide" option).

Anyway, that was mostly my attempt at Devil's Advocate.
I was going to originally post that the template simply inspect whether the
new physical relation file, after the copy was requested, had a non-zero
size, and if so finish performing the swap the way we do today, otherwise
basically abort (or otherwise perform the minimal amount of catalog
changes) so the existing relation file continues to be pointed at.
Something to consider with a smaller API footprint than a gatekeeper hook.

I think that all boils down to - it seems preferable to simply continue
assuming all these commands are accepted, but figure out whether a "no-op"
is a valid outcome and, if so, ensure there is a way to identify that no-op
meaningfully. While hopefully designing the surrounding code so that
unnecessary work is not performed in front of a no-op. This seems
preferable to spreading hooks throughout the code that basically ask "do
you handle this SQL command?". The specifics of the existing code may
dictate otherwise.

David J.

#17Mark Dilger
mark.dilger@enterprisedb.com
In reply to: David G. Johnston (#16)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 16, 2022, at 12:28 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

But you are basically proposing a reworking of the existing system into one that makes pretty much any SQL Command something that a TAM can treat as being an optional request by the user;

Yes, and I think I'm perfectly correct in asking for that. If the standard you are proposing (albeit as Devil's Advocate) were applied to filesystems, nobody could ever implement /dev/null, on the argument that users have a reasonable expectation that data they write to a file will be there for them to read later. Yet Michael Paquier wrote a blackhole TAM, and although I don't find it terribly useful, I do think it's a reasonable thing for somebody to be able to write.

whereas today the system presumes that the implementations will respond to these commands.

That depends on what you mean by "respond to". A TAM which implements a tamper resistant audit log responds to update and delete commands with a "permission denied" error. A TAM which implements running aggregates implements insert commands by computing and inserting a new running aggregate value and reclaiming space from old running aggregate values when no transaction could any longer see them. You can do this stuff at a higher level with hooks, functions, triggers, and rules, inserting into a heap, and having to periodically vacuum, by why would you want to? That's almost guaranteed to be slower, maybe even orders of magnitude slower.

And to make this change without any core code having such a need.

The core code won't have any such need, because the core code is content with heap, and the API already accommodates heap. It seems Andres moved the project in the direction of allowing custom TAMs when he created the Table AM interface, and I'm quite pleased that he did so, but it doesn't allow nearly enough flexibility to do all the interesting things a TAM could otherwise do. Consider for example that the multi_insert hook uses a BulkInsertStateData parameter, defined as:

typedef struct BulkInsertStateData
{
BufferAccessStrategy strategy; /* our BULKWRITE strategy object */
Buffer current_buf; /* current insertion target page */
} BulkInsertStateData;

which is just the structure heap would want, but what about a TAM that wants to route different tuples to different pages? The "current_buf" isn't enough information, and there's no void *extra field, so you're just sunk.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#15)
Re: Modest proposal to extend TableAM API for controlling cluster commands

On Jun 16, 2022, at 12:27 AM, Andres Freund <andres@anarazel.de> wrote:

I don't think I should have to do so. It's like saying, "I think I should
have freedom of speech", and you say, "well, I'm not sure about that; tell
me what you want to say, and I'll decide if I'm going to let you say it".'
That's not freedom. I think TAM authors should have broad discretion over
anything that the core system doesn't have a compelling interest in
controlling.

That's insultingly ridiculous. You can say, do whatever you want, but that
doesn't mean I have to be convinced by it (i.e. +1 adding an API) - that'd be
compelled speech, to go with your image...

Indeed it would be compelled speech, and I'm not trying to compel you, only to convince you. And my apologies if it came across as insulting. I have a lot of respect for you, as do others at EDB, per invariably complementary comments I've heard others express.

It's utterly normal to be asked what the use case for a new API is when
proposing one.

It seems like we're talking on two different levels. I've said what the use case is, which is to implement a TAM that doesn't benefit from cluster or vacuum full, without the overhead of needlessly copying itself, and without causing argumentless VACUUM FULL commands to fail. I'm *emphatically* not asking the community to accept the TAM back as a patch. The freedom I'm talking about is the freedom to design and implement such a third-party TAM without seeking community approval of the TAM's merits.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company