alter table set TABLE ACCESS METHOD
I thought this was a good idea, but didn't hear back when I raised it before.
I was motivated to finally look into it by Dilip's toast compression patch,
which also (can do) a table rewrite when changing a column's toast compression.
I called this "set TABLE access method" rather than just "set access method"
for the reasons given on the LIKE thread:
/messages/by-id/20210119210331.GN8560@telsasoft.com
I've tested this with zedstore, but the lack of 2nd, in-core table AM limits
testing possibilities. It also limits at least my own ability to reason about
the AM API. For example, I was surprised to hear that toast is a concept
that's intended to be applied to AMs other than heap.
/messages/by-id/CA+TgmoYTuT4sRtviMLOOO+79VnDCpCNyy9rK6UZFb7KEAVt21w@mail.gmail.com
I plan to add to CF - it seems like a minor addition, but may not be for v14.
/messages/by-id/20190818193533.GL11185@telsasoft.com
On Sun, Aug 18, 2019 at 02:35:33PM -0500, Justin Pryzby wrote:
. What do you think about pg_restore --no-tableam; similar to
--no-tablespaces, it would allow restoring a table to a different AM:
PGOPTIONS='-c default_table_access_method=zedstore' pg_restore --no-tableam ./pg_dump.dat -d postgres
Otherwise, the dump says "SET default_table_access_method=heap", which
overrides any value from PGOPTIONS and precludes restoring to new AM.
...
Show quoted text
. it'd be nice if there was an ALTER TABLE SET ACCESS METHOD, to allow
migrating data. Otherwise I think the alternative is:
begin; lock t;
CREATE TABLE new_t LIKE (t INCLUDING ALL) USING (zedstore);
INSERT INTO new_t SELECT * FROM t;
for index; do CREATE INDEX...; done
DROP t; RENAME new_t (and all its indices). attach/inherit, etc.
commit;. Speaking of which, I think LIKE needs a new option for ACCESS METHOD, which
is otherwise lost.
Attachments:
0001-ALTER-TABLE-SET-ACCESS-METHOD.patchtext/x-diff; charset=us-asciiDownload+119-16
On Sun, Feb 28, 2021 at 04:25:30PM -0600, Justin Pryzby wrote:
I called this "set TABLE access method" rather than just "set access method"
for the reasons given on the LIKE thread:
/messages/by-id/20210119210331.GN8560@telsasoft.com
ALTER TABLE applies to a table (or perhaps a sequence, still..), so
that sounds a bit weird to me to add again the keyword "TABLE" for
that.
I've tested this with zedstore, but the lack of 2nd, in-core table AM limits
testing possibilities. It also limits at least my own ability to reason about
the AM API.For example, I was surprised to hear that toast is a concept that's
intended to be applied to AMs other than heap.
/messages/by-id/CA+TgmoYTuT4sRtviMLOOO+79VnDCpCNyy9rK6UZFb7KEAVt21w@mail.gmail.com
What kind of advanced testing do you have in mind? It sounds pretty
much enough to me for a basic patch to use the trick with heap2 as
your patch does. That would be enough to be sure that the rewrite
happens and that data is still around. If you are worrying about
recovery, a TAP test with an immediate stop of the server could
equally be used to check after the FPWs produced for the new
relfilenode during the rewrite.
--
Michael
On Mon, Mar 01, 2021 at 11:16:36AM +0900, Michael Paquier wrote:
On Sun, Feb 28, 2021 at 04:25:30PM -0600, Justin Pryzby wrote:
I called this "set TABLE access method" rather than just "set access method"
for the reasons given on the LIKE thread:
/messages/by-id/20210119210331.GN8560@telsasoft.comALTER TABLE applies to a table (or perhaps a sequence, still..), so
that sounds a bit weird to me to add again the keyword "TABLE" for
that.
I don't know if you're following along the toast compression patch -
Alvaro had suggested that instead of making a new catalog just for a handful of
tuples for compression types, to instead store them in pg_am, with a new
am_type='c'. So I proposed a patch for
| CREATE TABLE .. (LIKE other INCLUDING ACCESS METHOD),
but then decided that it should say INCLUDING *TABLE* ACCESS METHOD, since
otherwise it was somewhat strange that it didn't include the compression access
methods (which have a separate LIKE option).
I've tested this with zedstore, but the lack of 2nd, in-core table AM limits
testing possibilities. It also limits at least my own ability to reason about
the AM API.For example, I was surprised to hear that toast is a concept that's
intended to be applied to AMs other than heap.
/messages/by-id/CA+TgmoYTuT4sRtviMLOOO+79VnDCpCNyy9rK6UZFb7KEAVt21w@mail.gmail.comWhat kind of advanced testing do you have in mind? It sounds pretty
much enough to me for a basic patch to use the trick with heap2 as
your patch does. That would be enough to be sure that the rewrite
happens and that data is still around.
The issue is that the toast data can be compressed, so it needs to be detoasted
before pushing it to the other AM, which otherwise may not know how to
decompress it.
If it's not detoasted, this works with "COMPRESSION lz4" (since zedstore
happens to know how to decompress it) but that's just an accident, and it fails
with when using pglz. That's got to do with 2 non-core patches - when core has
only heap, then I don't see how something like this can be exercized.
postgres=# DROP TABLE t; CREATE TABLE t (a TEXT COMPRESSION pglz) USING heap; INSERT INTO t SELECT repeat(a::text,9999) FROM generate_series(1,99)a; ALTER TABLE t SET ACCESS METHOD zedstore; SELECT * FROM t;
DROP TABLE
CREATE TABLE
INSERT 0 99
ALTER TABLE
2021-02-28 20:50:42.653 CST client backend[14958] psql ERROR: compressed lz4 data is corrupt
2021-02-28 20:50:42.653 CST client backend[14958] psql STATEMENT: SELECT * FROM t;
--
Justin
On Mon, Mar 01, 2021 at 11:16:36AM +0900, Michael Paquier wrote:
On Sun, Feb 28, 2021 at 04:25:30PM -0600, Justin Pryzby wrote:
I called this "set TABLE access method" rather than just "set access method"
for the reasons given on the LIKE thread:
/messages/by-id/20210119210331.GN8560@telsasoft.comALTER TABLE applies to a table (or perhaps a sequence, still..), so
that sounds a bit weird to me to add again the keyword "TABLE" for
that.
This renames to use SET ACCESS METHOD (resolving a silly typo);
And handles the tuple slots more directly;
And adds docs and tab completion;
Also, since 8586bf7ed8889f39a59dd99b292014b73be85342:
| For now it's not allowed to set a table AM for a partitioned table, as
| we've not resolved how partitions would inherit that. Disallowing
| allows us to introduce, if we decide that's the way forward, such a
| behaviour without a compatibility break.
I propose that it should behave like tablespace for partitioned rels:
ca4103025dfe, 33e6c34c3267
--
Justin
On Sun, Mar 07, 2021 at 07:07:07PM -0600, Justin Pryzby wrote:
This renames to use SET ACCESS METHOD (resolving a silly typo);
And handles the tuple slots more directly;
And adds docs and tab completion;Also, since 8586bf7ed8889f39a59dd99b292014b73be85342:
| For now it's not allowed to set a table AM for a partitioned table, as
| we've not resolved how partitions would inherit that. Disallowing
| allows us to introduce, if we decide that's the way forward, such a
| behaviour without a compatibility break.I propose that it should behave like tablespace for partitioned rels:
ca4103025dfe, 33e6c34c3267
Sounds sensible from here. Based on the patch at hand, changing the
AM of a partitioned table does nothing for the existing partitions,
and newly-created partitions would inherit the new AM assigned to its
parent. pg_dump is handling things right.
From what I can see, the patch in itself is simple, with central parts
in swap_relation_files() to handle the rewrite and make_new_heap() to
assign the new correct AM.
+ * Since these have no storage the tablespace can be updated with a simple
+ * metadata only operation to update the tablespace.
+ */
+static void
+ATExecSetAccessMethodNoStorage(Relation rel, Oid newAccessMethod
This comment still refers to tablespaces.
+ /*
+ * Record dependency on AM. This is only required for relations
+ * that have no physical storage.
+ */
+ changeDependencyFor(RelationRelationId, RelationGetRelid(rel),
+ AccessMethodRelationId, oldrelam,
+ newAccessMethod);
And? Relations with storage also require this dependency.
- if (tab->newTableSpace)
+ if (OidIsValid(tab->newTableSpace))
You are right, but this is just a noise diff in this patch.
+ swaptemp = relform1->relam;
+ relform1->relam = relform2->relam;
+ relform2->relam = swaptemp;
swap_relation_files() holds the central logic, but I can see that no
comments of this routine have been updated (header, comment when
looking at valid relfilenode{1,2}).
It seems to me that 0002 and 0003 should just be merged together.
+ /* Need to detoast tuples when changing AM XXX: should
check if one AM is heap and one isn't? */
+ if (newrel->rd_rel->relam != oldrel->rd_rel->relam)
+ {
+ HeapTuple htup = toast_build_flattened_tuple(oldTupDesc,
+ oldslot->tts_values,
+ oldslot->tts_isnull);
This toast issue is a kind of interesting one, and it seems rather
right to rely on toast_build_flattened_tuple() to decompress things if
both table AMs support toast with the internals of toast knowing what
kind of compression has been applied to the stored tuple, rather than
have the table AM try to extra a toast tuple by itself. I wonder
whether we should have a table AM API to know what kind of compression
is supported for a given table AMs at hand, because there is no need
to flatten things if both the origin and the target match their
compression algos, which would be on HEAD to make sure that both the
origin and table AMs have toast (relation_toast_am). Your patch,
here, would flatten each tuples as long as the table AMs don't
match. That can be made cheaper in some cases.
--
Michael
On Mon, Mar 08, 2021 at 04:30:23PM +0900, Michael Paquier wrote:
This toast issue is a kind of interesting one, and it seems rather
right to rely on toast_build_flattened_tuple() to decompress things if
both table AMs support toast with the internals of toast knowing what
kind of compression has been applied to the stored tuple, rather than
have the table AM try to extra a toast tuple by itself. I wonder
whether we should have a table AM API to know what kind of compression
is supported for a given table AMs at hand, because there is no need
to flatten things if both the origin and the target match their
compression algos, which would be on HEAD to make sure that both the
origin and table AMs have toast (relation_toast_am). Your patch,
here, would flatten each tuples as long as the table AMs don't
match. That can be made cheaper in some cases.
I actually have an idea for this one, able to test the decompression
-> insert path when rewriting a relation with a new AM: we could add a
dummy_table_am in src/test/modules/. By design, this table AM acts as
a blackhole, eating any data we insert into it but print into the logs
the data candidate for INSERT. When doing a heap -> dummy_table_am
rewrite, the logs would provide coverage with the data from the origin
toast table. The opposite operation does not really matter, though it
could be tested. In one of my trees, I have something already close
to that:
https://github.com/michaelpq/pg_plugins/tree/master/blackhole_am/
--
Michael
On Mon, 2021-03-08 at 16:30 +0900, Michael Paquier wrote:
This toast issue is a kind of interesting one, and it seems rather
right to rely on toast_build_flattened_tuple() to decompress things
if
both table AMs support toast with the internals of toast knowing what
kind of compression has been applied to the stored tuple, rather than
have the table AM try to extra a toast tuple by itself. I wonder
whether we should have a table AM API to know what kind of
compression
is supported for a given table AMs at hand, because there is no need
to flatten things if both the origin and the target match their
compression algos, which would be on HEAD to make sure that both the
origin and table AMs have toast (relation_toast_am). Your patch,
here, would flatten each tuples as long as the table AMs don't
match. That can be made cheaper in some cases.
I am confused by this. The toast-related table AM API functions are:
relation_needs_toast_table(), relation_toast_am(), and
relation_fetch_toast_slice().
What cases are we trying to solve here?
1. target of conversion is tableam1 main table, heap toast table
2. target of conversion is tableam1 main table, no toast table
3. target of conversion is tableam1 main table, tableam1 toast table
4. target of conversion is tableam1 main table, tableam2 toast table
Or does the problem apply to all of these cases?
And if tableam1 can't handle some case, why can't it just detoast the
data itself? Shouldn't that be able to decompress anything?
For example, in columnar[1], we just always detoast/decompress because
we want to compress it ourselves (along with other values from the same
column), and we never have a separate toast table. Is that code
incorrect, or will it break in v14?
Regards,
Jeff Davis
On Thu, May 06, 2021 at 01:10:53PM -0700, Jeff Davis wrote:
On Mon, 2021-03-08 at 16:30 +0900, Michael Paquier wrote:
This toast issue is a kind of interesting one, and it seems rather
right to rely on toast_build_flattened_tuple() to decompress things
if
both table AMs support toast with the internals of toast knowing what
kind of compression has been applied to the stored tuple, rather than
have the table AM try to extra a toast tuple by itself. I wonder
whether we should have a table AM API to know what kind of
compression
is supported for a given table AMs at hand, because there is no need
to flatten things if both the origin and the target match their
compression algos, which would be on HEAD to make sure that both the
origin and table AMs have toast (relation_toast_am). Your patch,
here, would flatten each tuples as long as the table AMs don't
match. That can be made cheaper in some cases.I am confused by this. The toast-related table AM API functions are:
relation_needs_toast_table(), relation_toast_am(), and
relation_fetch_toast_slice().
I wrote this shortly after looking at one of Dilip's LZ4 patches.
At one point in February/March, pg_attribute.attcompression defined the
compression used by *all* tuples in the table, rather than the compression used
for new tuples, and ALTER SET COMPRESSION would rewrite the table with the new
compression (rather than being only a catalog update).
What cases are we trying to solve here?
1. target of conversion is tableam1 main table, heap toast table
2. target of conversion is tableam1 main table, no toast table
3. target of conversion is tableam1 main table, tableam1 toast table
4. target of conversion is tableam1 main table, tableam2 toast table
I think ALTER TABLE SET ACCESS METHOD should move all data off the old AM,
including its toast table. The optimization Michael suggests is when the new
AM and old AM use the same toast AM, then the data doesn't need to be
de/re/toasted.
Thanks for looking.
--
Justin
On Thu, 2021-05-06 at 15:23 -0500, Justin Pryzby wrote:
I think ALTER TABLE SET ACCESS METHOD should move all data off the
old AM,
including its toast table.
Can you explain what you mean, and why? I'm still confused.
Let's say there are 4 table AMs: A, AT, B, and BT. A's
relation_toast_am() returns AT, and B's relation_toast_am() returns BT.
AT or BT are invalid if A or B have relation_needs_toast_table() return
false.
Here are the cases that I see:
If A = B, then AT = BT, and it's all a no-op.
If A != B and BT is invalid (e.g. converting heap to columnar), then A
should detoast (and perhaps decompress, as in the case of columnar)
whatever it gets as input and do whatever it wants. That's what
columnar does and I don't see why ATRewriteTable needs to handle it.
If A != B and AT != BT, then B needs to detoast whatever it gets (but
should not decompress, as that would just be wasted effort), and then
re-toast using BT. Again, I don't see a need for ATRewriteTable to do
anything, B can handle it.
The only case I can really see where ATRewriteTable might be helpful is
if A != B but AT = BT. In that case, in theory, you don't need to do
anything to the toast table, just leave it where it is. But then the
responsibilities get a little confusing to me -- how is B supposed to
know that it doesn't need to toast anything? Is this the problem you
are trying to solve?
Regards,
Jeff Davis
On Thu, May 06, 2021 at 02:11:31PM -0700, Jeff Davis wrote:
On Thu, 2021-05-06 at 15:23 -0500, Justin Pryzby wrote:
I think ALTER TABLE SET ACCESS METHOD should move all data off the
old AM,
including its toast table.Can you explain what you mean, and why? I'm still confused.
Let's say there are 4 table AMs: A, AT, B, and BT. A's
relation_toast_am() returns AT, and B's relation_toast_am() returns BT.
AT or BT are invalid if A or B have relation_needs_toast_table() return
false.Here are the cases that I see:
If A = B, then AT = BT, and it's all a no-op.
If A != B and BT is invalid (e.g. converting heap to columnar), then A
should detoast (and perhaps decompress, as in the case of columnar)
whatever it gets as input and do whatever it wants. That's what
columnar does and I don't see why ATRewriteTable needs to handle it.If A != B and AT != BT, then B needs to detoast whatever it gets (but
should not decompress, as that would just be wasted effort), and then
re-toast using BT. Again, I don't see a need for ATRewriteTable to do
anything, B can handle it.The only case I can really see where ATRewriteTable might be helpful is
if A != B but AT = BT. In that case, in theory, you don't need to do
anything to the toast table, just leave it where it is. But then the
responsibilities get a little confusing to me -- how is B supposed to
know that it doesn't need to toast anything? Is this the problem you
are trying to solve?
That's the optimization Michael is suggesting.
I was approaching this after having just looked at Dilip's patch (which was
originally written using pg_am to support "pluggable" compression "AM"s, but
not otherwise related to table AM).
Once a table is converted to a new AM, its tuples had better not reference the
old AM - it could be dropped.
The new table AM (B) shouldn't need to know anything about the old one (A). It
should just process incoming tuples. It makes more to me that ATRewriteTable
would handle this, rather than every acccess method having the same logic (at
best) or different logic (more likely). If ATRewriteTable didn't handle this,
data would become inaccessible if an AM failed to de-toast tuples.
--
Justin
On Thu, 2021-05-06 at 17:19 -0500, Justin Pryzby wrote:
If ATRewriteTable didn't
handle this,
data would become inaccessible if an AM failed to de-toast tuples.
If the AM fails to detoast tuples, it's got bigger problems than ALTER
TABLE. What about INSERT INTO ... SELECT?
It's the table AM's responsibility to detoast out-of-line datums and
toast any values that are too large (see
heapam.c:heap_prepare_insert()).
Regards,
Jeff Davis
On Thu, 2021-05-06 at 17:24 -0700, Jeff Davis wrote:
It's the table AM's responsibility to detoast out-of-line datums and
toast any values that are too large (see
heapam.c:heap_prepare_insert()).
Do we have general agreement on this point? Did I miss another purpose
of detoasting in tablecmds.c, or can we just remove that part of the
patch?
Regards,
Jeff Davis
On Thu, Jun 03, 2021 at 02:36:15PM -0700, Jeff Davis wrote:
Do we have general agreement on this point? Did I miss another purpose
of detoasting in tablecmds.c, or can we just remove that part of the
patch?
Catching up with this thread.. So, what you are suggesting here is
that we have no need to let ATRewriteTable() do anything about the
detoasting, and just push down the responsability of detoasting the
tuple, if necessary, down to the AM layer where the tuple insertion is
handled, right?
In short, a table AMs would receive on a rewrite with ALTER TABLE
tuples which may be toasted, still table_insert_tuple() should be able
to handle both:
- the case where this tuple was already toasted.
- the case where this tuple has been already detoasted.
You are right that this would be more consistent with what heap does
with heap_prepare_insert().
--
Michael
On Fri, 2021-06-04 at 14:58 +0900, Michael Paquier wrote:
In short, a table AMs would receive on a rewrite with ALTER TABLE
tuples which may be toasted, still table_insert_tuple() should be
able
to handle both:
- the case where this tuple was already toasted.
- the case where this tuple has been already detoasted.
Yes. That's a current requirement, and any AM that doesn't do that is
already broken (e.g. for INSERT INTO ... SELECT).
Regards,
Jeff Davis
On Fri, Jun 04, 2021 at 11:26:28AM -0700, Jeff Davis wrote:
Yes. That's a current requirement, and any AM that doesn't do that is
already broken (e.g. for INSERT INTO ... SELECT).
Makes sense. I was just looking at the patch, and this was the only
part of it that made my spidey sense react.
One thing I am wondering is if we should have a dummy_table_am in
src/test/modules/ to be able to stress more this feature. That does
not seem like a hard requirement, but relying only on heap limits a
bit the coverage of this feature even if one changes
default_table_access_method.
--
Michael
On Sat, 2021-06-05 at 08:45 +0900, Michael Paquier wrote:
One thing I am wondering is if we should have a dummy_table_am in
src/test/modules/ to be able to stress more this feature. That does
not seem like a hard requirement, but relying only on heap limits a
bit the coverage of this feature even if one changes
default_table_access_method.
I agree that a dummy AM would be good, but implementing even a dummy AM
is a fair amount of work. Also, there are many potential variations, so
we'd probably need several.
The table AM API is a work in progress, and I think it will be a few
releases (and require a few more table AMs in the wild) to really nail
down the API.
Regards,
Jeff Davis
On Fri, Jun 04, 2021 at 05:34:36PM -0700, Jeff Davis wrote:
I agree that a dummy AM would be good, but implementing even a dummy AM
is a fair amount of work.
Not much, honestly, the largest part being to document that properly
so as it could be used as a template:
/messages/by-id/YEXm2nh/5j5P2jEl@paquier.xyz
Also, there are many potential variations, so
we'd probably need several.
Not so sure here. GUCs or reloptions could be used to control some of
the behaviors. Now this really depends on the use-cases we are
looking to support here and the low-level facilities that could
benefit from this module (dummy_index_am tests reloptions for
example). I agree that this thread is perhaps not enough to justify
adding this module for now.
The table AM API is a work in progress, and I think it will be a few
releases (and require a few more table AMs in the wild) to really nail
down the API.
Hard to say, we'll see. I'd like to believe that it could be a good
to not set something in stone for that forever.
--
Michael
On Thu, 2021-06-03 at 14:36 -0700, Jeff Davis wrote:
Do we have general agreement on this point? Did I miss another
purpose
of detoasting in tablecmds.c, or can we just remove that part of the
patch?
Per discussion with Justin, I'll drive this patch. I merged the CF
entries into https://commitfest.postgresql.org/33/3110/
New version attached, with the detoasting code removed. Could use
another round of validation/cleanup in case I missed something during
the merge.
Regards,
Jeff Davis
Attachments:
0001-ALTER-TABLE-.-SET-ACCESS-METHOD.patchtext/x-patch; charset=UTF-8; name=0001-ALTER-TABLE-.-SET-ACCESS-METHOD.patchDownload+133-18
On Tue, Jun 08, 2021 at 05:33:31PM -0700, Jeff Davis wrote:
New version attached, with the detoasting code removed. Could use
another round of validation/cleanup in case I missed something during
the merge.
This looks rather sane to me, thanks.
/* Create the transient table that will receive the re-ordered data */
OIDNewHeap = make_new_heap(tableOid, tableSpace,
+ accessMethod
It strikes me that we could extend CLUSTER/VACUUM FULL to support this
option, in the same vein as TABLESPACE. Perhaps that's not something to
implement or have, just wanted to mention it.
+ALTER TABLE heaptable SET ACCESS METHOD heap2;
+explain (analyze, costs off, summary off, timing off) SELECT * FROM heaptable;
+SELECT COUNT(a), COUNT(1) FILTER(WHERE a=1) FROM heaptable;
+DROP TABLE heaptable;
There is a mix of upper and lower-case characters here. It could be
more consistent. It seems to me that this test should actually check
that pg_class.relam reflects the new value.
+ /* Save info for Phase 3 to do the real work */
+ if (OidIsValid(tab->newAccessMethod))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot have multiple SET ACCESS METHOD subcommands")));
Worth adding a test?
- * with the specified persistence, which might differ from the original's.
+ * NewTableSpace/accessMethod/persistence, which might differ from those
Nit: the name of the variable looks inconsistent with this comment.
The original is weird as well.
I am wondering if it would be a good idea to set the new tablespace
and new access method fields to InvalidOid within ATGetQueueEntry. We
do that for the persistence. Not critical at all, still..
+ pass = AT_PASS_MISC;
Maybe add a comment here?
--
Michael
On Wed, 2021-06-09 at 13:47 +0900, Michael Paquier wrote:
There is a mix of upper and lower-case characters here. It could be
more consistent. It seems to me that this test should actually check
that pg_class.relam reflects the new value.
Done. I also added a (negative) test for changing the AM of a
partitioned table, which wasn't caught before.
+ errmsg("cannot have multiple SET ACCESS METHOD
subcommands")));
Worth adding a test?
Done.
Nit: the name of the variable looks inconsistent with this comment.
The original is weird as well.
Tried to improve wording.
I am wondering if it would be a good idea to set the new tablespace
and new access method fields to InvalidOid within
ATGetQueueEntry. We
do that for the persistence. Not critical at all, still..
Done.
+ pass = AT_PASS_MISC;
Maybe add a comment here?
Done. In that case, it doesn't matter because there's no work to be
done in Phase 2.
Regards,
Jeff Davis