Drop type "smgr"?
Hello hackers,
The type smgr has only one value 'magnetic disk'. ~15 years ago it
also had a value 'main memory', and in Berkeley POSTGRES 4.2 there was
a third value 'sony jukebox'. Back then, all tables had an associated
block storage manager, and it was recorded as an attribute relsmgr of
pg_class (or pg_relation as it was known further back). This was the
type of that attribute, removed by Bruce in 3fa2bb31 (1997).
Nothing seems to break if you remove it (except for some tests using
it in an incidental way). See attached.
Motivation: A couple of projects propose to add new smgr
implementations alongside md.c in order to use bufmgr.c for more kinds
of files, but it seems entirely bogus to extend the unused smgr type
to cover those.
--
Thomas Munro
https://enterprisedb.com
Attachments:
0001-Remove-the-vestigial-smgr-type.patchapplication/octet-stream; name=0001-Remove-the-vestigial-smgr-type.patchDownload+16-116
Thomas Munro <thomas.munro@gmail.com> writes:
Motivation: A couple of projects propose to add new smgr
implementations alongside md.c in order to use bufmgr.c for more kinds
of files, but it seems entirely bogus to extend the unused smgr type
to cover those.
I agree that smgrtype as it stands is pretty pointless, but what
will we be using instead to get to those other implementations?
regards, tom lane
On Thu, Feb 28, 2019 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
Motivation: A couple of projects propose to add new smgr
implementations alongside md.c in order to use bufmgr.c for more kinds
of files, but it seems entirely bogus to extend the unused smgr type
to cover those.I agree that smgrtype as it stands is pretty pointless, but what
will we be using instead to get to those other implementations?
Our current thinking is that smgropen() should know how to map a small
number of special database OIDs to different smgr implementations
(where currently it hard-codes smgr_which = 0). For example there
would be a pseudo-database for undo logs, and another for the SLRUs
that Shawn Debnath and others have been proposing to move into shared
buffers. Another idea would be to widen the buffer tag to include the
selector. Unlike the ancestral code, it wouldn't need to appear in
catalogs or ever be seen or typed in by users so there still wouldn't
be a use for this type.
--
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
On Thu, Feb 28, 2019 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I agree that smgrtype as it stands is pretty pointless, but what
will we be using instead to get to those other implementations?
Our current thinking is that smgropen() should know how to map a small
number of special database OIDs to different smgr implementations
Hmm. Maybe mapping based on tablespaces would be a better idea?
Unlike the ancestral code, it wouldn't need to appear in
catalogs or ever be seen or typed in by users so there still wouldn't
be a use for this type.
Yeah, the $64 problem at this level is that you don't really want
to depend on catalog contents, because you cannot do a lookup to
find out what to do. So I agree that we're pretty unlikely to
resurrect an smgr type per se. But I'd been expecting an answer
mentioning pg_am OIDs, and was wondering how that'd work exactly.
Probably, it would still be down to some C code having hard-wired
knowledge about some OIDs ...
regards, tom lane
On Thu, Feb 28, 2019 at 7:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
On Thu, Feb 28, 2019 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I agree that smgrtype as it stands is pretty pointless, but what
will we be using instead to get to those other implementations?Our current thinking is that smgropen() should know how to map a small
number of special database OIDs to different smgr implementationsHmm. Maybe mapping based on tablespaces would be a better idea?
In the undo log proposal (about which more soon) we are using
tablespaces for their real purpose, so we need that OID. If you SET
undo_tablespaces = foo then future undo data created by your session
will be written there, which might be useful for putting that IO on
different storage. We also use the relation OID to chop up the undo
address space into separate numbered undo logs, so that different
sessions get their own space to insert data without trampling on each
other's buffer locks. That leaves only the database OID to mess with
(unless we widen buffer tags and the smgropen() interface).
Unlike the ancestral code, it wouldn't need to appear in
catalogs or ever be seen or typed in by users so there still wouldn't
be a use for this type.Yeah, the $64 problem at this level is that you don't really want
to depend on catalog contents, because you cannot do a lookup to
find out what to do.
Right. It would be a circular dependency if you had to read a catalog
before you could consult low level SLRUs like (say) the clog* via
shared buffers, since you need the clog to read the catalog, or (to
peer a long way down the road and around several corners) if the
catalogs themselves could optionally be stored in an undo-aware table
access manager like zheap, which might require reading undo. I think
this block storage stuff exists at a lower level than relations and
transactions and therefore also catalogs, and there will be a small
fixed number of them and it makes sense to hard-code the knowledge of
them.
*I'm at least a little bit aware of the history here: your 2001 commit
2589735d moved clog out of shared buffers! That enabled you to
develop the system of segment files truncated from the front. That's
sort of what this new smgr work is about; putting things in shared
buffers, but just mapping the blocks to paths differently than md.c,
as appropriate.
--
Thomas Munro
https://enterprisedb.com
On Thu, Feb 28, 2019 at 5:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
On Thu, Feb 28, 2019 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I agree that smgrtype as it stands is pretty pointless, but what
will we be using instead to get to those other implementations?Our current thinking is that smgropen() should know how to map a small
number of special database OIDs to different smgr implementationsHmm. Maybe mapping based on tablespaces would be a better idea?
Thanks to bringing up this idea of mutliple smgr implementations. I also
thought of implementing our own smgr implementation to support transparent
data encryption on the disk based on tablespace mapping.
Regards,
Haribabu Kommi
Fujitsu Australia
On Thu, Feb 28, 2019 at 1:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The type smgr has only one value 'magnetic disk'. ~15 years ago it
also had a value 'main memory', and in Berkeley POSTGRES 4.2 there was
a third value 'sony jukebox'. Back then, all tables had an associated
block storage manager, and it was recorded as an attribute relsmgr of
pg_class (or pg_relation as it was known further back). This was the
type of that attribute, removed by Bruce in 3fa2bb31 (1997).Nothing seems to break if you remove it (except for some tests using
it in an incidental way). See attached.
FWIW, +1 from me. I thought about arguing to remove this a number of
years ago when I was poking around in this area for some reason, but
it didn't seem important enough to be worth arguing about then. Now,
because we're actually going to maybe-hopefully get some more smgrs
that do interesting things, it seems worth the arguing...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thomas Munro <thomas.munro@gmail.com> writes:
On Thu, Feb 28, 2019 at 7:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
Our current thinking is that smgropen() should know how to map a small
number of special database OIDs to different smgr implementations
Hmm. Maybe mapping based on tablespaces would be a better idea?
In the undo log proposal (about which more soon) we are using
tablespaces for their real purpose, so we need that OID. If you SET
undo_tablespaces = foo then future undo data created by your session
will be written there, which might be useful for putting that IO on
different storage.
Meh. That's a point, but it doesn't exactly seem like a killer argument.
Just in the abstract, it seems much more likely to me that people would
want per-database special rels than per-tablespace special rels. And
I think your notion of a GUC that can control this is probably pie in
the sky anyway: if we can't afford to look into the catalogs to resolve
names at this code level, how are we going to handle a GUC?
The real reason I'm concerned about this, though, is that for either
a database or a tablespace, you can *not* get away with having a magic
OID just hanging in space with no actual catalog row matching it.
If nothing else, you need an entry there to prevent someone from
reusing the OID for another purpose. And a pg_database row that
doesn't correspond to a real database is going to break all kinds of
code, starting with pg_upgrade and the autovacuum launcher. Special
rows in pg_tablespace are much less likely to cause issues, because
of the precedent of pg_global and pg_default.
In short, I think you're better off equating smgrs to magic
tablespaces, and if you need some way of letting users control
where those map to storage-wise, control it in some other way.
regards, tom lane
On Thu, Feb 28, 2019 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The real reason I'm concerned about this, though, is that for either
a database or a tablespace, you can *not* get away with having a magic
OID just hanging in space with no actual catalog row matching it.
If nothing else, you need an entry there to prevent someone from
reusing the OID for another purpose. And a pg_database row that
doesn't correspond to a real database is going to break all kinds of
code, starting with pg_upgrade and the autovacuum launcher. Special
rows in pg_tablespace are much less likely to cause issues, because
of the precedent of pg_global and pg_default.
My first intuition was the same as yours -- that we should use the
tablespace to decide which smgr is relevant -- but I now think that
intuition was wrong. Even if you use the tablespace OID to select the
smgr, it doesn't completely solve the problem you're worried about
here. You still have to put SOMETHING in the database OID field, and
that's going to be just as fake as it was before. I guess you could
use the OID for pg_global for things like undo and SLRU data, but then
how is GetRelationPath() going to work? You don't have enough bits
left anywhere to specify both a tablespace and a location within that
tablespace in a reasonable way, and I think it's far from obvious how
you would build a side channel that could carry that information.
Also, I don't see why we'd need a fake pg_database row in the first
place. IIUC, the OID counter wraps around to FirstNormalObjectId, so
nobody should have a database with an OID less than that value.
If we were using smgrs to represent other kinds of ways of storing
user tables, e.g. network.c, cloud.c, magtape.c, punchcard.c, etc. -
then I think we'd have to find some way to let each user-defined
tablespace pick an smgr, and I don't really know how that would work
given the fact that we can't really use catalog lookups to figure out
which one to use, but actually the direction here is to store files
internal to the system using special-purpose smgrs so that we can pull
more things into shared_buffers, and for that purpose the database OID
seems cleaner.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
My first intuition was the same as yours -- that we should use the
tablespace to decide which smgr is relevant -- but I now think that
intuition was wrong. Even if you use the tablespace OID to select the
smgr, it doesn't completely solve the problem you're worried about
here. You still have to put SOMETHING in the database OID field, and
that's going to be just as fake as it was before.
My thought was that that could be zero if not relevant ... isn't that
what we do now for buffer tags for shared rels?
I guess you could
use the OID for pg_global for things like undo and SLRU data, but then
how is GetRelationPath() going to work? You don't have enough bits
left anywhere to specify both a tablespace and a location within that
tablespace in a reasonable way, and I think it's far from obvious how
you would build a side channel that could carry that information.
It's certainly possible/likely that we're going to end up needing to
widen buffer tags to represent the smgr explicitly, because some use
cases are going to need a real database spec, some are going to need
a real tablespace spec, and some might need both. Maybe we should
just bite that bullet.
regards, tom lane
On Thu, Feb 28, 2019 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It's certainly possible/likely that we're going to end up needing to
widen buffer tags to represent the smgr explicitly, because some use
cases are going to need a real database spec, some are going to need
a real tablespace spec, and some might need both. Maybe we should
just bite that bullet.
Well, Andres will probably complain about that. He thinks, IIUC, that
the buffer tags are too wide already and that it's significantly
hurting performance on very very common operations - like buffer
lookups. I haven't verified that myself, but I tend to think he knows
what he's talking about.
Anyway, given that your argument started off from the premise that we
had to have a pg_database row, I think we'd better look a little
harder at whether that premise is correct before getting too excited
here. As I said in my earlier reply, I think that we probably don't
need to have a pg_database row given that we wrap around to
FirstNormalObjectId; any value we hard-code would be less than that.
If there are other reasons why we'd need that, it might be useful to
hear about them.
However, all we really need to decide on this thread is whether we
need 'smgr' exposed as an SQL type. And I can't see why we need that
no matter how all of the rest of this turns out. Nobody is currently
proposing to give users a choice of smgrs, just to use them for
internal stuff. Even if that changed later, it doesn't necessarily
mean we'd add back an SQL type, or that if we did it would look like
the one we have today.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Feb 28, 2019 at 1:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
Nothing seems to break if you remove it (except for some tests using
it in an incidental way). See attached.
FWIW, +1 from me.
To be clear, I'm not objecting to the proposed patch either. I was
just wondering where we plan to go from here, given that smgr.c wasn't
getting removed.
BTW, there is stuff in src/backend/storage/smgr/README that is
already obsoleted by this patch, and more that might be obsoleted
if development proceeds as discussed here. So that needs a look.
regards, tom lane
On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
On Thu, Feb 28, 2019 at 10:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The real reason I'm concerned about this, though, is that for either
a database or a tablespace, you can *not* get away with having a magic
OID just hanging in space with no actual catalog row matching it.
If nothing else, you need an entry there to prevent someone from
reusing the OID for another purpose. And a pg_database row that
doesn't correspond to a real database is going to break all kinds of
code, starting with pg_upgrade and the autovacuum launcher. Special
rows in pg_tablespace are much less likely to cause issues, because
of the precedent of pg_global and pg_default.Also, I don't see why we'd need a fake pg_database row in the first
place. IIUC, the OID counter wraps around to FirstNormalObjectId, so
nobody should have a database with an OID less than that value.
We have scripts under catalog directory that can check to ensure OIDs
aren't re-used accidentally. However, we still have to define an entry
in a catalog somewhere and I was proposing creating a new one,
pg_storage_managers?, to track these entries. See [1] for previous
discussion on this topic. We wouldn't need to do catalog lookups for
being able to use the smgrs as the OIDs will be hardcoded in C, but the
data will be available for posterity and OID reservation.
It's certainly possible/likely that we're going to end up needing to
widen buffer tags to represent the smgr explicitly, because some use
cases are going to need a real database spec, some are going to need
a real tablespace spec, and some might need both. Maybe we should
just bite that bullet.
For now, the two projects that require the new smgrs, undo and slru, can
get away with using the database OID as the smgr differentiator. We have
enough left over bits to get our work done (Thomas please correct me if
I am mistaken). The only issue is pg_buffercache would present DB OIDs
that wouldn't map correctly (along with the rest of relfilenode). We
can modify the output to include more columns to give caching specific
information when these special buffers are encountered.
Well, Andres will probably complain about that. He thinks, IIUC, that
the buffer tags are too wide already and that it's significantly
hurting performance on very very common operations - like buffer
lookups. I haven't verified that myself, but I tend to think he knows
what he's talking about.
I can imagine this would. I am curious about the pef impact, going to at
least create a patch and do some testing.
Another thought: my colleague Anton Shyrabokau suggested potentially
re-using forknumber to differentiate smgrs. We are using 32 bits to
map 5 entries today. This approach would be similar to how we split up
the segment numbers and use the higher bits to identify forget or
unlink requests for checkpointer.
Anyway, given that your argument started off from the premise that we
had to have a pg_database row, I think we'd better look a little
harder at whether that premise is correct before getting too excited
here. As I said in my earlier reply, I think that we probably don't
need to have a pg_database row given that we wrap around to
FirstNormalObjectId; any value we hard-code would be less than that.
If there are other reasons why we'd need that, it might be useful to
hear about them.
See above, I think a new catalog, instead of pg_database, can resolve
the issue while providing the ability to reserve the OIDs.
However, all we really need to decide on this thread is whether we
need 'smgr' exposed as an SQL type. And I can't see why we need that
no matter how all of the rest of this turns out. Nobody is currently
proposing to give users a choice of smgrs, just to use them for
internal stuff. Even if that changed later, it doesn't necessarily
mean we'd add back an SQL type, or that if we did it would look like
the one we have today.
+1 smgr types can be removed.
--
Shawn Debnath
Amazon Web Services (AWS)
On Thu, Feb 28, 2019 at 10:02:46AM -0800, Shawn Debnath wrote:
in a catalog somewhere and I was proposing creating a new one,
pg_storage_managers?, to track these entries. See [1] for previous
discussion on this topic. We wouldn't need to do catalog lookups for
*ahem*, here's the footnote:
[1]: /messages/by-id/20180821184835.GA1032@60f81dc409fc.ant.amazon.com
/messages/by-id/20180821184835.GA1032@60f81dc409fc.ant.amazon.com
--
Shawn Debnath
Amazon Web Services (AWS)
Hi,
On 2019-02-28 12:36:50 -0500, Robert Haas wrote:
Well, Andres will probably complain about that. He thinks, IIUC, that
the buffer tags are too wide already and that it's significantly
hurting performance on very very common operations - like buffer
lookups.
Correct. Turns out especially comparing the keys after the hash match is
pretty expensive. It also is a significant factor influencing the size
of the hashtable, which influences how much of it can be in cache.
My plan is still to move to a two tiered system, where we have one
unordered datastructure to map from (db, tablespace, oid) to a secondary
ordered datastructure that then maps from (block number) to an actual
offset. With the first being cached somewhere in RelationData, therefore
not being performance critical. But while I hope to work for that in 13,
I don't think making other large projects depend on it would be smart.
Greetings,
Andres Freund
Hi,
On 2019-02-28 10:02:46 -0800, Shawn Debnath wrote:
We have scripts under catalog directory that can check to ensure OIDs
aren't re-used accidentally. However, we still have to define an entry
in a catalog somewhere and I was proposing creating a new one,
pg_storage_managers?, to track these entries. See [1] for previous
discussion on this topic. We wouldn't need to do catalog lookups for
being able to use the smgrs as the OIDs will be hardcoded in C, but the
data will be available for posterity and OID reservation.
I'm inclined to just put them in pg_am, with a new type 's' (we already
have amtype = i for indexes, I'm planning to add 't' for tables
soon). While not a perfect fit for storage managers, it seems to fit
well enough.
Another thought: my colleague Anton Shyrabokau suggested potentially
re-using forknumber to differentiate smgrs. We are using 32 bits to
map 5 entries today. This approach would be similar to how we split up
the segment numbers and use the higher bits to identify forget or
unlink requests for checkpointer.
That could probably be done, without incurring too much overhead
here. I'm not sure that the added complexity around the tree is worth it
however.
I personally would just go with the DB oid for the near future, where we
don't need per-database storage for those. And then plan for buffer
manager improvements.
Greetings,
Andres Freund
Shawn Debnath <sdn@amazon.com> writes:
On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
Also, I don't see why we'd need a fake pg_database row in the first
place. IIUC, the OID counter wraps around to FirstNormalObjectId, so
nobody should have a database with an OID less than that value.
We have scripts under catalog directory that can check to ensure OIDs
aren't re-used accidentally. However, we still have to define an entry
in a catalog somewhere and I was proposing creating a new one,
pg_storage_managers?, to track these entries.
That would fail to capture the property that the smgr OIDs mustn't
conflict with database OIDs, so the whole thing still seems like an
ugly kluge from here.
Another thought: my colleague Anton Shyrabokau suggested potentially
re-using forknumber to differentiate smgrs. We are using 32 bits to
map 5 entries today.
Yeah, that seems like it might be a workable answer.
I really do think that relying on magic database OIDs is a bad idea;
if you think you aren't going to need a real database ID in there in
the future, you're being short-sighted. And I guess the same argument
establishes that relying on magic tablespace OIDs would also be a bad
idea, even if there weren't a near-term proposal on the table for
needing real tablespace IDs in an alternate smgr. So we need to find
some bits somewhere else, and the fork number field is the obvious
candidate. Since, per this discussion, the smgr IDs need not really
be OIDs at all, we just need a few bits for them.
regards, tom lane
On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
Shawn Debnath <sdn@amazon.com> writes:
On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
Also, I don't see why we'd need a fake pg_database row in the first
place. IIUC, the OID counter wraps around to FirstNormalObjectId, so
nobody should have a database with an OID less than that value.We have scripts under catalog directory that can check to ensure OIDs
aren't re-used accidentally. However, we still have to define an entry
in a catalog somewhere and I was proposing creating a new one,
pg_storage_managers?, to track these entries.That would fail to capture the property that the smgr OIDs mustn't
conflict with database OIDs, so the whole thing still seems like an
ugly kluge from here.
It's definitely a kludge, but it doesn't seem that bad for now. Delaying
a nice answer till we have a more efficient bufmgr representation
doesn't seem crazy to me.
I don't think there's a real conflict risk given that we don't allow for
duplicated oids across catalogs at bootstrap time, and this is
definitely a bootstrap time issue.
Another thought: my colleague Anton Shyrabokau suggested potentially
re-using forknumber to differentiate smgrs. We are using 32 bits to
map 5 entries today.Yeah, that seems like it might be a workable answer.
Yea, if we just split that into two 16bit entries, there'd not be much
lost. Some mild mild performance regression due to more granular
memory->register reads/writes, but I can't even remotely see that
matter.
Since, per this discussion, the smgr IDs need not really be OIDs at
all, we just need a few bits for them.
Personally I find having them as oids more elegant than the distaste
from misusing the database oid for a wihle, but I think it's fair to
disagree here.
Greetings,
Andres Freund
On Fri, Mar 1, 2019 at 7:24 AM Andres Freund <andres@anarazel.de> wrote:
On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
Shawn Debnath <sdn@amazon.com> writes:
Another thought: my colleague Anton Shyrabokau suggested potentially
re-using forknumber to differentiate smgrs. We are using 32 bits to
map 5 entries today.Yeah, that seems like it might be a workable answer.
Yea, if we just split that into two 16bit entries, there'd not be much
lost. Some mild mild performance regression due to more granular
memory->register reads/writes, but I can't even remotely see that
matter.
Ok, that's a interesting way to include it in BufferTag without making
it wider. But then how about the SMGR interface? I think that value
would need to be added to the smgropen() interface, and all existing
callers would pass in (say) SGMR_RELATION (meaning "use md.c"), and
new code would use SMGR_UNDO, SMGR_SLRU etc. That seems OK to me.
Ancient POSTGRES had an extra argument like that and would say eg
smgropen(rd->rd_rel->relsmgr, rd), but in this new idea I think it'd
always be a constant or a value from a BufferTag, and the BufferTag
would have been set with a constant, since the code reading these
buffers would always be code that knows which it wants. We'd be using
this new argument not as a modifier to control storage location as
they did back then, but rather as a whole new namespace for
RelFileNode values, that also happens to be stored differently; that
might be a hint that it could also go into RelFileNode (but I'm not
suggesting that).
Since, per this discussion, the smgr IDs need not really be OIDs at
all, we just need a few bits for them.Personally I find having them as oids more elegant than the distaste
from misusing the database oid for a wihle, but I think it's fair to
disagree here.
It sounds like your buffer mapping redesign would completely change
the economics and we could reconsider much of this later without too
much drama; that was one of the things that made me feel that the
magic database OID approach was acceptable at least in the short term.
--
Thomas Munro
https://enterprisedb.com
On Fri, Mar 1, 2019 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
On Thu, Feb 28, 2019 at 7:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
Our current thinking is that smgropen() should know how to map a small
number of special database OIDs to different smgr implementationsHmm. Maybe mapping based on tablespaces would be a better idea?
In the undo log proposal (about which more soon) we are using
tablespaces for their real purpose, so we need that OID. If you SET
undo_tablespaces = foo then future undo data created by your session
will be written there, which might be useful for putting that IO on
different storage.Meh. That's a point, but it doesn't exactly seem like a killer argument.
Just in the abstract, it seems much more likely to me that people would
want per-database special rels than per-tablespace special rels. And
I think your notion of a GUC that can control this is probably pie in
the sky anyway: if we can't afford to look into the catalogs to resolve
names at this code level, how are we going to handle a GUC?
I have this working like so:
* undo logs have a small amount of meta-data in shared memory, stored
in a file at checkpoint time, with all changes WAL logged, visible to
users in pg_stat_undo_logs view
* one of the properties of an undo log is its tablespace (the point
here being that it's not in a catalog)
* you don't need access to any catalogs to find the backing files for
a RelFileNode (the path via tablespace symlinks is derivable from
spcNode)
* therefore you can find your way from an UndoLogRecPtr in (say) a
zheap page to the relevant blocks on disk without any catalog access;
this should work even in the apparently (but not actually) circular
case of a pg_tablespace catalog that is stored in zheap (not something
we can do right now, but hypothetically speaking), and has undo data
that is stored in some non-default tablespace that must be consulted
while scanning the catalog (not that I'm suggesting that would
necessarily be a good idea to suppose catalogs in non-default
tablespaces; I'm just addressing your theoretical point)
* the GUC is used to resolve tablespace names to OIDs only by sessions
that are writing, when selecting (or creating) an undo log to attach
to and begin writing into; those sessions have no trouble reading the
catalog to do so without problematic circularities, as above
Seems to work; the main complications so far were coming up with
reasonable behaviour and interlocking when you drop tablespaces that
contain undo logs (short version: if they're not needed for snapshots
or rollback, they are dropped, wasting the rest of their undo address
space; otherwise they prevents the tablespace from being dropped with
a clear message to that effect).
It doesn't make any sense to put things like clog or any other SLRU in
a non-default tablespace though. It's perfectly OK if not all smgr
implementations know how to deal with tablespaces, and the SLRU
support should just not support that.
The real reason I'm concerned about this, though, is that for either
a database or a tablespace, you can *not* get away with having a magic
OID just hanging in space with no actual catalog row matching it.
If nothing else, you need an entry there to prevent someone from
reusing the OID for another purpose. And a pg_database row that
doesn't correspond to a real database is going to break all kinds of
code, starting with pg_upgrade and the autovacuum launcher. Special
rows in pg_tablespace are much less likely to cause issues, because
of the precedent of pg_global and pg_default.
GetNewObjectId() never returns values < FirstNormalObjectId.
I don't think it's impossible for someone to want to put SMGRs in a
catalog of some kind some day. Even though the ones for clog, undo
etc would still probably need special hard-coded treatment as
discussed, I suppose it's remotely possible that someone might some
day figure out a useful way to allow extensions that provide different
block storage (nvram? zfs zvols? encryption? (see Haribabu's reply))
but I don't have any specific ideas about that or feel inclined to
design something for unknown future use.
--
Thomas Munro
https://enterprisedb.com