non-transactional pg_class

Started by Alvaro Herrera almost 20 years ago; 3 messages; list: hackers
#1 Alvaro Herrera
alvherre@2ndquadrant.com

Hi,

I've been taking a look at what's needed for the non-transactional part
of pg_class. If I've understood this correctly, we need a separate
catalog, which I've dubbed pg_ntclass (better ideas welcome), and a new
pointer in RelationData to hold a pointer to this new catalog for each
relation. Also a new syscache needs to be created (say, NTRELOID).

Must every relation have a tuple in this catalog? Currently it is
useful only for RELATION, INDEX and TOASTVALUE relkinds, so maybe we can
get away with not requiring it for other relkinds.

On the other hand, must this new catalog be bootstrapped? We could
initially create RelationDescs with a NULL relation->rd_ntrel, and then
get the tuple from the syscache when somebody tries to read the fields.

I envision this new catalog having only reltuples and relpages for
now. (I'll add relvacuumxid and relminxid on the relminxid patch, but
they won't be there on the first pass.)

Obviously the idea is that we would never heap_update tuples there; only
heap_inplace_update (and heap_insert when a new relation is created.)

So there would be three patches:

1. to replace all uses of relation->rd_rel->reltuples and ->relpages
with macros RelationGetReltuples/Relpages.

2. to add the new catalog and syscache, and have the macros get the
tuple from pg_ntclass when first requested. (Also, of course, mods to
the functions that update pg_class.reltuples, etc, so that they also
update pg_ntclass).

3. the relminxid patch
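
A minimal sketch of how the accessor macros from patches 1 and 2 might behave, assuming a lazy lookup on first use. RelationGetReltuples and RelationGetRelpages are the names proposed above; everything else (NtClassTuple, the cut-down RelationData, mock_catalog, fetch_ntclass_tuple, relation_get_ntrel) is a hypothetical stand-in and not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pg_ntclass row: only the two fields named in the mail. */
typedef struct NtClassTuple
{
    float reltuples;
    int   relpages;
} NtClassTuple;

/* Cut-down stand-in for RelationData; the real struct has many more fields. */
typedef struct RelationData
{
    unsigned int  rd_id;     /* relation OID */
    NtClassTuple *rd_ntrel;  /* NULL until somebody reads the fields */
} RelationData;
typedef RelationData *Relation;

/* Mock catalog contents, standing in for the real lookup mechanism. */
typedef struct MockRow
{
    unsigned int oid;
    NtClassTuple tup;
} MockRow;

static MockRow mock_catalog[] = {
    {1259u,  {250.0f, 6}},
    {16384u, {100000.0f, 541}},
};
static int mock_lookups = 0;    /* counts how often the catalog is consulted */

/* Hypothetical lookup by OID; returns NULL when no row exists. */
static NtClassTuple *
fetch_ntclass_tuple(unsigned int oid)
{
    size_t i;

    mock_lookups++;
    for (i = 0; i < sizeof(mock_catalog) / sizeof(mock_catalog[0]); i++)
        if (mock_catalog[i].oid == oid)
            return &mock_catalog[i].tup;
    return NULL;
}

/* Fill rd_ntrel on first request, then reuse the stored pointer. */
static NtClassTuple *
relation_get_ntrel(Relation rel)
{
    if (rel->rd_ntrel == NULL)
        rel->rd_ntrel = fetch_ntclass_tuple(rel->rd_id);
    assert(rel->rd_ntrel != NULL);
    return rel->rd_ntrel;
}

/* Callers use these instead of touching rd_rel->reltuples/relpages directly. */
#define RelationGetReltuples(rel) (relation_get_ntrel(rel)->reltuples)
#define RelationGetRelpages(rel)  (relation_get_ntrel(rel)->relpages)
```

With call sites going through the macros, patch 2 can change where the values come from without touching the callers again; in this sketch only the first access performs a lookup.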

Have I gotten it right?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: non-transactional pg_class

Alvaro Herrera <alvherre@commandprompt.com> writes:

> I've been taking a look at what's needed for the non-transactional part
> of pg_class. If I've understood this correctly, we need a separate
> catalog, which I've dubbed pg_ntclass (better ideas welcome), and a new
> pointer in RelationData to hold a pointer to this new catalog for each
> relation. Also a new syscache needs to be created (say, NTRELOID).

Do you really need both a relcache slot and a syscache? Seems
redundant. For that matter, do you need either? Both the relcache and
syscache operate on the assumption of transactional updates, so I think
that you're going to have semantic problems using the caches to hold
these tuples. For instance we don't broadcast any sinval update
messages from a rolled-back transaction.

> On the other hand, must this new catalog be bootstrapped?

If relation creation or row insertion is going to try to write into it,
then yes. You could get away with not writing a row initially as long
as the rows only hold reltuples/relpages, but I think that would stop
working as soon as you put the "unfreeze" code in.

> Obviously the idea is that we would never heap_update tuples there; only
> heap_inplace_update (and heap_insert when a new relation is created.)

Initial insertion (table CREATE) and deletion (table DROP) would both
have to be transactional operations. This may be safe because we'd hold
exclusive lock on the table and so no one else would be touching the
table's row, but it bears thinking about, because after all the whole
point of the exercise is to keep transactional and nontransactional
updates separate.

What happens if someone tries to do a manual UPDATE in this catalog?
Maybe this can be in the category of "superusers should know enough not
to do that", but I'd like to be clear on exactly what the consequences
might be. Perhaps "nontransactional catalogs" should be a new relkind
that we disallow normal updates on.

If we do disallow normal updates (and VACUUM FULL too, probably) then
it'd be possible to say that a given entry has a fixed TID for its
entire lifespan. Then we could store the TID in the table's regular
pg_class entry and dispense with any indexes. This would be
advantageous if we end up concluding that we can't use the syscache
mechanism (as I suspect that we can't), because we're going to be making
quite a lot of fetches from this catalog. A direct fetch by TID would
be a lot cheaper than an index search.
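
A rough sketch of why a stored TID makes the fetch cheap, assuming a toy heap of fixed-size pages. The types and functions here (NtTuple, Tid, nt_insert, nt_fetch_by_tid, nt_fetch_by_scan) are hypothetical, and the keyed lookup is modelled as a plain scan rather than a real index descent.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES          2
#define TUPLES_PER_PAGE 4

/* Hypothetical nontransactional row. */
typedef struct NtTuple
{
    int          used;
    unsigned int relid;
    float        reltuples;
    int          relpages;
} NtTuple;

/* (block, offset) pair; offsets are 1-based and 0 means "invalid". */
typedef struct Tid
{
    unsigned int   block;
    unsigned short offset;
} Tid;

static NtTuple heap[NPAGES][TUPLES_PER_PAGE];

/* Place a row in the first free slot and report where it landed. */
static Tid
nt_insert(unsigned int relid, float reltuples, int relpages)
{
    Tid          invalid = {0, 0};
    unsigned int b;
    unsigned int o;

    for (b = 0; b < NPAGES; b++)
        for (o = 0; o < TUPLES_PER_PAGE; o++)
            if (!heap[b][o].used)
            {
                Tid tid;

                heap[b][o].used = 1;
                heap[b][o].relid = relid;
                heap[b][o].reltuples = reltuples;
                heap[b][o].relpages = relpages;
                tid.block = b;
                tid.offset = (unsigned short) (o + 1);
                return tid;
            }
    return invalid;     /* heap is full */
}

/* Direct fetch: one array access, valid only while the row never moves. */
static NtTuple *
nt_fetch_by_tid(Tid tid)
{
    assert(tid.block < NPAGES);
    assert(tid.offset >= 1 && tid.offset <= TUPLES_PER_PAGE);
    return heap[tid.block][tid.offset - 1].used
        ? &heap[tid.block][tid.offset - 1] : NULL;
}

/* Keyed lookup, modelled as a scan; *visited counts the slots examined. */
static NtTuple *
nt_fetch_by_scan(unsigned int relid, int *visited)
{
    unsigned int b;
    unsigned int o;

    for (b = 0; b < NPAGES; b++)
        for (o = 0; o < TUPLES_PER_PAGE; o++)
        {
            (*visited)++;
            if (heap[b][o].used && heap[b][o].relid == relid)
                return &heap[b][o];
        }
    return NULL;
}
```

The direct fetch depends on the row staying where it was inserted, which is why it only works if normal updates and VACUUM FULL are disallowed on the catalog.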

regards, tom lane

#3 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: non-transactional pg_class

[Reposting to the correct list, sorry if it's duplicated]

Tom Lane wrote:

> If we do disallow normal updates (and VACUUM FULL too, probably) then
> it'd be possible to say that a given entry has a fixed TID for its
> entire lifespan. Then we could store the TID in the table's regular
> pg_class entry and dispense with any indexes. This would be
> advantageous if we end up concluding that we can't use the syscache
> mechanism (as I suspect that we can't), because we're going to be making
> quite a lot of fetches from this catalog. A direct fetch by TID would
> be a lot cheaper than an index search.

First attempt at realizing this idea. pg_ntclass is a relation of a new
relkind, RELKIND_NON_TRANSACTIONAL (ideas for shorter names welcome).
In pg_class, we store a TID to the corresponding tuple. The tuples are
not cached; they are obtained by heap_fetch() each time they are
requested. This may be worth reconsideration.

heap_update refuses to operate on a non-transactional catalog, because
there's no (easy) way to update pg_class accordingly. This normally
shouldn't be a problem. vac_update_relstats updates the tuple by
using the new heap_inplace_update call.
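
A toy model of the difference between the two update styles, assuming a slot array in which the slot index plays the role of the TID. None of this is the PostgreSQL implementation: store_insert, versioned_update and inplace_update are hypothetical names, and visibility rules are reduced to a single "live" flag.

```c
#include <assert.h>

#define NSLOTS 8

/* One stored row version. */
typedef struct Slot
{
    int   live;
    float reltuples;
    int   relpages;
} Slot;

static Slot slots[NSLOTS];
static int  nslots_used = 0;

/* Append a new row version; the returned index acts as its TID. */
static int
store_insert(float reltuples, int relpages)
{
    assert(nslots_used < NSLOTS);
    slots[nslots_used] = (Slot) {1, reltuples, relpages};
    return nslots_used++;
}

/*
 * Ordinary versioned update: the old version is left behind as dead and
 * the new version lands at a different location, so any stored TID
 * would have to be rewritten as well.
 */
static int
versioned_update(int tid, float reltuples, int relpages)
{
    assert(tid >= 0 && tid < nslots_used && slots[tid].live);
    slots[tid].live = 0;
    return store_insert(reltuples, relpages);
}

/* In-place update: overwrite the fields; the location does not change. */
static void
inplace_update(int tid, float reltuples, int relpages)
{
    assert(tid >= 0 && tid < nslots_used && slots[tid].live);
    slots[tid].reltuples = reltuples;
    slots[tid].relpages = relpages;
}
```

In this model a TID saved elsewhere stays valid across inplace_update but goes stale after versioned_update, which is the situation heap_update would create for the pointer kept in pg_class.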

VACUUM FULL also refuses to operate on these tables, and ANALYZE
silently skips them. Only plain VACUUM cleans them.

Note that you can still DELETE from pg_ntclass. I'm not sure whether we
should disallow that somehow, because it's not easy to recover if you
do.

Regression tests pass; I updated the stats test because it was accessing
pg_class.relpages. So there's already a test to verify that it's
working.

No documentation yet.

There are several warts needed to make it all work:

1. I had to add a "typedef" to pg_class.h to put ItemPointerData in
FormData_pg_class, because C doesn't recognize "tid" as a type, while the
bootstrapper doesn't recognize ItemPointerData as a valid type. I find
this mighty ugly because it will have side effects whenever we #include
pg_class.h. Suggestions welcome.

2. During bootstrap, RelationBuildLocalRelation creates nailed relations
with hardcoded TID=(0,1). This is because we don't have access to
pg_class yet, so we can't find the real pointer; and furthermore, we are
going to fix the entries later in the bootstrapping process.

3. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty
ugly as well; and autovacuum is skipping pg_ntclass (really all
non-transactional catalogs) altogether. We could improve the situation
by introducing some sort of struct like {relid, relkind}, so that
vacuum_rel could know what relkind to expect, and it could skip
non-transactional catalogs cleanly in vacuum full and analyze.
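
A sketch of the {relid, relkind} idea from item 3, assuming the skipping rules described earlier in this mail (plain VACUUM processes non-transactional catalogs; VACUUM FULL and ANALYZE skip them). RELKIND_NON_TRANSACTIONAL is the name used above, but its letter 'n' is a guess, and RelToProcess, MaintCmd, should_process and filter_rels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum MaintCmd
{
    CMD_VACUUM,
    CMD_VACUUM_FULL,
    CMD_ANALYZE
} MaintCmd;

#define RELKIND_RELATION          'r'
#define RELKIND_NON_TRANSACTIONAL 'n'   /* hypothetical letter */

/* The proposed pair: enough for the caller to decide without a lookup. */
typedef struct RelToProcess
{
    unsigned int relid;
    char         relkind;
} RelToProcess;

/* Non-transactional catalogs are handled by plain VACUUM only. */
static int
should_process(const RelToProcess *rel, MaintCmd cmd)
{
    if (rel->relkind != RELKIND_NON_TRANSACTIONAL)
        return 1;
    return cmd == CMD_VACUUM;
}

/* Copy the relations to be processed into out[]; returns how many. */
static size_t
filter_rels(const RelToProcess *in, size_t n, MaintCmd cmd, RelToProcess *out)
{
    size_t i;
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        if (should_process(&in[i], cmd))
            out[kept++] = in[i];
    return kept;
}
```

Carrying the relkind alongside the OID lets the relation list be filtered up front, instead of each command discovering the relkind after opening the relation.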

I appreciate any comments.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachments:

ntclass-1.patch (text/plain; charset=us-ascii), +627 -340