global temporary tables

Started by Robert Haas over 15 years ago (43 messages)
#1 Robert Haas
robertmhaas@gmail.com

A couple of recent threads got me thinking again about the idea
of global temporary tables. There seem to be two principal issues:

1. What is a global temporary table?

2. How could we implement that?

Despite rereading the "idea: global temp tables" thread from April
2009 in some detail, I was not able to get a clear understanding of
(1). What I *think* it is supposed to mean is that the table is a
permanent object which is "globally" visible - that is, it's part of
some non-temp schema like public or $user and its column definitions
etc. are visible to all backends - and it's not automatically removed
on commit, backend exit, etc. - but the *contents* of the table are
temporary and backend-local, so that each new backend initially sees
it as empty and can then insert, update, and delete data independently
of what any other backend does.

As to (2), my thought is that perhaps we could implement this by
instantiating a separate relfilenode for the relation for each backend
which accesses it. relfilenode would be 0 in pg_class, as it is for
"mapped" relations, but every time a backend touched the rel, we'd
allocate a relfilenode and associate the OID of the temp table with it
using some kind of backend-local storage - actually similar to what
the relmapper code does, except without the complexity of ever
actually having to persist the value; and perhaps using a hash table
rather than an array, since the number of mapped rels that a backend
can need to deal with is rather more limited than the number of temp
tables it might want to use.
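The backend-local mapping described above can be sketched as a toy model (Python used purely for illustration; the class name, OID values, and relfilenode counter are all made up, and the real relmapper is C code in the backend):

```python
import itertools

class BackendLocalRelMap:
    """Per-backend map from a global temp table's OID to a private relfilenode.

    Unlike the real relmapper, nothing is ever persisted: the map lives and
    dies with the backend, and it uses a hash table (here a dict) rather than
    a small fixed array, since a session may touch many temp tables.
    """
    _next_relfilenode = itertools.count(100000)  # stand-in for OID allocation

    def __init__(self):
        self._map = {}  # temp table OID -> this backend's relfilenode

    def relfilenode_for(self, table_oid):
        # Allocate a private relfilenode the first time this backend touches
        # the relation (pg_class.relfilenode stays 0, as for mapped rels).
        if table_oid not in self._map:
            self._map[table_oid] = next(BackendLocalRelMap._next_relfilenode)
        return self._map[table_oid]

# Two "backends" touching the same table OID get distinct storage...
b1, b2 = BackendLocalRelMap(), BackendLocalRelMap()
assert b1.relfilenode_for(16384) != b2.relfilenode_for(16384)
# ...but repeated access within one backend is stable.
assert b1.relfilenode_for(16384) == b1.relfilenode_for(16384)
```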

Thoughts?

...Robert

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

A couple of recent threads got me thinking again about the idea
of global temporary tables. There seem to be two principal issues:

1. What is a global temporary table?

2. How could we implement that?

Despite rereading the "idea: global temp tables" thread from April
2009 in some detail, I was not able to get a clear understanding of
(1).

I believe that the spec's distinction between global and local temp
tables has to do with whether they are visible across module
boundaries. Since we haven't implemented modules, that distinction
is meaningless to us. In the spec, *both* types of temp tables have
the property that the definition (schema) of the table is global
across all sessions, and only the content of the table is session-local.

This arrangement clearly is useful for some applications, but so is our
current definition wherein different sessions can have different schemas
for the same temp table name. So eventually it'd be good to support
both. But the GLOBAL/LOCAL TEMP TABLE distinction is something entirely
different. PG's behavior does not correspond to either of those.

Your idea of using the relmapper layer to instantiate copies of temp
tables is an interesting one. It's only a small piece of the puzzle
though. In particular, what you described would result in the table
having the same OID in all sessions, even though the relfilenodes are
different --- and since locking is done on the basis of OID, that's
probably *not* what we want. It would be much better for performance
if the different sessions' versions of the table were independently
lockable.

I also kind of wonder what is supposed to happen if someone DROPs or
ALTERs the temp table definition ...

regards, tom lane

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: global temporary tables

On Fri, Apr 23, 2010 at 11:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

A couple of recent threads got me thinking again about the idea
of global temporary tables.  There seem to be two principal issues:

1. What is a global temporary table?

2. How could we implement that?

Despite rereading the "idea: global temp tables" thread from April
2009 in some detail, I was not able to get a clear understanding of
(1).

I believe that the spec's distinction between global and local temp
tables has to do with whether they are visible across module
boundaries.  Since we haven't implemented modules, that distinction
is meaningless to us.  In the spec, *both* types of temp tables have
the property that the definition (schema) of the table is global
across all sessions, and only the content of the table is session-local.

This arrangement clearly is useful for some applications, but so is our
current definition wherein different sessions can have different schemas
for the same temp table name.  So eventually it'd be good to support
both.  But the GLOBAL/LOCAL TEMP TABLE distinction is something entirely
different.  PG's behavior does not correspond to either of those.

I don't really care what we call it, although I find the GLOBAL name
convenient and descriptive.

Your idea of using the relmapper layer to instantiate copies of temp
tables is an interesting one.  It's only a small piece of the puzzle
though.  In particular, what you described would result in the table
having the same OID in all sessions, even though the relfilenodes are
different --- and since locking is done on the basis of OID, that's
probably *not* what we want.  It would be much better for performance
if the different sessions' versions of the table were independently
lockable.

Well, it depends on what operation we're talking about. For
operations that involve only the table contents, yeah, we'd like to
lock the versions independently. But for this sort of thing:

I also kind of wonder what is supposed to happen if someone DROPs or
ALTERs the temp table definition ...

...not so much. Here you REALLY want a DROP attempt to acquire an
AccessExclusiveLock that will conflict with any outstanding
AccessShareLocks. Similarly, you're only going to be able to modify
the schema for the relation if it's not otherwise in use.

...Robert

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Apr 23, 2010 at 11:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I also kind of wonder what is supposed to happen if someone DROPs or
ALTERs the temp table definition ...

...not so much. Here you REALLY want a DROP attempt to acquire an
AccessExclusiveLock that will conflict with any outstanding
AccessShareLocks. Similarly, you're only going to be able to modify
the schema for the relation if it's not otherwise in use.

I think you're presuming the answer to the question. We could also view
the desired behavior as being that each session clones the temp table
definition at some instant (eg, first use). The approach that you're
assuming seems fraught with large downsides: in particular, implementing
ALTER TABLE would be a mess. The would-be alterer would need access to
the physical copies of all sessions, which throws out not only the
assumption that the relmapper entries can be private data, but all of
the access optimizations we currently have in the local buffer manager.
Not to mention the coding mess of having to repeat the ALTER operation
for each of N copies, some of which might disappear while we're trying
to do it (or if they don't, we're blocking backends from exiting).
I don't even know how you'd do the ALTER over again N times if you
only have one set of catalog entries describing the N copies.

regards, tom lane

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: global temporary tables

On Fri, Apr 23, 2010 at 11:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Apr 23, 2010 at 11:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I also kind of wonder what is supposed to happen if someone DROPs or
ALTERs the temp table definition ...

...not so much.  Here you REALLY want a DROP attempt to acquire an
AccessExclusiveLock that will conflict with any outstanding
AccessShareLocks.  Similarly, you're only going to be able to modify
the schema for the relation if it's not otherwise in use.

I think you're presuming the answer to the question.  We could also view
the desired behavior as being that each session clones the temp table
definition at some instant (eg, first use).  The approach that you're
assuming seems fraught with large downsides: in particular, implementing
ALTER TABLE would be a mess.  The would-be alterer would need access to
the physical copies of all sessions, which throws out not only the
assumption that the relmapper entries can be private data, but all of
the access optimizations we currently have in the local buffer manager.

I agree, that would be pretty unfortunate, although maybe it's the only
way to make it work. It's not what I had in mind. I was thinking
that the would-be ALTERER could just take an AccessExclusiveLock, but
now that I think about it that doesn't work, since a backend could
have the table unlocked between transactions but still have private
contents in it. :-(

Not to mention the coding mess of having to repeat the ALTER operation
for each of N copies, some of which might disappear while we're trying
to do it (or if they don't, we're blocking backends from exiting).
I don't even know how you'd do the ALTER over again N times if you
only have one set of catalog entries describing the N copies.

Well, if you clone the table, that just pushes the problem around.
When I run ALTER TABLE on one of these thingamabobs, does it modify my
clone? The original? Both? If it modifies my clone, how do we
modify the original? If it modifies the original, won't I be rather
surprised to find my clone unaffected? If it modifies both, how do we
avoid complete havoc if the original has since been modified (perhaps
incompatibly, perhaps not) by some other backend doing its own ALTER
TABLE?

...Robert

#6 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#1)
Re: global temporary tables

2010/4/24 Robert Haas <robertmhaas@gmail.com>:

A couple of recent threads got me thinking again about the idea
of global temporary tables.  There seem to be two principal issues:

1. What is a global temporary table?

2. How could we implement that?

Despite rereading the "idea: global temp tables" thread from April
2009 in some detail, I was not able to get a clear understanding of
(1).  What I *think* it is supposed to mean is that the table is a
permanent object which is "globally" visible - that is, it's part of
some non-temp schema like public or $user and its column definitions
etc. are visible to all backends - and it's not automatically removed
on commit, backend exit, etc. - but the *contents* of the table are
temporary and backend-local, so that each new backend initially sees
it as empty and can then insert, update, and delete data independently
of what any other backend does.

As to (2), my thought is that perhaps we could implement this by
instantiating a separate relfilenode for the relation for each backend
which accesses it.  relfilenode would be 0 in pg_class, as it is for
"mapped" relations, but every time a backend touched the rel, we'd
allocate a relfilenode and associate the OID of the temp table with it
using some kind of backend-local storage - actually similar to what
the relmapper code does, except without the complexity of ever
actually having to persist the value; and perhaps using a hash table
rather than an array, since the number of mapped rels that a backend
can need to deal with is rather more limited than the number of temp
tables it might want to use.

It is a good idea.

I am missing some ideas about statistics and about indexes, though.

Regards
Pavel Stehule


Thoughts?

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 Greg Sabino Mullane
greg@turnstep.com
In reply to: Robert Haas (#5)
Re: global temporary tables


...
surprised to find my clone unaffected? If it modifies both, how do we
avoid complete havoc if the original has since been modified (perhaps
incompatibly, perhaps not) by some other backend doing its own ALTER
TABLE?

Since this is such a thorny problem, and this is a temporary table, why
not just disallow ALTER completely for the first pass?

--
Greg Sabino Mullane greg@turnstep.com

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Sabino Mullane (#7)
Re: global temporary tables

"Greg Sabino Mullane" <greg@turnstep.com> writes:

surprised to find my clone unaffected? If it modifies both, how do we
avoid complete havoc if the original has since been modified (perhaps
incompatibly, perhaps not) by some other backend doing its own ALTER
TABLE?

Since this is such a thorny problem, and this is a temporary table, why
not just disallow ALTER completely for the first pass?

Usually the way we approach these kinds of problems is that we want
to see some plausible outline for how they might be fixed before we
move forward with the base feature. IOW, I wouldn't object to not
having ALTER in the first release, but if we have no idea how to do
ALTER at all I'd be too worried that we were painting ourselves into
a corner.

Or maybe you can make a case that there's no need to allow ALTER at
all, ever. But surely DROP needs to be possible, and that seems to
already introduce some of the same issues.

regards, tom lane

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: global temporary tables

On Sat, Apr 24, 2010 at 12:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Greg Sabino Mullane" <greg@turnstep.com> writes:

surprised to find my clone unaffected?  If it modifies both, how do we
avoid complete havoc if the original has since been modified (perhaps
incompatibly, perhaps not) by some other backend doing its own ALTER
TABLE?

Since this is such a thorny problem, and this is a temporary table, why
not just disallow ALTER completely for the first pass?

Usually the way we approach these kinds of problems is that we want
to see some plausible outline for how they might be fixed before we
move forward with the base feature.  IOW, I wouldn't object to not
having ALTER in the first release, but if we have no idea how to do
ALTER at all I'd be too worried that we were painting ourselves into
a corner.

Or maybe you can make a case that there's no need to allow ALTER at
all, ever.  But surely DROP needs to be possible, and that seems to
already introduce some of the same issues.

I had the same thought as GSM this morning. More specifically, it
seems to me that the problematic cases are precisely those in which
you might feel an urge to touch somebody else's local buffers, so I
think we should disallow, approximately, those ALTER TABLE cases which
require a full-table rewrite. I don't see the problem with DROP.
Under the proposed design, it's approximately equivalent to dropping a
table that someone else has truncated. You just wait for the
necessary lock and then do it.

At least AIUI, the use case for this feature is that you want to avoid
creating "the same" temporary table over and over again. The schema
is fixed and doesn't change much, but you're creating it lots and lots
of times in lots and lots of different backends, leading to both
management and performance difficulties. If you want to be able to
change the schema frequently or in a backend-local way, use the
existing temporary table feature.

Now, there is ONE problem with DROP, which is that you might orphan
some heaps. Of course, that can also happen due to a backend crash.
Currently, autovacuum arranges to drop any orphaned temp tables that
have passed the wraparound threshold, but even if we were happy with
waiting 2 billion transactions to get things cleaned up, the mechanism
can't work here because it relies on being able to examine the
pg_class row and determine which backend owns it, and where the
storage is located.

We could possibly set things up so that a running backend will notice
if a global temporary table for which it's created a private
relfilenode gets dropped, and blow away the backing file. But that
doesn't protect against crashes, so I think we're going to need some
other garbage collection mechanism, either instead of or in addition to
asking backends to clean up after themselves. I'm not quite sure what
the design of that should look like, though.

...Robert

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

At least AIUI, the use case for this feature is that you want to avoid
creating "the same" temporary table over and over again.

The context that I've seen it come up in is that people don't want to
clutter their functions with create-it-if-it-doesn't-exist logic,
which you have to have given the current behavior of temp tables.
Any performance gain from reduced catalog churn would be gravy.

Aside from the DROP problem, I think this implementation proposal
has one other big shortcoming: what are you going to do about
table statistics? In many cases, you really *have* to do an ANALYZE
once you've populated a temp table, if you want to get decent plans
for it. Where will you put those stats?

regards, tom lane

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: global temporary tables

[ forgot to respond to this part ]

Robert Haas <robertmhaas@gmail.com> writes:

... I don't see the problem with DROP.
Under the proposed design, it's approximately equivalent to dropping a
table that someone else has truncated. You just wait for the
necessary lock and then do it.

And do *what*? You can remove the catalog entries, but how are you
going to make the physical storage of other backends' versions go away?
(To say nothing of making them flush their local buffers for it.)
If you do remove the catalog entries, won't you be cutting the knees
out from under whatever end-of-session cleanup processing might exist
in those other backends?

The idea of the global table as a template that individual sessions
clone working tables from would avoid most of these problems. You
rejected it on the grounds that ALTER would be too hard; but if you're
blowing off ALTER anyway, that argument seems pretty unimpressive.

regards, tom lane

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: global temporary tables

On Sat, Apr 24, 2010 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

At least AIUI, the use case for this feature is that you want to avoid
creating "the same" temporary table over and over again.

The context that I've seen it come up in is that people don't want to
clutter their functions with create-it-if-it-doesn't-exist logic,
which you have to have given the current behavior of temp tables.
Any performance gain from reduced catalog churn would be gravy.

I think there's a significant contingent on this mailing list who feel
that that gravy would be rather tasty and would like very much to
enjoy some of it along with their temporary table tetrazzini.

Aside from the DROP problem, I think this implementation proposal
has one other big shortcoming: what are you going to do about
table statistics?  In many cases, you really *have* to do an ANALYZE
once you've populated a temp table, if you want to get decent plans
for it.  Where will you put those stats?

For a first cut, I had thought about ignoring the problem. Now, that
may sound stupid, because now if two different backends have very
different distributions of data in the table and both do an ANALYZE,
one set of statistics will clobber the other set of statistics. On
the flip side, for some usage patterns, it might actually work out
to a win. Maybe the data I'm putting in here today is a great deal
like the data I put in here yesterday, and planning it with
yesterday's statistics doesn't cost enough to be worth a re-ANALYZE.

If we don't want to do that, I suppose one option is to create a
pg_statistic-like table in the backend's temporary tablespace and put
them there; or we could put them into a backend-local hash table. The
current setup of pg_statistic is actually somewhat weak for a number
of things we might want to do: for example, it might be interesting to
gather statistics for the subset of a table for which a particular
partial index is predOK. When such an index is available for a
particular query, we could use the statistics for that subset of the
table instead of the overall statistics for the table, and get better
estimates. Or we could even let the user specify predicates which
will cause the table to have a different statistical distribution than
the table as a whole, and gather statistics for the subset that
matches the predicate. One approach would be to make the starelid
column able to reference something other than a relation OID, although
I don't think that actually helps with the global temp table problem
because if we use the real pg_statistic to store the data then we have
to arrange to clean it up.

...Robert

#13 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#12)
Re: global temporary tables

For a first cut, I had thought about ignoring the problem.  Now, that
may sound stupid, because now if two different backends have very
different distributions of data in the table and both do an ANALYZE,
one set of statistics will clobber the other set of statistics.  On
the flip side, for some usage patterns, it might actually work out
to a win.  Maybe the data I'm putting in here today is a great deal
like the data I put in here yesterday, and planning it with
yesterday's statistics doesn't cost enough to be worth a re-ANALYZE.

Both variants are possible. Initially, the statistics could be taken
from some "original" (which could be empty); after ANALYZE, the
statistics could be individual to each session.

Regards
Pavel

#14 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: global temporary tables

On Sat, Apr 24, 2010 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ forgot to respond to this part ]

Robert Haas <robertmhaas@gmail.com> writes:

...  I don't see the problem with DROP.
Under the proposed design, it's approximately equivalent to dropping a
table that someone else has truncated.  You just wait for the
necessary lock and then do it.

And do *what*?  You can remove the catalog entries, but how are you
going to make the physical storage of other backends' versions go away?
(To say nothing of making them flush their local buffers for it.)
If you do remove the catalog entries, won't you be cutting the knees
out from under whatever end-of-session cleanup processing might exist
in those other backends?

Well, if I knew for sure what the best way was to solve all of these
problems, I'd be posting a finished patch rather than a request for
comment on a design. It's not obvious to me that there's a terribly
thorny problem in the area you're concerned about, but your concern is
making me worry that I'm missing something. Why would the
end-of-session processing need the catalog entries? It seems like
whatever backend-local data structure we're using to record the
relfilenode mappings would be sufficient to nuke the backend storage,
and what else needs doing?

The idea of the global table as a template that individual sessions
clone working tables from would avoid most of these problems.  You
rejected it on the grounds that ALTER would be too hard; but if you're
blowing off ALTER anyway, that argument seems pretty unimpressive.

I don't think that avoiding the catalog churn is something to be
dismissed lightly, but I also think that cloning the table is likely
to be significantly more difficult from an implementation point of
view. Under the implementation I'm proposing, we don't need much that
is fundamentally all that new. Global temporary tables can be treated
like our existing temp tables for purposes of XLog and bufmgr, but
they don't get forced into a temporary namespace. The relation
mapping infrastructure provides a pretty good start for using a
relfilenode that isn't stored in pg_class. I've already gone through
the exercise of finding all the places where we check rd_istemp and
changing them to use macros instead (RelationNeedsWAL, IsBackendLocal,
etc.) and it's not bad.

There's a related project which I think can also leverage much of this
same infrastructure: unlogged tables. We've talked about this before,
but in short the idea is that an unlogged table behaves like a regular
table in all respects except that we never write WAL for it; and we
truncate it at shutdown and at startup. Therefore, it doesn't show up
on standby servers, and its contents are not persistent across
restarts, but performance is improved. It's suitable for things like
"the table of currently logged in users", where you don't mind forcing
everyone to log in again if the database crashes. (It might even be
possible to allow writes to unlogged tables on standby servers, though
I'm not feeling that ambitious ATM.) So:

- local temp tables exist in a temp namespace, use local buffers, and skip WAL
- global temp tables exist in a non-temp namespace, use local buffers,
and skip WAL
- unlogged tables exist in a non-temp namespace, use shared buffers,
and skip WAL
- normal tables exist in a non-temp namespace, use shared buffers, and write WAL
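The four-way classification above can be restated as a toy table (Python, purely illustrative; none of these names are PostgreSQL identifiers):

```python
from collections import namedtuple

# Each table kind is a combination of three properties from the list above.
Kind = namedtuple("Kind", "temp_namespace local_buffers writes_wal")

TABLE_KINDS = {
    "local temp":  Kind(temp_namespace=True,  local_buffers=True,  writes_wal=False),
    "global temp": Kind(temp_namespace=False, local_buffers=True,  writes_wal=False),
    "unlogged":    Kind(temp_namespace=False, local_buffers=False, writes_wal=False),
    "normal":      Kind(temp_namespace=False, local_buffers=False, writes_wal=True),
}

# Only normal tables write WAL; only local temp tables hide in a temp schema.
assert [k for k, v in TABLE_KINDS.items() if v.writes_wal] == ["normal"]
assert [k for k, v in TABLE_KINDS.items() if v.temp_namespace] == ["local temp"]
```

The point of laying it out this way is that "skip WAL" is common to three of the four kinds, which is why the same infrastructure (macros like RelationNeedsWAL mentioned above) can serve all of them.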

Thoughts?

...Robert

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

... Why would the
end-of-session processing need the catalog entries? It seems like
whatever backend-local data structure we're using to record the
relfilenode mappings would be sufficient to nuke the backend storage,
and what else needs doing?

Well, if you're intending to have a separate data structure and code
path for cleaning up this type of temp table, then maybe you don't need
to touch any catalog entries. I'm concerned though about how far the
effects will propagate --- things like TRUNCATE, VACUUM FULL, REINDEX
will probably have issues with this. Right now they think in terms
of writing a new pg_class entry in order to reassociate tables with
new relfilenodes.

Have you thought much about the previously proposed design, ie keeping
catalog entries for temp tables in backend-local temporary catalogs?
That would certainly be a lot of work, but I think in the end it might
fit in better. This design feels like it's being driven by "hey,
we can abuse the relmapper to sort of do what we want", and not by
what we really want.

regards, tom lane

#16 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#1)
Re: global temporary tables

On Fri, 2010-04-23 at 22:52 -0400, Robert Haas wrote:

Thoughts?

Only a requirement: that we design this in a way that will allow temp
tables to be used during Hot Standby. I make no other comment.

--
Simon Riggs www.2ndQuadrant.com

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: global temporary tables

On Sat, Apr 24, 2010 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

...  Why would the
end-of-session processing need the catalog entries?  It seems like
whatever backend-local data structure we're using to record the
relfilenode mappings would be sufficient to nuke the backend storage,
and what else needs doing?

Well, if you're intending to have a separate data structure and code
path for cleaning up this type of temp table, then maybe you don't need
to touch any catalog entries.  I'm concerned though about how far the
effects will propagate --- things like TRUNCATE, VACUUM FULL, REINDEX
will probably have issues with this.  Right now they think in terms
of writing a new pg_class entry in order to reassociate tables with
new relfilenodes.

Have you thought much about the previously proposed design, ie keeping
catalog entries for temp tables in backend-local temporary catalogs?
That would certainly be a lot of work, but I think in the end it might
fit in better.  This design feels like it's being driven by "hey,
we can abuse the relmapper to sort of do what we want", and not by
what we really want.

Well, yes and no. I think there are definitely some good things that
can happen if we can see our way to taking a hammer to pg_class and
pg_attribute. If we create, e.g. pg_shared_class and
pg_shared_attribute, then we can un-nail the catalogs you just nailed
to make the authentication process able to work without selecting a
database. We can also enable (without guilt) clustering both those
catalogs and the database-specific versions of pg_class, since we no
longer have to worry about having multiple copies of the row that can
get out of sync with each other. And if we further break off
pg_temp_class and pg_temp_attribute, then we can also have our
existing flavor of temporary tables without worrying about catalog
bloat, which would be great. There may be other applications as well.

Having said all that, it doesn't actually allow us to implement global
temporary tables, because obviously the catalog entries for a global
temporary table have to be permanent. Of course, if we didn't have to
worry about catalog bloat, the "clone" approach you're proposing would
be somewhat more attractive, but I actually think that the synergy is
in the other direction: the perfect place to store the catalog entries
and statistics for local temporary tables is - in a global temporary
table! Note that while a local temporary table can never inherit from
a permanent table, it's entirely sensible to let global temporary
tables inherit from permanent tables. Different backends will have
different views of the overall contents of the parent table, but
that's OK, even desirable.

...Robert

#18 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#16)
Re: global temporary tables

On Sat, Apr 24, 2010 at 6:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2010-04-23 at 22:52 -0400, Robert Haas wrote:

Thoughts?

Only a requirement: that we design this in a way that will allow temp
tables to be used during Hot Standby. I make no other comment.

For so long as local temporary tables put their catalog entries in
pg_class, we're not going to be able to use them during Hot Standby.
See the email I just sent elsewhere on this thread for a long term
roadmap to getting out of that pickle. At least under the
implementation I'm proposing here, making global temporary tables
usable would be an easier nut to crack, because the catalog entries
are a non-issue. There is one major problem, though: assigning a
scratch relfilenode to the temporary table requires generating an OID,
which we currently have no way to allow on the standby. Upthread I
also proposed an implementation for unlogged tables (that is, contents
don't survive a server bounce) which wouldn't have that problem
either, although I haven't fully thought it through and there may very
well be other issues.

So in short: I don't think anything we're talking about would make
HS use harder, and some of it might make it easier. But probably some
additional engineering effort dedicated specifically to solving the
problems unique to HS would still be needed.

...Robert

#19 Jim Nasby
decibel@decibel.org
In reply to: Tom Lane (#10)
Re: global temporary tables

On Apr 24, 2010, at 12:31 PM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

At least AIUI, the use case for this feature is that you want to avoid
creating "the same" temporary table over and over again.

The context that I've seen it come up in is that people don't want to
clutter their functions with create-it-if-it-doesn't-exist logic,
which you have to have given the current behavior of temp tables.
Any performance gain from reduced catalog churn would be gravy.

Aside from the DROP problem, I think this implementation proposal
has one other big shortcoming: what are you going to do about
table statistics? In many cases, you really *have* to do an ANALYZE
once you've populated a temp table, if you want to get decent plans
for it. Where will you put those stats?

One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard.

As for cleanup and storage questions: what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend.

On the other hand, some layer (presumably smgr) would need to understand whether a relation was temporary or not. If we do that, cleanup is easy: you can remove any directories that no longer have a running PID. For forensics you probably only want to do that automatically when a backend starts and discovers it already has a directory, though we should also provide an administrator function that will clobber all directories that no longer have backends.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
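[Editor's note: the "remove any directories that no longer have a running PID" check Jim describes above could be sketched roughly as below. This is a hypothetical illustration, not PostgreSQL source; the function names are made up.]

```c
/*
 * Hypothetical sketch (not PostgreSQL source): deciding whether a
 * pgsql_tmp/<pid> directory is an orphan via the PID-liveness test
 * described above.  kill(pid, 0) delivers no signal; it only reports
 * whether the target process exists.
 */
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

static bool
pid_is_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;            /* process exists */
    return errno == EPERM;      /* exists, but owned by another user */
}

/* A per-backend temp directory named after a dead PID is safe to remove. */
static bool
temp_dir_is_orphan(pid_t dir_pid)
{
    return !pid_is_alive(dir_pid);
}
```

Note the EPERM case: a PID can be alive but owned by a different user, so a failed kill() alone doesn't prove the directory is stale.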

#20Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#19)
Re: global temporary tables

On Sat, Apr 24, 2010 at 8:47 PM, Jim Nasby <decibel@decibel.org> wrote:

On Apr 24, 2010, at 12:31 PM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

At least AIUI, the use case for this feature is that you want to avoid
creating "the same" temporary table over and over again.

The context that I've seen it come up in is that people don't want to
clutter their functions with create-it-if-it-doesn't-exist logic,
which you have to have given the current behavior of temp tables.
Any performance gain from reduced catalog churn would be gravy.

Aside from the DROP problem, I think this implementation proposal
has one other big shortcoming: what are you going to do about
table statistics?  In many cases, you really *have* to do an ANALYZE
once you've populated a temp table, if you want to get decent plans
for it.  Where will you put those stats?

One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard.

Yeah, I don't think that would be too horrible. Part of me feels like
you'd want to have the ability to store stats for a global temp table
in either one of those tables depending on use-case, but I'm also
reluctant to invent a lot of new syntax for a very limited use case.

As for cleanup and storage questions; what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend.

That would impose a couple of implementation restrictions that don't
seem necessary. One, it would imply ignoring reltablespace. Two, it
would prohibit (or at least complicate) allowing a backend to CLUSTER
or REINDEX its own private copy of the rel.

On the other hand, some layer (presumably smgr) would need to understand whether a relation was temporary or not. If we do that, cleanup is easy: you can remove any directories that no longer have a running PID. For forensics you probably only want to do that automatically when a backend starts and discovers it already has a directory, though we should also provide an administrator function that will clobber all directories that no longer have backends.

Unfortunately, I don't see much alternative to making smgr know
something about the temp-ness of the relation, though I'm hoping to
keep the smgr surgery to an absolute minimum. Maybe what we could do
is incorporate the backend ID or PID into the file name when the
relation is temp. Then we could scan for and nuke such files pretty
easily. Otherwise I can't really think how to make it work.

...Robert
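[Editor's note: the "incorporate the backend ID or PID into the file name" idea above might look something like the following. The exact "t<backend>_<relfilenode>" format is an assumption for illustration, not what any patch actually uses.]

```c
/*
 * Hypothetical sketch of the naming scheme floated above: embed the
 * owning backend's ID in a temp relation's file name so leftover
 * files are recognizable (and removable) by name alone.
 */
#include <stdbool.h>
#include <stdio.h>

/* Build a name like "t42_16384" for backend 42, relfilenode 16384. */
static void
make_temp_relname(char *buf, size_t buflen, int backend_id, unsigned relnode)
{
    snprintf(buf, buflen, "t%d_%u", backend_id, relnode);
}

/* Parse a directory entry; true iff it matches the temp-file pattern. */
static bool
parse_temp_relname(const char *name, int *backend_id, unsigned *relnode)
{
    return sscanf(name, "t%d_%u", backend_id, relnode) == 2;
}
```

A post-crash sweep could then scan the directory, parse each entry, and unlink anything whose backend is gone, without consulting pg_class at all.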

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#20)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

Unfortunately, I don't see much alternative to making smgr know
something about the temp-ness of the relation, though I'm hoping to
keep the smgr surgery to an absolute minimum. Maybe what we could do
is incorporate the backend ID or PID into the file name when the
relation is temp. Then we could scan for and nuke such files pretty
easily. Otherwise I can't really think how to make it work.

I think that could be a really good idea, mainly because it makes
post-crash cleanup MUCH safer: you can tell with certainty from the
filename that it's a leftover temp table. The notion of zapping files
just because we don't see them listed in pg_class has always scared the
heck out of me.

We already know temp-ness at pretty low levels, like bufmgr vs localbuf.
Pushing it all the way down to smgr doesn't seem like a leap; in fact
I think it would eliminate a separate isTemp parameter in a lot of places.

regards, tom lane

#22Jim Nasby
decibel@decibel.org
In reply to: Robert Haas (#20)
Re: global temporary tables

On Apr 24, 2010, at 8:14 PM, Robert Haas wrote:

One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard.

Yeah, I don't think that would be too horrible. Part of me feels like
you'd want to have the ability to store stats for a global temp table
in either one of those tables depending on use-case, but I'm also
reluctant to invent a lot of new syntax for a very limited use case.

Yeah, I'm thinking that's very probably overkill. And if we were going to go to that level, I think it would be far more useful to provide an interface to allow manual control over statistics first, so that you can give the optimizer custom information.

As for cleanup and storage questions; what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend.

That would impose a couple of implementation restrictions that don't
seem necessary. One, it would imply ignoring reltablespace. Two, it
would prohibit (or at least complicate) allowing a backend to CLUSTER
or REINDEX its own private copy of the rel.

Well, the same structure could be imposed underneath a temp tablespace. I don't think it matters where you ultimately put it; the goal is just to make sure you can definitively tell (a) that a file was temporary and (b) what PID it belonged to. That allows for safe cleanup.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
Re: global temporary tables

On Sat, Apr 24, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Unfortunately, I don't see much alternative to making smgr know
something about the temp-ness of the relation, though I'm hoping to
keep the smgr surgery to an absolute minimum.  Maybe what we could do
is incorporate the backend ID or PID into the file name when the
relation is temp.  Then we could scan for and nuke such files pretty
easily.  Otherwise I can't really think how to make it work.

I think that could be a really good idea, mainly because it makes
post-crash cleanup MUCH safer: you can tell with certainty from the
filename that it's a leftover temp table.  The notion of zapping files
just because we don't see them listed in pg_class has always scared the
heck out of me.

We already know temp-ness at pretty low levels, like bufmgr vs localbuf.
Pushing it all the way down to smgr doesn't seem like a leap; in fact
I think it would eliminate a separate isTemp parameter in a lot of places.

Eh? I don't see how it's going to do that.

...Robert

#24Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#23)
Re: global temporary tables

On Sat, Apr 24, 2010 at 9:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Apr 24, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Unfortunately, I don't see much alternative to making smgr know
something about the temp-ness of the relation, though I'm hoping to
keep the smgr surgery to an absolute minimum.  Maybe what we could do
is incorporate the backend ID or PID into the file name when the
relation is temp.  Then we could scan for and nuke such files pretty
easily.  Otherwise I can't really think how to make it work.

I think that could be a really good idea, mainly because it makes
post-crash cleanup MUCH safer: you can tell with certainty from the
filename that it's a leftover temp table.  The notion of zapping files
just because we don't see them listed in pg_class has always scared the
heck out of me.

We already know temp-ness at pretty low levels, like bufmgr vs localbuf.
Pushing it all the way down to smgr doesn't seem like a leap; in fact
I think it would eliminate a separate isTemp parameter in a lot of places.

Eh?  I don't see how it's going to do that.

Oh, maybe I do see. If we pass it to smgropen() and stash it in the
SMgrRelation, we don't have to keep supplying it later on, maybe?

Will investigate further.

...Robert

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#24)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

Oh, maybe I do see. If we pass it to smgropen() and stash it in the
SMgrRelation, we don't have to keep supplying it later on, maybe?

Right. I'm unsure whether we should push it into the RelFileNode
struct itself, but even having it in SMgrRelation ought to cut out
a few places where it now has to be passed separately.

regards, tom lane

#26Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#25)
Re: global temporary tables

On Sat, Apr 24, 2010 at 10:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Oh, maybe I do see.  If we pass it to smgropen() and stash it in the
SMgrRelation, we don't have to keep supplying it later on, maybe?

Right.  I'm unsure whether we should push it into the RelFileNode
struct itself, but even having it in SMgrRelation ought to cut out
a few places where it now has to be passed separately.

Pushing it into the RelFileNode has some advantages in terms of being
able to get at the information from everywhere, but one thing that
makes me think that's probably not a good decision is that we sometimes
WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is
a waste, because if we're WAL-logging, it's zero.

...Robert

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#26)
Re: global temporary tables

Robert Haas <robertmhaas@gmail.com> writes:

Pushing it into the RelFileNode has some advantages in terms of being
able to get at the information from everywhere, but one thing that
makes me think that's probably not a good decision is that we sometimes
WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is
a waste, because if we're WAL-logging, it's zero.

Yeah. I think we also use RelFileNode as a hash tag in places, and
so adding a bool to it would be problematic for a couple of reasons:
possibly uninitialized pad bytes, and uselessly incorporating more bytes
into the hash calculation.

regards, tom lane
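[Editor's note: the padding hazard Tom raises can be made concrete with a small sketch. The layout below merely mimics RelFileNode (three 4-byte fields) plus a flag; it is an assumption for illustration, not the real definition.]

```c
/*
 * Illustrative sketch of the padding hazard: tacking a bool onto a
 * struct used as a hash key leaves trailing pad bytes whose contents
 * are undefined unless the key is memset first, so hashing or
 * memcmp'ing the whole struct can misbehave.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    unsigned    spcNode;
    unsigned    dbNode;
    unsigned    relNode;
} NodeKey;                      /* 12 bytes, no padding on common ABIs */

typedef struct
{
    unsigned    spcNode;
    unsigned    dbNode;
    unsigned    relNode;
    bool        isTemp;         /* 1 byte, then trailing padding */
} NodeKeyWithFlag;

/* Bytes after isTemp that a hash or memcmp would uselessly consume. */
static int
trailing_pad_bytes(void)
{
    return (int) (sizeof(NodeKeyWithFlag)
                  - offsetof(NodeKeyWithFlag, isTemp) - sizeof(bool));
}
```

On a typical 64-bit ABI the flag costs four bytes (one for the bool, three of padding), all of which would feed into every hash-tag computation.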

#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#18)
Re: global temporary tables

On Sat, 2010-04-24 at 19:01 -0400, Robert Haas wrote:

There is one major problem, though: assigning a
scratch relfilenode to the temporary table requires generating an OID,
which we currently have no way to allow on the standby.

Why not have an unlogged counter, which resets at each system start, using
the same datatype as an OID? There's no necessity for the relfilenode to be
an actual OID, is there?

That way we could use it on standbys also.

--
Simon Riggs www.2ndQuadrant.com

#29Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#28)
Re: global temporary tables

On Sun, Apr 25, 2010 at 3:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2010-04-24 at 19:01 -0400, Robert Haas wrote:

There is one major problem, though: assigning a
scratch relfilenode to the temporary table requires generating an OID,
which we currently have no way to allow on the standby.

Why not have an unlogged counter, which resets at each system start, using
the same datatype as an OID? There's no necessity for the relfilenode to be
an actual OID, is there?

That way we could use it on standbys also.

I don't think that quite works, because the standby might assign a
relfilenode number for a global temp table and then the master might
subsequently assign the same relfilenode number to a regular table.
We might be able to make that not matter, but it's far from obvious to
me that there are no gotchas there...

...Robert

#30Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#27)
1 attachment(s)
Re: global temporary tables

On Sat, Apr 24, 2010 at 11:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Pushing it into the RelFileNode has some advantages in terms of being
able to get at the information from everywhere, but one thing that
makes me think that's probably not a good decision is that we sometimes
WAL-log relfilenodes.  And WAL-logging the value of the isTemp flag is
a waste, because if we're WAL-logging, it's zero.

Yeah.  I think we also use RelFileNode as a hash tag in places, and
so adding a bool to it would be problematic for a couple of reasons:
possibly uninitialized pad bytes, and uselessly incorporating more bytes
into the hash calculation.

Right. I was thinking about the padding issue, too. I took a crack
at adding an isTemp argument to smgropen() and removing it everywhere
else, but it turns out this isn't as straightforward as it ought to be
because nbtsort.c, rewriteheap.c, and tablecmds.c feel entitled to
violate the abstraction layer by passing isTemp = true to smgrwrite()
and/or smgrextend() even when the real value is false, under the
assumption that the only thing isTemp is doing in those functions is
controlling fsync behavior. I think we'll have to replace the isTemp
argument to those functions with a boolean whose explicit charter is
to do what they're using it for.

Removing the isTemp argument from smgrtruncate() and smgrdounlink()
looks easy, though. A WIP patch that takes it that far is attached.

[davidfetter: patch description for PWN is "WIP patch to push isTemp
down into the smgr layer"]

...Robert

Attachments:

smgropen_istemp_wip.patchtext/x-diff; charset=US-ASCII; name=smgropen_istemp_wip.patchDownload
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index d1f7bcc..9645c95 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -373,8 +373,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	}
 
 	/* Truncate the unused VM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks,
-				 rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, newnblocks);
 
 	/*
 	 * We might as well update the local smgr_vm_nblocks setting. smgrtruncate
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3cc9bdd..b318700 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1317,13 +1317,13 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	}
 	for (i = 0; i < ndelrels; i++)
 	{
-		SMgrRelation srel = smgropen(delrels[i]);
+		SMgrRelation srel = smgropen(delrels[i], false);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
 		{
 			if (smgrexists(srel, fork))
-				smgrdounlink(srel, fork, false, false);
+				smgrdounlink(srel, fork, false);
 		}
 		smgrclose(srel);
 	}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b88cff2..2c5800d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -4434,7 +4434,8 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		/* We know this can't be a temp rel, since it is xlog'd. */
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], false);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4442,7 +4443,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, XLogRecPtr lsn)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
@@ -4537,7 +4538,8 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 	/* Make sure files supposed to be dropped are dropped */
 	for (i = 0; i < xlrec->nrels; i++)
 	{
-		SMgrRelation srel = smgropen(xlrec->xnodes[i]);
+		/* We know this can't be a temp rel, since it is xlog'd. */
+		SMgrRelation srel = smgropen(xlrec->xnodes[i], false);
 		ForkNumber	fork;
 
 		for (fork = 0; fork <= MAX_FORKNUM; fork++)
@@ -4545,7 +4547,7 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 			if (smgrexists(srel, fork))
 			{
 				XLogDropRelation(xlrec->xnodes[i], fork);
-				smgrdounlink(srel, fork, false, true);
+				smgrdounlink(srel, fork, true);
 			}
 		}
 		smgrclose(srel);
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 9ee2036..4462cfd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -277,8 +277,8 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 
 	Assert(blkno != P_NEW);
 
-	/* Open the relation at smgr level */
-	smgr = smgropen(rnode);
+	/* Open the relation at smgr level.  It's xlog'd, thus not temp. */
+	smgr = smgropen(rnode, false);
 
 	/*
 	 * Create the target file if it doesn't already exist.  This lets us cope
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index ad376a1..a9602e6 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -103,7 +103,7 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	xl_smgr_create xlrec;
 	SMgrRelation srel;
 
-	srel = smgropen(rnode);
+	srel = smgropen(rnode, istemp);
 	smgrcreate(srel, MAIN_FORKNUM, false);
 
 	if (!istemp)
@@ -283,7 +283,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Do the real work */
-	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }
 
 /*
@@ -322,14 +322,11 @@ smgrDoPendingDeletes(bool isCommit)
 				SMgrRelation srel;
 				int			i;
 
-				srel = smgropen(pending->relnode);
+				srel = smgropen(pending->relnode, pending->isTemp);
 				for (i = 0; i <= MAX_FORKNUM; i++)
 				{
 					if (smgrexists(srel, i))
-						smgrdounlink(srel,
-									 i,
-									 pending->isTemp,
-									 false);
+						smgrdounlink(srel, i, false);
 				}
 				smgrclose(srel);
 			}
@@ -456,7 +453,8 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		xl_smgr_create *xlrec = (xl_smgr_create *) XLogRecGetData(record);
 		SMgrRelation reln;
 
-		reln = smgropen(xlrec->rnode);
+		/* We know this can't be a temp rel, since it is xlog'd. */
+		reln = smgropen(xlrec->rnode, false);
 		smgrcreate(reln, MAIN_FORKNUM, true);
 	}
 	else if (info == XLOG_SMGR_TRUNCATE)
@@ -465,7 +463,8 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		SMgrRelation reln;
 		Relation	rel;
 
-		reln = smgropen(xlrec->rnode);
+		/* We know this can't be a temp rel, since it is xlog'd. */
+		reln = smgropen(xlrec->rnode, false);
 
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
@@ -475,7 +474,7 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
 		 */
 		smgrcreate(reln, MAIN_FORKNUM, true);
 
-		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno, false);
+		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
 
 		/* Also tell xlogutils.c about it */
 		XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 25b2807..aea7cc5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6981,7 +6981,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace)
 	newrnode = rel->rd_node;
 	newrnode.relNode = newrelfilenode;
 	newrnode.spcNode = newTableSpace;
-	dstrel = smgropen(newrnode);
+	dstrel = smgropen(newrnode, rel->rd_istemp);
 
 	RelationOpenSmgr(rel);
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index caae936..f69e959 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -260,7 +260,7 @@ ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
 {
 	bool		hit;
 
-	SMgrRelation smgr = smgropen(rnode);
+	SMgrRelation smgr = smgropen(rnode, isTemp);
 
 	return ReadBuffer_common(smgr, isTemp, forkNum, blockNum, mode, strategy,
 							 &hit);
@@ -1838,7 +1838,7 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 
 	/* Find smgr relation for buffer */
 	if (reln == NULL)
-		reln = smgropen(buf->tag.rnode);
+		reln = smgropen(buf->tag.rnode, false);
 
 	TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
 										buf->tag.blockNum,
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index c5b6a2c..b036512 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -198,7 +198,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		SMgrRelation oreln;
 
 		/* Find smgr relation for buffer */
-		oreln = smgropen(bufHdr->tag.rnode);
+		oreln = smgropen(bufHdr->tag.rnode, true);
 
 		/* And write... */
 		smgrwrite(oreln,
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 579572f..b43a2ce 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -303,7 +303,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	}
 
 	/* Truncate the unused FSM pages, and send smgr inval message */
-	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks, rel->rd_istemp);
+	smgrtruncate(rel->rd_smgr, FSM_FORKNUM, new_nfsmblocks);
 
 	/*
 	 * We might as well update the local smgr_fsm_nblocks setting.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 4163ca0..63d049b 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -794,8 +794,7 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
  *	mdtruncate() -- Truncate relation to specified number of blocks.
  */
 void
-mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-		   bool isTemp)
+mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	MdfdVec    *v;
 	BlockNumber curnblk;
@@ -839,7 +838,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 						 errmsg("could not truncate file \"%s\": %m",
 								FilePathName(v->mdfd_vfd))));
 
-			if (!isTemp)
+			if (!reln->isTemp)
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			Assert(ov != reln->md_fd[forknum]); /* we never drop the 1st
@@ -864,7 +863,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 					errmsg("could not truncate file \"%s\" to %u blocks: %m",
 						   FilePathName(v->mdfd_vfd),
 						   nblocks)));
-			if (!isTemp)
+			if (!reln->isTemp)
 				register_dirty_segment(reln, forknum, v);
 			v = v->mdfd_chain;
 			ov->mdfd_chain = NULL;
@@ -1051,8 +1050,12 @@ mdsync(void)
 				 * we couldn't safely do that.)  Furthermore, in many cases
 				 * the relation will have been dirtied through this same smgr
 				 * relation, and so we can save a file open/close cycle.
+				 *
+				 * We will never be asked to fsync a temporary relation, so
+				 * it's safe to pass false as the second argument to
+				 * smgropen().
 				 */
-				reln = smgropen(entry->tag.rnode);
+				reln = smgropen(entry->tag.rnode, false);
 
 				/*
 				 * It is possible that the relation has been dropped or
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 3b12cb3..cc72e5d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -57,7 +57,7 @@ typedef struct f_smgr
 							BlockNumber blocknum, char *buffer, bool isTemp);
 	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-										   BlockNumber nblocks, bool isTemp);
+										   BlockNumber nblocks);
 	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_pre_ckpt) (void);		/* may be NULL */
 	void		(*smgr_sync) (void);	/* may be NULL */
@@ -131,7 +131,7 @@ smgrshutdown(int code, Datum arg)
  *		This does not attempt to actually open the object.
  */
 SMgrRelation
-smgropen(RelFileNode rnode)
+smgropen(RelFileNode rnode, bool isTemp)
 {
 	SMgrRelation reln;
 	bool		found;
@@ -160,6 +160,7 @@ smgropen(RelFileNode rnode)
 		int			forknum;
 
 		/* hash_search already filled in the lookup key */
+		reln->isTemp = isTemp;
 		reln->smgr_owner = NULL;
 		reln->smgr_targblock = InvalidBlockNumber;
 		reln->smgr_fsm_nblocks = InvalidBlockNumber;
@@ -323,7 +324,7 @@ smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  *		already.
  */
 void
-smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isTemp, bool isRedo)
+smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 {
 	RelFileNode rnode = reln->smgr_rnode;
 	int			which = reln->smgr_which;
@@ -331,7 +332,7 @@ smgrdounlink(SMgrRelation reln, ForkNumber forknum, bool isTemp, bool isRedo)
 	/* Close the fork */
 	(*(smgrsw[which].smgr_close)) (reln, forknum);
 
-	smgr_internal_unlink(rnode, forknum, which, isTemp, isRedo);
+	smgr_internal_unlink(rnode, forknum, which, reln->isTemp, isRedo);
 }
 
 /*
@@ -455,14 +456,13 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
  * The truncation is done immediately, so this can't be rolled back.
  */
 void
-smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
-			 bool isTemp)
+smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 {
 	/*
 	 * Get rid of any buffers for the about-to-be-deleted blocks. bufmgr will
 	 * just drop them without bothering to write the contents.
 	 */
-	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, isTemp, nblocks);
+	DropRelFileNodeBuffers(reln->smgr_rnode, forknum, reln->isTemp, nblocks);
 
 	/*
 	 * Send a shared-inval message to force other backends to close any smgr
@@ -479,8 +479,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks,
 	/*
 	 * Do the truncation.
 	 */
-	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks,
-												 isTemp);
+	(*(smgrsw[reln->smgr_which].smgr_truncate)) (reln, forknum, nblocks);
 }
 
 /*
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index cf248b8..6794c13 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -43,6 +43,9 @@ typedef struct SMgrRelationData
 	/* pointer to owning pointer, or NULL if none */
 	struct SMgrRelationData **smgr_owner;
 
+	/* is this a temporary relation? */
+	bool		isTemp;
+
 	/*
 	 * These next three fields are not actually used or manipulated by smgr,
 	 * except that they are reset to InvalidBlockNumber upon a cache flush
@@ -70,7 +73,7 @@ typedef SMgrRelationData *SMgrRelation;
 
 
 extern void smgrinit(void);
-extern SMgrRelation smgropen(RelFileNode rnode);
+extern SMgrRelation smgropen(RelFileNode rnode, bool isTemp);
 extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
 extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
@@ -78,7 +81,7 @@ extern void smgrcloseall(void);
 extern void smgrclosenode(RelFileNode rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, ForkNumber forknum,
-			 bool isTemp, bool isRedo);
+			 bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
 		   BlockNumber blocknum, char *buffer, bool isTemp);
 extern void smgrprefetch(SMgrRelation reln, ForkNumber forknum,
@@ -89,7 +92,7 @@ extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 		  BlockNumber blocknum, char *buffer, bool isTemp);
 extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
-			 BlockNumber nblocks, bool isTemp);
+			 BlockNumber nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void smgrpreckpt(void);
 extern void smgrsync(void);
@@ -114,7 +117,7 @@ extern void mdwrite(SMgrRelation reln, ForkNumber forknum,
 		BlockNumber blocknum, char *buffer, bool isTemp);
 extern BlockNumber mdnblocks(SMgrRelation reln, ForkNumber forknum);
 extern void mdtruncate(SMgrRelation reln, ForkNumber forknum,
-		   BlockNumber nblocks, bool isTemp);
+		   BlockNumber nblocks);
 extern void mdimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void mdpreckpt(void);
 extern void mdsync(void);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 444e892..da8a46b 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -347,7 +347,8 @@ typedef struct StdRdOptions
 #define RelationOpenSmgr(relation) \
 	do { \
 		if ((relation)->rd_smgr == NULL) \
-			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node)); \
+			smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node, \
+				(relation)->rd_istemp)); \
 	} while (0)
 
 /*
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#29)
Re: global temporary tables

On Sun, 2010-04-25 at 06:50 -0400, Robert Haas wrote:

On Sun, Apr 25, 2010 at 3:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2010-04-24 at 19:01 -0400, Robert Haas wrote:

There is one major problem, though: assigning a
scratch relfilenode to the temporary table requires generating an OID,
which we currently have no way to allow on the standby.

Why not have an unlogged counter, which resets at each system start, using
the same datatype as an OID? There's no necessity for the relfilenode to be
an actual OID, is there?

That way we could use it on standbys also.

I don't think that quite works, because the standby might assign a
relfilenode number for a global temp table and then the master might
subsequently assign the same relfilenode number to a regular table.
We might be able to make that not matter, but it's far from obvious to
me that there are no gotchas there...

That sounds fairly simple to solve.

All I am saying is please include "working on the standby" as part of
your requirement.

--
Simon Riggs www.2ndQuadrant.com

#32Jim Nasby
decibel@decibel.org
In reply to: Tom Lane (#27)
Re: global temporary tables

On Apr 24, 2010, at 10:02 PM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Pushing it into the RelFileNode has some advantages in terms of being
able to get at the information from everywhere, but one thing that
makes me think that's probably not a good decision is that we sometimes
WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is
a waste, because if we're WAL-logging, it's zero.

Yeah. I think we also use RelFileNode as a hash tag in places, and
so adding a bool to it would be problematic for a couple of reasons:
possibly uninitialized pad bytes, and uselessly incorporating more bytes
into the hash calculation.

Do we need to hash it that frequently? Do we insert tons of them into WAL? Worrying about those cases smells like premature optimization, but admittedly I don't have enough knowledge here...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#33Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#31)
Re: global temporary tables

On Sun, Apr 25, 2010 at 8:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I don't think that quite works, because the standby might assign a
relfilenode number for a global temp table and then the master might
subsequently assign the same relfilenode number to a regular table.
We might be able to make that not matter, but it's far from obvious to
me that there are no gotchas there...

That sounds fairly simple to solve.

All I am saying is please include "working on the standby" as part of
your requirement.

Well, I think I already basically stated my position on this, but let
me try to be more clear. I'm not promising to work on any portion of
this project AT ALL or to have it done at any particular time. I am
specifically not promising to do the extra work required to make it
work with Hot Standby, although I am also not saying that I won't.
Nobody has offered to fund any portion of this work, so there are no
guarantees, full stop. Even if you could successfully convince a
critical mass of people on this list that the work should not be
committed without adding support for temp tables in Hot Standby mode,
the most likely result of that would be that I would give up and not
pursue this at all, rather than that I would agree to do that in
addition to solving all the problems already discussed. And I don't
think you can even get that far, because I don't think too many people
here are going to say that we shouldn't add global temporary tables
unless we can also make them work with Hot Standby.

In all honesty, I would think that you would be happy about my
possibly implementing a flavor of temporary tables that would be
substantially more feasible to make work with Hot Standby than the
kind we have now, rather than (as you seem to be) complaining that I'm
not solving the entire problem.

...Robert

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#32)
Re: global temporary tables

Jim Nasby <decibel@decibel.org> writes:

On Apr 24, 2010, at 10:02 PM, Tom Lane wrote:

Yeah. I think we also use RelFileNode as a hash tag in places, and
so adding a bool to it would be problematic for a couple of reasons:
possibly uninitialized pad bytes, and uselessly incorporating more bytes
into the hash calculation.

Do we need to hash it that frequently?

Every buffer access, for instance.

Do we insert tons of them into WAL?

Yes; the vast majority of WAL records contain one.

regards, tom lane

#35Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#1)
Re: global temporary tables

Robert,

(1). What I *think* it is supposed to mean is that the table is a
permanent object which is "globally" visible - that is, it's part of
some non-temp schema like public or $user and it's column definitions
etc. are visible to all backends - and it's not automatically removed
on commit, backend exit, etc. - but the *contents* of the table are
temporary and backend-local, so that each new backend initially sees
it as empty and can then insert, update, and delete data independently
of what any other backend does.

While closer to the standard, the above definition is a lot less useful
than what I believe a lot of people want, which is a table which is
globally visible, but has no durability; that is, it does not get
WAL-logged or recovered on restart. Certainly this latter definition
would be far more useful to support materialized views.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#36Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#35)
Re: global temporary tables

Josh Berkus wrote:

Robert,

(1). What I *think* it is supposed to mean is that the table is a
permanent object which is "globally" visible - that is, it's part of
some non-temp schema like public or $user and it's column definitions
etc. are visible to all backends - and it's not automatically removed
on commit, backend exit, etc. - but the *contents* of the table are
temporary and backend-local, so that each new backend initially sees
it as empty and can then insert, update, and delete data independently
of what any other backend does.

While closer to the standard, the above definition is a lot less
useful than what I believe a lot of people want, which is a table
which is globally visible, but has no durability; that is, it does not
get WAL-logged or recovered on restart. Certainly this latter
definition would be far more useful to support materialized views.

These are not mutually exclusive features. What you're asking for has
value, certainly, but it's not a temp table in the standard's terms
(which is a feature that also has value, I believe).

cheers

andrew

#37Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#35)
Re: global temporary tables

On Sun, Apr 25, 2010 at 5:29 PM, Josh Berkus <josh@agliodbs.com> wrote:

Robert,

(1).  What I *think* it is supposed to mean is that the table is a
permanent object which is "globally" visible - that is, it's part of
some non-temp schema like public or $user and it's column definitions
etc. are visible to all backends - and it's not automatically removed
on commit, backend exit, etc. - but the *contents* of the table are
temporary and backend-local, so that each new backend initially sees
it as empty and can then insert, update, and delete data independently
of what any other backend does.

While closer to the standard, the above definition is a lot less useful than
what I believe a lot of people want, which is a table which is globally
visible, but has no durability; that is, it does not get WAL-logged or
recovered on restart.  Certainly this latter definition would be far more
useful to support materialized views.

I think it's arguable which one is more useful, but I think a good
deal of the infrastructure can be made to serve both purposes, as I
further expounded upon here.

http://archives.postgresql.org/pgsql-hackers/2010-04/msg01123.php

...Robert

#38Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#33)
Re: global temporary tables

On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote:

And I don't
think you can even get that far, because I don't think too many people
here are going to say that we shouldn't add global temporary tables
unless we can also make them work with Hot Standby.

The policy round here for some time has been that when we implement
things we make them work fully and seamlessly. I don't see why Hot
Standby would be singled out any more than any other feature, say
Windows support or tablespaces, should be occasionally ignored.

People need to get used to the new feature set, just as we had to with
HOT, subtransactions, prepared transactions, GiST, etc. That may require
a thwack from various people, but the responsibility lies with the new
feature implementor, not the person supporting existing code.

I fully understand your wish to implement a partial feature with caveats
because I have argued that many times myself. But I've come to realise
that the best way is to build things so they work cleanly across the
board. Other developers can plan projects in the knowledge that they can
build directly on firm foundations, not fill in the cracks. In the end
this comes down to a choice as developers, do we help each other by
doing a full job, or do we leave unexploded bombs for each other through
short-termism? Now I understand this better myself, I act differently
and accept objections if people think a fuller, more complete design is
what is needed. Recent demonstrations of that are available, both objecting
and accepting.

Don't see this as an extra task, just see it as one of the many aspects
that will need to be considered when developing it. If you do that it
need not be additional work.

--
Simon Riggs www.2ndQuadrant.com

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#38)
Re: global temporary tables

Simon Riggs <simon@2ndQuadrant.com> writes:

On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote:

And I don't
think you can even get that far, because I don't think too many people
here are going to say that we shouldn't add global temporary tables
unless we can also make them work with Hot Standby.

The policy round here for some time has been that when we implement
things we make them work fully and seamlessly. I don't see why Hot
Standby would be singled out any more than any other feature, say
Windows support or tablespaces, should be occasionally ignored.

The current definition of Hot Standby is that it's a *read only*
behavior. Not read mostly. What you are proposing is a rather
fundamental change in the behavior of HS, and it doesn't seem to me
that it should be on the head of anybody else to make it work.

IOW: I agree with Robert that this is not an essential part of global
temp tables. If it happens to fall out that it works like that, great,
but it isn't a requirement.

regards, tom lane

#40Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#39)
Re: global temporary tables

On Mon, 2010-04-26 at 15:40 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote:

And I don't
think you can even get that far, because I don't think too many people
here are going to say that we shouldn't add global temporary tables
unless we can also make them work with Hot Standby.

The policy round here for some time has been that when we implement
things we make them work fully and seamlessly. I don't see why Hot
Standby would be singled out any more than any other feature, say
Windows support or tablespaces, should be occasionally ignored.

The current definition of Hot Standby is that it's a *read only*
behavior. Not read mostly. What you are proposing is a rather
fundamental change in the behavior of HS, and it doesn't seem to me
that it should be on the head of anybody else to make it work.

That's a dangerous precedent you just set.

--
Simon Riggs www.2ndQuadrant.com

#41Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#38)
Re: global temporary tables

On Mon, Apr 26, 2010 at 3:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote:

And I don't
think you can even get that far, because I don't think too many people
here are going to say that we shouldn't add global temporary tables
unless we can also make them work with Hot Standby.

The policy round here for some time has been that when we implement
things we make them work fully and seamlessly. I don't see why Hot
Standby would be singled out any more than any other feature, say
Windows support or tablespaces, should be occasionally ignored.

People need to get used to the new feature set, just as we had to with
HOT, subtransactions, prepared transactions, GiST, etc. That may require
a thwack from various people, but the responsibility lies with the new
feature implementor, not the person supporting existing code.

I fully understand your wish to implement a partial feature with caveats
because I have argued that many times myself. But I've come to realise
that the best way is to build things so they work cleanly across the
board. Other developers can plan projects in the knowledge that they can
build directly on firm foundations, not fill in the cracks. In the end
this comes down to a choice as developers, do we help each other by
doing a full job, or do we leave unexploded bombs for each other through
short-termism? Now I understand this better myself, I act differently
and accept objections if people think a fuller, more complete design is
what is needed. Recent demonstrations of that are available, both objecting
and accepting.

Don't see this as an extra task, just see it as one of the many aspects
that will need to be considered when developing it. If you do that it
need not be additional work.

I think you're looking at this the wrong way. If temporary tables
have to work with Hot Standby in order for it to be committable, then
we should never have committed Hot Standby in the first place because
our current flavor of temporary tables doesn't. Was that an oversight
on your part, or a recognition that you can't solve every problem in
one commit?

...Robert

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#40)
Re: global temporary tables

Simon Riggs <simon@2ndQuadrant.com> writes:

On Mon, 2010-04-26 at 15:40 -0400, Tom Lane wrote:

The current definition of Hot Standby is that it's a *read only*
behavior. Not read mostly. What you are proposing is a rather
fundamental change in the behavior of HS, and it doesn't seem to me
that it should be on the head of anybody else to make it work.

That's a dangerous precedent you just set.

[ shrug... ] If you have near-term solutions for all the *other*
problems that would be involved (like what XID to put into rows you
insert in the temp tables) then I might think that what you're asking
Robert to do is reasonable. Personally I think non-read-only HS is
entirely pie in the sky, and therefore it's not reasonable to saddle
unrelated development tasks with an expectation that they should work
with a behavior that probably won't ever happen.

regards, tom lane

#43Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#11)
Re: global temporary tables

Tom Lane escribió:

[ forgot to respond to this part ]

Robert Haas <robertmhaas@gmail.com> writes:

... I don't see the problem with DROP.
Under the proposed design, it's approximately equivalent to dropping a
table that someone else has truncated. You just wait for the
necessary lock and then do it.

And do *what*? You can remove the catalog entries, but how are you
going to make the physical storage of other backends' versions go away?
(To say nothing of making them flush their local buffers for it.)

Maybe we could add a sinval message to that effect.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support