Temporary tables under hot standby

Started by Noah Misch · almost 14 years ago · 51 messages · pgsql-hackers
#1Noah Misch
noah@leadboat.com

A key barrier to migrations from trigger-based replication to WAL-based
replication is the lack of temporary tables under hot standby. I'd like to
close that gap; the changes needed will also reduce the master-side cost of
temporary table usage. Here is a high-level design for your advice and
comments. Much of this builds on ideas from past postings noted below.

Our temporary tables are cataloged and filled like permanent tables. This has
the major advantage of making most code operate on tables with minimal regard
for their relpersistence. It also brings disadvantages:

1. High catalog turnover in rapid create/drop workloads. Heavy temporary
table users often need to REINDEX relation-oriented catalogs. Hot standby
cannot assign OIDs or modify system catalogs at all.
2. Consumption of permanent XIDs for DML on the table. This increases COMMIT
cost on the master and is a non-starter under hot standby.
3. autovacuum cannot VACUUM temporary tables, but their relfrozenxid values
delay pg_clog truncation and can trigger a wraparound-prevention shutdown.
4. sinval traffic from every CREATE TEMP TABLE et al.
5. CREATE TEMP TABLE records AccessExclusiveLock in the global lock table.
6. We don't automatically drop temporary tables that existed at the point of a
crash, because they look much like permanent tables.

To resolve points 2 and 3, let's change the XID values stored in temporary
tables from regular TransactionId to LocalTransactionId. This entails an lxid
counterpart for clog.c. Functions like GetCurrentTransactionId() and
HeapTupleSatisfiesVisibility() will take a Relation argument to identify the
XID type. One open question is whether to add conditional logic to functions
like HeapTupleSatisfiesMVCC() or to have parallel implementations like
HeapTupleSatisfiesMVCCLocal(). I lean toward the latter, perhaps with the
help of some code generation. I don't think a counterpart for pg_subtrans
will be necessary; the backend knows its own XID tree, and the
TRANSACTION_STATUS_SUB_COMMITTED interlock is superfluous with only one
backend as reader and writer. I'm also thinking the local clog can live
strictly in memory; a session that retains a temporary table across 2B local
transactions can afford 512 MiB of RAM. With this change, VACUUM can ignore
relfrozenxid of temporary tables when calculating a new datfrozenxid. This
change can form an independent patch.
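To make the memory arithmetic concrete, here is a minimal sketch of what a strictly in-memory local clog could look like: two status bits per local XID, exactly as pg_clog packs them, so 2^31 transactions cost 2^31 / 4 bytes = 512 MiB. All names here are illustrative, not existing PostgreSQL APIs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory "local clog": 2 status bits per
 * LocalTransactionId, mirroring pg_clog's packing.  At 4 lxids per
 * byte, 2^31 local transactions need 2^29 bytes = 512 MiB. */
typedef uint32_t LocalTransactionId;

enum {
    LXID_STATUS_IN_PROGRESS = 0,
    LXID_STATUS_COMMITTED   = 1,
    LXID_STATUS_ABORTED     = 2
};

#define LXIDS_PER_BYTE 4        /* 2 bits per transaction */

static uint8_t *local_clog;     /* grown lazily in a real system */

static void
local_clog_set(LocalTransactionId lxid, int status)
{
    size_t  byte  = lxid / LXIDS_PER_BYTE;
    int     shift = (lxid % LXIDS_PER_BYTE) * 2;

    local_clog[byte] = (uint8_t) ((local_clog[byte] & ~(3 << shift)) |
                                  (status << shift));
}

static int
local_clog_get(LocalTransactionId lxid)
{
    size_t  byte  = lxid / LXIDS_PER_BYTE;
    int     shift = (lxid % LXIDS_PER_BYTE) * 2;

    return (local_clog[byte] >> shift) & 3;
}
```

With only the owning backend reading and writing, no locking or sub-commit interlock is needed, which is what makes the purely in-memory design viable.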

I do not see a clean behind-the-scenes fix for points 1, 4 and 5. We can
resolve those by adding a new variety of temporary table, one coincidentally
matching the SQL standard's notion of a temporary table. The developer will
declare it once, after which all sessions observe it as an initially-empty
table whose contents remain local to the session. Most relation catalog
entries, including all OIDs, are readily sharable among sessions. The
exceptions are relpages, reltuples, relallvisible, relfrozenxid, and
pg_statistic rows. I will handle the pg_class columns by introducing new
backend APIs abstracting them. Those APIs will consult the relcache for
permanent tables and a local-memory hash for temporary tables. For
statistics, add a new catalog pg_temp_statistic, an inheritance child of
pg_statistic and itself one of these new-variety temporary tables.
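A rough sketch of that accessor idea (all function and type names invented for illustration): ANALYZE on a new-variety temporary table would store into a backend-local hash, and reads fall back to the shared catalog value when no local entry exists:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the proposed accessor layer: planner-visible stats for
 * permanent tables come from the shared catalog row; for new-variety
 * temp tables they come from a backend-local hash.  Invented names. */
typedef uint32_t Oid;

typedef struct LocalRelStats {
    Oid     relid;          /* 0 = empty slot */
    int32_t relpages;
    double  reltuples;
} LocalRelStats;

#define LOCAL_STATS_SLOTS 64
static LocalRelStats local_stats[LOCAL_STATS_SLOTS];

static LocalRelStats *
local_stats_slot(Oid relid)
{
    uint32_t i = relid % LOCAL_STATS_SLOTS;

    while (local_stats[i].relid != 0 && local_stats[i].relid != relid)
        i = (i + 1) % LOCAL_STATS_SLOTS;    /* linear probing */
    return &local_stats[i];
}

/* ANALYZE on a temp table writes into the backend-local hash... */
static void
RelationSetLocalStats(Oid relid, int32_t relpages, double reltuples)
{
    LocalRelStats *s = local_stats_slot(relid);

    s->relid = relid;
    s->relpages = relpages;
    s->reltuples = reltuples;
}

/* ...and readers go through an accessor that falls back to the shared
 * catalog value (here passed as a parameter) when no local entry. */
static int32_t
RelationGetRelPages(Oid relid, int32_t catalog_relpages)
{
    LocalRelStats *s = local_stats_slot(relid);

    return (s->relid == relid) ? s->relpages : catalog_relpages;
}
```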

Past discussions have raised the issue of interaction between commands like
ALTER TABLE and sessions using the new-variety temporary table. As a first
cut, let's keep this simple and have ongoing use of the table block operations
requiring AccessExclusiveLock. Note that you can always just make a new
temporary table with a different name to deploy a change quickly. Implement
this with a heavyweight lock having a novel life cycle. When a session first
takes an ordinary relation lock on the table, acquire the longer-term lock and
schedule it for release on transaction abort. On TRUNCATE, schedule a release
on transaction commit. Of course, also release the lock at session end.
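The proposed life cycle can be sketched as a small state machine; state and event names are invented here for illustration, and the real implementation would of course live inside the heavyweight lock manager:

```c
#include <assert.h>

/* Sketch of the long-term lock's novel life cycle: acquired on first
 * ordinary relation lock, released on abort, on session end, or (after
 * TRUNCATE) at commit.  A plain commit without TRUNCATE keeps it. */
typedef enum {
    LT_NOT_HELD,
    LT_HELD,
    LT_RELEASE_AT_COMMIT        /* TRUNCATE scheduled a commit-time release */
} LongTermLockState;

typedef enum {
    EV_FIRST_REL_LOCK,          /* session takes first ordinary lock */
    EV_TRUNCATE,
    EV_COMMIT,
    EV_ABORT,
    EV_SESSION_END
} LockEvent;

static LongTermLockState
longterm_lock_transition(LongTermLockState s, LockEvent ev)
{
    switch (ev)
    {
        case EV_FIRST_REL_LOCK:
            return LT_HELD;
        case EV_TRUNCATE:
            return (s == LT_HELD) ? LT_RELEASE_AT_COMMIT : s;
        case EV_COMMIT:
            return (s == LT_RELEASE_AT_COMMIT) ? LT_NOT_HELD : s;
        case EV_ABORT:              /* scheduled release on abort */
        case EV_SESSION_END:
            return LT_NOT_HELD;
    }
    return s;
}
```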

For new-variety temporary tables, change file names from "relfilenode[_fork]"
to "refilenode[_fork].pid.localnode". During crash recovery, delete all files
conforming to that pattern for refilenodes of known temporary tables. This
also lets backends share pg_class.relfilenode. The "localnode" field would
distinguish multiple generations of a table across VACUUM FULL, CLUSTER, and
TRUNCATE. We could build on this strategy to safely resolve point 6 for the
existing kind of temporary table, but that entails enough other details to
probably not mix it into the same patch.
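One plausible spelling of that naming scheme, sketched with invented helper names (the exact format is of course not settled):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the proposed on-disk naming for new-variety temp tables:
 * "relfilenode[_fork].pid.localnode".  The main-fork form is shown;
 * this is one guess at the spelling, not settled syntax. */
static int
temp_rel_path(char *buf, size_t len, unsigned relfilenode,
              int pid, unsigned localnode)
{
    return snprintf(buf, len, "%u.%d.%u", relfilenode, pid, localnode);
}

/* Crash recovery would scan the directory and remove any file whose
 * name parses as this pattern with a relfilenode belonging to a known
 * temporary table. */
static int
temp_rel_path_matches(const char *name, unsigned *relfilenode)
{
    int      pid;
    unsigned localnode;
    char     trail;

    return sscanf(name, "%u.%d.%u%c",
                  relfilenode, &pid, &localnode, &trail) == 3;
}
```

Because the pid and localnode live in the file name rather than the catalog, every session can share the same pg_class.relfilenode while each keeps a private generation counter.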

A third patch will permit the following commands in read-only transactions,
where they will throw an error if the subject is not a temporary table:

INSERT
UPDATE
DELETE
COPY ... FROM
TRUNCATE
ANALYZE
VACUUM (including VACUUM FULL)
CLUSTER (without USING clause)
REINDEX

I considered whether to instead separate the set of commands allowed in a
read-only transaction from the set allowed under hot standby. This proposal
is closer to the SQL standard, which explicitly allows INSERT, UPDATE and
DELETE on temporary tables during read-only transactions. Only the first five
commands are essential; support for the rest could wait for follow-on patches.
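The check itself amounts to a simple relaxation of the existing read-only guard; a toy sketch with invented names (real code would ereport(ERROR) instead of returning false):

```c
#include <assert.h>

/* Sketch of the relaxed read-only check: in a read-only transaction,
 * the listed data-modifying commands proceed only when the subject is
 * a temporary table.  Invented helper; relpersistence codes as in
 * pg_class. */
#define RELPERSISTENCE_PERMANENT 'p'
#define RELPERSISTENCE_TEMP      't'

static int
command_allowed_in_read_only(int xact_read_only, char relpersistence)
{
    if (!xact_read_only)
        return 1;                       /* read-write: anything goes */
    return relpersistence == RELPERSISTENCE_TEMP;
}
```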

Concerning everyone's favorite topic, how to name the new type of table, I
liked Tom's proposal[1] to make CREATE TEMP TABLE retain current behavior and
have CREATE GLOBAL TEMP TABLE and/or CREATE LOCAL TEMP TABLE request the new
SQL-standard variety. (I'd vote for using CREATE GLOBAL and retaining CREATE
LOCAL for future expansion.) As he mentions, to get there, we'd ideally start
by producing a warning instead of silently accepting GLOBAL as a noise word.
Should we put such a warning into 9.2?

How important is support for VACUUM on these tables under hot standby? The
alternative is to fail when a session retains a temporary table across 2B
local transactions. I do not currently see any challenges sufficient to
motivate not supporting VACUUM, but it might be a useful simplification to
keep in mind. What about ANALYZE support; how important is the ability to
collect statistics on temporary tables? Again, I tentatively expect to
support it regardless of the answer.

Key past threads:
http://archives.postgresql.org/message-id/7903.1050417344@sss.pgh.pa.us
http://archives.postgresql.org/message-id/162867790707011431u71e53543x19e64e5bb160b124@mail.gmail.com
http://archives.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc7806f@mail.gmail.com
http://archives.postgresql.org/message-id/u2o603c8f071004231952i36642ae6u9d6a7eae6eb6ff32@mail.gmail.com
http://archives.postgresql.org/message-id/BANLkTin1Gha0SS77E64jczPfAPn6Oxb8hQ%40mail.gmail.com

Thanks,
nm

[1]: http://archives.postgresql.org/message-id/5422.1240936705@sss.pgh.pa.us

#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#1)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 4:55 AM, Noah Misch <noah@leadboat.com> wrote:

1. High catalog turnover in rapid create/drop workloads.  Heavy temporary
  table users often need to REINDEX relation-oriented catalogs.  Hot standby
  cannot assign OIDs or modify system catalogs at all.
4. sinval traffic from every CREATE TEMP TABLE et al.
5. CREATE TEMP TABLE records AccessExclusiveLock in the global lock table.

I do not see a clean behind-the-scenes fix for points 1, 4 and 5.  We can
resolve those by adding a new variety of temporary table, one coincidentally
matching the SQL standard's notion of a temporary table.  The developer will
declare it once, after which all sessions observe it as an initially-empty
table whose contents remain local to the session.

The rest of your post is very good and I'm excited. This part doesn't
work for me.

I don't see how introducing a new type of temp table solves this
problem. How would the developer declare this in HS? How would it then
be globally visible without using global OIDs, causing sinval and
using global locks? This feels like a suggestion from somewhere else
grafted onto your proposal. I'm not against introducing a new type of
temp table; I just think it is orthogonal to the original topic.

For me, the solutions are
4. Invent a "linval" - an invalidation that always stays local
5. invent a LocalRelationLock that takes AccessExclusiveLock but never
goes to the global lock table.
So those aspects stay completely in local memory.

1. is a little harder. I suggest we have a range of say 16384 OIDs
reserved for use by temporary relations. If that range is used up we
do then move to using real global Ids, though that is code we'll
almost never need, so it would be acceptable to restrict HS sessions
to only ever use 16384 temp tables concurrently in one session.
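A sketch of that reserved-band idea (the boundary values are illustrative, not a claim about the actual OID layout):

```c
#include <assert.h>

/* Sketch of the reserved OID band for standby-local temp relations:
 * hand out OIDs from a fixed range without touching the shared OID
 * counter; return 0 when the band is exhausted, at which point the
 * session would either error out or (on a primary) fall back to real
 * global OID assignment.  Boundary values are illustrative. */
#define TEMP_OID_FIRST 16384u
#define TEMP_OID_COUNT 16384u   /* "a range of say 16384 OIDs" */

static unsigned next_temp_oid = TEMP_OID_FIRST;

static unsigned
assign_local_temp_oid(void)
{
    if (next_temp_oid >= TEMP_OID_FIRST + TEMP_OID_COUNT)
        return 0;               /* band exhausted */
    return next_temp_oid++;
}
```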

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#1)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 4:55 AM, Noah Misch <noah@leadboat.com> wrote:

A third patch will permit the following commands in read-only transactions,
where they will throw an error if the subject is not a temporary table:

...

VACUUM (including VACUUM FULL)
CLUSTER (without USING clause)
REINDEX

Those commands have very low user visible effect on temp tables, so I
suggest we implement them as no-op commands in HS. When everything
else is done and dusted, they might be worth returning to, but I'd
hope you won't spend much time on those anytime soon.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#4Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Simon Riggs (#2)
Re: Temporary tables under hot standby

2012/4/25 Simon Riggs <simon@2ndquadrant.com>:

On Wed, Apr 25, 2012 at 4:55 AM, Noah Misch <noah@leadboat.com> wrote:

I do not see a clean behind-the-scenes fix for points 1, 4 and 5.  We can
resolve those by adding a new variety of temporary table, one coincidentally
matching the SQL standard's notion of a temporary table.  The developer will
declare it once, after which all sessions observe it as an initially-empty
table whose contents remain local to the session.

[..]

I don't see how introducing a new type of temp table solves this
problem. How would the developer declare this in HS? How would it then
be globally visible without using global OIDs, causing sinval and
using global locks?

The declarative creation of a “standard-like” temporary table only
happens once (it is part of the schema). Using (e.g. putting stuff in
and executing queries on) such tables can happen on the standby
without the master having to know.

Therefore, I don't see the problem. Just schedule issuing the creation
along with any other schema-changes on the master.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Nicolas Barbier (#4)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 9:37 AM, Nicolas Barbier
<nicolas.barbier@gmail.com> wrote:

The declarative creation of a “standard-like” temporary table only
happens once (it is part of the schema). Using (e.g. putting stuff in
and executing queries on) such tables can happen on the standby
without the master having to know.

So you are saying it is OK to not be able to *create* them on HS, just
*use* pre-defined tables?

That's almost useless IMHO.

Applications expect to be able to do this all in the same transaction
on one session
CREATE TEMP TABLE x;
...DML commands...
SELECT ... FROM x;

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#6Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Simon Riggs (#5)
Re: Temporary tables under hot standby

2012/4/25 Simon Riggs <simon@2ndquadrant.com>:

So you are saying it is OK to not be able to *create* them on HS, just
*use* pre-defined tables?

That's almost useless IMHO.

Applications expect to be able to do this all in the same transaction
on one session
CREATE TEMP TABLE x;
 ...DML commands...
SELECT ... FROM x;

That’s not how standard-like temporary tables work, they are supposed
to be declared beforehand. That makes sense if you consider the schema
and the set of database-using applications as one. I assume that
wanting to define applications independently from the database schema
is the reason of existence for the PG-like temporary transactions.

The way standard-like temporary tables work is exactly why I assume
Noah proposes to implement them: because they work nicely with HS.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

#7Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Nicolas Barbier (#6)
Re: Temporary tables under hot standby

2012/4/25 Nicolas Barbier <nicolas.barbier@gmail.com>:

is the reason of existence for the PG-like temporary transactions.

s/transactions/tables/

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Nicolas Barbier (#6)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 11:08 AM, Nicolas Barbier
<nicolas.barbier@gmail.com> wrote:

2012/4/25 Simon Riggs <simon@2ndquadrant.com>:

So you are saying it is OK to not be able to *create* them on HS, just
*use* pre-defined tables?

That's almost useless IMHO.

Applications expect to be able to do this all in the same transaction
on one session
CREATE TEMP TABLE x;
 ...DML commands...
SELECT ... FROM x;

That’s not how standard-like temporary tables work, they are supposed
to be declared beforehand. That makes sense if you consider the schema
and the set of database-using applications as one. I assume that
wanting to define applications independently from the database schema
is the reason of existence for the PG-like temporary transactions.

The way standard-like temporary tables work is exactly why I assume
Noah proposes to implement them: because they work nicely with HS.

Well, following a standard that no other major DBMS has followed is
not great, especially if it leads to a non-useful feature.

Many software products generate CREATE TEMP TABLE statements
dynamically. This design would prevent ALL of them from working, as
well as preventing all current programs from using temp tables in the
currently accepted way, so the whole concept is very regrettably
flawed.

I very much support Noah's work to "make temp tables work on hot
standby", but we must solve the main problem, not just implement "make
a completely new kind of temp table work on hot standby". I have no
objection to "make a new kind of temp table", but that does not solve
the "make temp tables work on hot standby" problem.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#9Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#1)
Re: Temporary tables under hot standby

On Tue, Apr 24, 2012 at 11:55 PM, Noah Misch <noah@leadboat.com> wrote:

A key barrier to migrations from trigger-based replication to WAL-based
replication is the lack of temporary tables under hot standby.  I'd like to
close that gap; the changes needed will also reduce the master-side cost of
temporary table usage.  Here is a high-level design for your advice and
comments.  Much of this builds on ideas from past postings noted below.

Our temporary tables are cataloged and filled like permanent tables.  This has
the major advantage of making most code operate on tables with minimal regard
for their relpersistence.  It also brings disadvantages:

1. High catalog turnover in rapid create/drop workloads.  Heavy temporary
  table users often need to REINDEX relation-oriented catalogs.  Hot standby
  cannot assign OIDs or modify system catalogs at all.
2. Consumption of permanent XIDs for DML on the table.  This increases COMMIT
  cost on the master and is a non-starter under hot standby.
3. autovacuum cannot VACUUM temporary tables, but their relfrozenxid values
  delay pg_clog truncation and can trigger a wraparound-prevention shutdown.
4. sinval traffic from every CREATE TEMP TABLE et al.
5. CREATE TEMP TABLE records AccessExclusiveLock in the global lock table.
6. We don't automatically drop temporary tables that existed at the point of a
  crash, because they look much like permanent tables.

#6 is already fixed in 9.1. Temporary tables now have names like
tBACKENDID_RELFILENODE, and are cleaned up during start-up in exactly
the same way you're proposing to do it further down.

To resolve points 2 and 3, let's change the XID values stored in temporary
tables from regular TransactionId to LocalTransactionId.  This entails an lxid
counterpart for clog.c.  Functions like GetCurrentTransactionId() and
HeapTupleSatisfiesVisibility() will take a Relation argument to identify the
XID type.  One open question is whether to add conditional logic to functions
like HeapTupleSatisfiesMVCC() or to have parallel implementations like
HeapTupleSatisfiesMVCCLocal().  I lean toward the latter, perhaps with the
help of some code generation.  I don't think a counterpart for pg_subtrans
will be necessary; the backend knows its own XID tree, and the
TRANSACTION_STATUS_SUB_COMMITTED interlock is superfluous with only one
backend as reader and writer.  I'm also thinking the local clog can live
strictly in memory; a session that retains a temporary table across 2B local
transactions can afford 512 MiB of RAM.  With this change, VACUUM can ignore
relfrozenxid of temporary tables when calculating a new datfrozenxid.  This
change can form an independent patch.

Agreed. If you can pull it off, this will be a nice improvement
regardless of what happens with the rest of this, and it makes sense
to do it as a separate patch. I don't yet have a strong opinion on
what to do with the HeapTupleSatisfies* functions, but I suspect
you're correct in thinking that separate functions are better. For
one thing, those functions are very much performance-critical, so
introducing extra branches is something to avoid.

Maybe this is a silly idea, but if you're thinking about creating a
local XID space and a global XID space, it might be a good idea to
also make allowance for an "unlogged" XID space - that is, an XID
space that is global to all backends but need not survive crashes.
This would potentially allow unlogged tables to be used in HS mode. I
would expect that you probably don't want to go as far as actually
trying to make this work as part of your current project, but maybe as
you're structuring the code it would be worth considering the
possibility that we'll eventually want >2 XID spaces, either for this
or other reasons.

I do not see a clean behind-the-scenes fix for points 1, 4 and 5.  We can
resolve those by adding a new variety of temporary table, one coincidentally
matching the SQL standard's notion of a temporary table.  The developer will
declare it once, after which all sessions observe it as an initially-empty
table whose contents remain local to the session.  Most relation catalog
entries, including all OIDs, are readily sharable among sessions.  The
exceptions are relpages, reltuples, relallvisible, relfrozenxid, and
pg_statistic rows.  I will handle the pg_class columns by introducing new
backend APIs abstracting them.  Those APIs will consult the relcache for
permanent tables and a local-memory hash for temporary tables.  For
statistics, add a new catalog pg_temp_statistic, an inheritance child of
pg_statistic and itself one of these new-variety temporary tables.

With respect to problem #5, I've been wondering if we couldn't just
forget about taking AccessExclusiveLock when first creating a table
(temporary or permanent). Unless and until the transaction commits,
nobody can see the catalog entry anyway, so nobody else is going to
attempt to take a conflicting lock. So why bother? Maybe there's
some reason here that's not obvious to me. If not, it might be worth
doing on general principle independent of this project.

Problem #4 is a little stickier, but I think also solvable. Basically
all cross-backend access to temp tables ought to be prohibited anyway,
but it currently isn't, because there's at least one cross-backend
operation that we categorically need to support: DROP. Separating the
global and local XID spaces would help with that, but not really all
that much, since if a session manages to exit without cleaning up
after itself, we'd like someone to be able to do that later (and maybe
more aggressively than what we do now, which is to only clean things
up after 2B txns or so). Nevertheless it's currently possible for
backend A to build a relcache entry for backend B's temporary
relation, and it might be that if we prohibited that and maybe a few
other things we could see our way clear to removing most or all of the
sinval traffic here. Still, it may not really help that much without
a solution to problem #1.

On that note, I had a thought in the past that it might be possible
to solve problem #1 by using global temp tables as system catalogs -
that is, for each type of system catalog that relates to table
creation, you'd have a permanent catalog and a global temp catalog.
So if someone wants to create a temporary table of the existing
variety on the standby, you can make all the entries in the
global-temp version of pg_class, pg_attribute, etc. However, this
seems extremely difficult to manage in general - there's a lot of code
churn involved, and also possible temporary -> permanent dependencies;
for example, the temporary table might have a pg_attrdef entry that
needs to depend on a non-temporary pg_proc entry. That's tricky to
solve on the master and even trickier to solve in HS operation. So
I'm inclined to agree with you that it makes more sense to just aim to
support global temp tables in HS mode, and if we want to beat our head
against the brick wall of making regular temp tables work there
eventually, that can be a later project.

I would suggest just not worrying about the statistical stuff for the
first version of the patch. Assume that it's adequate to have one set
of statistics for all copies of the table, both pg_statistic entries
and the stats-related stuff in pg_class (relpages, reltuples,
relallvisible). It's not unreasonable to think that the table will be
used in broadly similar ways across all backends, so in some cases
this might actually give better performance than what you're proposing
to do. If not, you can solve that problem in a future patch. This
project is hard enough without throwing that problem onto the pile,
and I really think that if you do throw it on the pile you're going to
be adding a lot of complexity and code churn that isn't really
necessary for a v1.

Past discussions have raised the issue of interaction between commands like
ALTER TABLE and sessions using the new-variety temporary table.  As a first
cut, let's keep this simple and have ongoing use of the table block operations
requiring AccessExclusiveLock.  Note that you can always just make a new
temporary table with a different name to deploy a change quickly.  Implement
this with a heavyweight lock having a novel life cycle.  When a session first
takes an ordinary relation lock on the table, acquire the longer-term lock and
schedule it for release on transaction abort.  On TRUNCATE, schedule a release
on transaction commit.  Of course, also release the lock at session end.

I'm not sure I believe this will work, but maybe I'm just not understanding it.

For new-variety temporary tables, change file names from "relfilenode[_fork]"
to "refilenode[_fork].pid.localnode".  During crash recovery, delete all files
conforming to that pattern for refilenodes of known temporary tables.  This
also lets backends share pg_class.relfilenode.  The "localnode" field would
distinguish multiple generations of a table across VACUUM FULL, CLUSTER, and
TRUNCATE.  We could build on this strategy to safely resolve point 6 for the
existing kind of temporary table, but that entails enough other details to
probably not mix it into the same patch.

In lieu of including localnode in the filename, I would suggest that
for a global temp table, we set relfilenode = 0, and let each backend
assign one (and reassign new ones) whenever it feels like it. The
mapping from reloid -> relfilenode can be kept in backend-local
memory, or you can keep a <backend id, rel oid> -> relfilenode mapping
in a separate relation fork. The latter would have the additional
advantage of simplifying cleanup and might also be helpful in
detecting when the table is or is not in use by multiple backends.
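A toy sketch of that backend-local mapping (invented names, fixed-size table purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the alternative: pg_class.relfilenode = 0 for global temp
 * tables, with each backend privately mapping reloid -> relfilenode
 * and assigning new generations whenever it likes.  Invented names. */
typedef uint32_t Oid;

typedef struct TempFilenodeEntry {
    Oid      relid;             /* 0 = empty slot */
    unsigned relfilenode;
} TempFilenodeEntry;

#define TEMP_MAP_SLOTS 64
static TempFilenodeEntry temp_map[TEMP_MAP_SLOTS];
static unsigned next_local_filenode = 1;

static TempFilenodeEntry *
temp_map_slot(Oid relid)
{
    uint32_t i = relid % TEMP_MAP_SLOTS;

    while (temp_map[i].relid != 0 && temp_map[i].relid != relid)
        i = (i + 1) % TEMP_MAP_SLOTS;   /* linear probing */
    return &temp_map[i];
}

/* First use of the table in this backend assigns a filenode. */
static unsigned
temp_get_filenode(Oid relid)
{
    TempFilenodeEntry *e = temp_map_slot(relid);

    if (e->relid == 0)
    {
        e->relid = relid;
        e->relfilenode = next_local_filenode++;
    }
    return e->relfilenode;
}

/* TRUNCATE / CLUSTER / VACUUM FULL simply start a new generation. */
static unsigned
temp_new_filenode(Oid relid)
{
    TempFilenodeEntry *e = temp_map_slot(relid);

    e->relid = relid;
    e->relfilenode = next_local_filenode++;
    return e->relfilenode;
}
```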

How important is support for VACUUM on these tables under hot standby?  The
alternative is to fail when a session retains a temporary table across 2B
local transactions.  I do not currently see any challenges sufficient to
motivate not supporting VACUUM, but it might be a useful simplification to
keep in mind.  What about ANALYZE support; how important is the ability to
collect statistics on temporary tables?  Again, I tentatively expect to
support it regardless of the answer.

I think it's probably pretty important to support VACUUM, because even
ignoring wraparound considerations, not vacuuming tends to cause
performance to suck. I think ANALYZE is less important for the
reasons stated above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#9)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 4:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:

How important is support for VACUUM on these tables under hot standby?  The
alternative is to fail when a session retains a temporary table across 2B
local transactions.  I do not currently see any challenges sufficient to
motivate not supporting VACUUM, but it might be a useful simplification to
keep in mind.  What about ANALYZE support; how important is the ability to
collect statistics on temporary tables?  Again, I tentatively expect to
support it regardless of the answer.

I think it's probably pretty important to support VACUUM, because even
ignoring wraparound considerations, not vacuuming tends to cause
performance to suck.  I think ANALYZE is less important for the
reasons stated above.

ANALYZE is essential for temp tables in many cases... not sure what
the "reasons stated above" were, I can't resolve that reference.

I've never seen VACUUM used on a temp table. Perhaps we need it for
edge cases, but either way ISTM to be low priority. If people find
temp tables restrictive they can just use unlogged tables instead.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#11Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#10)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 12:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, Apr 25, 2012 at 4:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:

How important is support for VACUUM on these tables under hot standby?  The
alternative is to fail when a session retains a temporary table across 2B
local transactions.  I do not currently see any challenges sufficient to
motivate not supporting VACUUM, but it might be a useful simplification to
keep in mind.  What about ANALYZE support; how important is the ability to
collect statistics on temporary tables?  Again, I tentatively expect to
support it regardless of the answer.

I think it's probably pretty important to support VACUUM, because even
ignoring wraparound considerations, not vacuuming tends to cause
performance to suck.  I think ANALYZE is less important for the
reasons stated above.

ANALYZE is essential for temp tables in many cases... not sure what
the "reasons stated above" were, I can't resolve that reference.

My theory is that users of a global temp table will have
similar-enough usage patterns that a set of statistics that is good
enough for one user will be good enough for all of them. That might
not be true in all cases, but I think it will simplify things quite a
bit to assume it true for purposes of an initial implementation. And
as I noted, in some cases it might be a clear improvement: right now,
after creating a temp table, you've got to analyze it or you'll just
get the default statistics, which figure to be terrible. Inheriting
the statistics left over from the last guy's analyze figures to be
significantly superior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#11)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 12:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Apr 25, 2012 at 12:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, Apr 25, 2012 at 4:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:

How important is support for VACUUM on these tables under hot standby?  The
alternative is to fail when a session retains a temporary table across 2B
local transactions.  I do not currently see any challenges sufficient to
motivate not supporting VACUUM, but it might be a useful simplification to
keep in mind.  What about ANALYZE support; how important is the ability to
collect statistics on temporary tables?  Again, I tentatively expect to
support it regardless of the answer.

I think it's probably pretty important to support VACUUM, because even
ignoring wraparound considerations, not vacuuming tends to cause
performance to suck.  I think ANALYZE is less important for the
reasons stated above.

ANALYZE is essential for temp tables in many cases... not sure what
the "reasons stated above" were, I can't resolve that reference.

My theory is that users of a global temp table will have
similar-enough usage patterns that a set of statistics that is good
enough for one user will be good enough for all of them.  That might
not be true in all cases, but I think it will simplify things quite a
bit to assume it true for purposes of an initial implementation.  And
as I noted, in some cases it might be a clear improvement: right now,
after creating a temp table, you've got to analyze it or you'll just
get the default statistics, which figure to be terrible.  Inheriting
the statistics left over from the last guy's analyze figures to be
significantly superior.

Oh, we're talking about different things, and I'm slightly confused.

Yes, we need to support ANALYZE; what we might not need to support, at
least initially, is every user of a global temp table having their own
SEPARATE copy of the table statistics.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#12)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 5:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Oh, we're talking about different things, and I'm slightly confused.

Yes, we need to support ANALYZE; what we might not need to support, at
least initially, is every user of a global temp table having their own
SEPARATE copy of the table statistics.

Yes, we are. Global Temp Tables won't solve the "Works on HS" problem,
so we'd better decide fairly quickly which use case we are addressing,
and why. ISTM Global Temp Tables is more an Oracle compatibility issue
than a problem PostgreSQL users have.

...I have zero basis for deciding whether what you say about Global
Temp Tables is useful or not.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#14Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#13)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 12:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, Apr 25, 2012 at 5:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Oh, we're talking about different things, and I'm slightly confused.

Yes, we need to support ANALYZE; what we might not need to support, at
least initially, is every user of a global temp table having their own
SEPARATE copy of the table statistics.

Yes, we are. Global Temp Tables won't solve the "Works on HS" problem,
so we'd better decide fairly quickly which use case we are addressing,
and why. ISTM Global Temp Tables is more an Oracle compatibility issue
than a problem PostgreSQL users have.

...I have zero basis for deciding whether what you say about Global
Temp Tables is useful or not.

Well, Noah presented a pretty good outline of how to make global temp
tables work under Hot Standby. As Noah already said, making regular
temporary tables work under Hot Standby is far more difficult. I
think he's right. I'd rather see us get global temp tables working
under HS than insist we have to have regular temp tables working under
HS and ultimately end up with nothing. Even getting global temp
tables working under HS is probably going to require an entire
development cycle, maybe two. So raising the bar still higher seems
rather self-defeating to me. Half a loaf is better than none.

In the interest of full disclosure, I freely admit that global
temporary tables would also be a neat Oracle compatibility feature,
and I do work for a company that sells Oracle compatibility products
based on PostgreSQL, so there are surely some reasons for me to like
that, but AFAICT they aren't all *that* heavily used by most Oracle
users either, which is why I haven't been able to justify doing this
project before now. The important point here as I see it is that
tables of any flavor require catalog entries, and creating and
destroying catalog entries on a standby server does not seem
tractable, so if we want to have writable tables of any flavor on Hot
Standby sometime in the next year or two, we should pick a design that
doesn't require that. What Noah has proposed seems to me to be by far
the simplest way of making that happen, so I think his design is
spot-on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#14)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 5:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Yes, we are. Global Temp Tables won't solve the "Works on HS" problem,
so we'd better decide fairly quickly which use case we are addressing,
and why. ISTM Global Temp Tables is more an Oracle compatibility issue
than a problem PostgreSQL users have.

Well, Noah presented a pretty good outline of how to make global temp
tables work under Hot Standby.  As Noah already said, making regular
temporary tables work under Hot Standby is far more difficult.  I
think he's right.  I'd rather see us get global temp tables working
under HS than insist we have to have regular temp tables working under
HS and ultimately end up with nothing.  Even getting global temp
tables working under HS is probably going to require an entire
development cycle, maybe two.  So raising the bar still higher seems
rather self-defeating to me.  Half a loaf is better than none.

...

What Noah has proposed seems to me to be by far
the simplest way of making that happen, so I think his design is
spot-on.

Noah's design is spot-on for Global Temp tables, I agree. I have no
objection at all to an implementation of GTTs.

However, it is a fallacy that this is a good solution for using temp
tables on HS. I think the wish to enhance Oracle compatibility is
making some wishful thinking happen with regard to how useful this is
going to be. We need to spend just as much time considering the
utility of our work as we do spending time on the quality of the
implementation; otherwise it's just well-implemented shelfware.

I don't think implementing temp tables on HS is more complex than this
proposal; it's just orthogonal. There are some common aspects, such as
making local xids work, but that is only needed for a small fraction
of the normal temp table case. So it appears to me that GTTs as
proposed are actually harder to implement and not a stepping stone in
a linear sense. So you could equally argue that the requirement to
bring GTTs into the picture also risks us getting nothing and that the
"half a loaf" idea means GTTs should be excluded. Certainly, trying to
do two orthogonal tasks at once puts both at risk.

So I don't accept the proposition that "GTTs are a useful
implementation route for temp tables on HS", unless we're talking
about a "scenic route".

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#16Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#15)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 1:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

However, it is a fallacy that this is a good solution for using temp
tables on HS. I think the wish to enhance Oracle compatibility is
making some wishful thinking happen with regard to how useful this is
going to be. We need to spend just as much time considering the
utility of our work as we do spending time on the quality of the
implementation; otherwise it's just well-implemented shelfware.

Well, like I say, if you want to use locally-defined temp tables on
HS, you have to somehow solve the problem of catalog entries, and
nothing in your email looks like a proposal for how to do that. I've
come up with one design, which I sketched in my original response, but
it relies on creating some new system catalogs that are themselves
GTTs, and it's also hideously complicated. If you or anyone else can
come up with a better design, great, but so far no one has.

It's no skin off my neck if this project gets done in way that
bypasses the need for GTTs; I just don't have a credible proposal for
how to do that, and Noah stated that he doesn't either.

I do agree that what Noah's proposing to implement is shooting at a
pretty narrow target, but I don't think it's so narrow that we
wouldn't commit it if he's willing to do the work to implement it.
All of the infrastructure that he's proposing to create seems to me to
have plausible other uses, so even if the immediate feature doesn't
bring a lot of benefit there's every reason to suppose that it will
pave the way for further improvements down the line.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Merlin Moncure
mmoncure@gmail.com
In reply to: Robert Haas (#14)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 11:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Apr 25, 2012 at 12:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, Apr 25, 2012 at 5:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Oh, we're talking about different things, and I'm slightly confused.

Yes, we need to support ANALYZE; what we might not need to support, at
least initially, is every user of a global temp table having their own
SEPARATE copy of the table statistics.

Yes, we are. Global Temp Tables won't solve the "Works on HS" problem,
so we'd better decide fairly quickly which use case we are addressing,
and why. ISTM Global Temp Tables is more an Oracle compatibility issue
than a problem PostgreSQL users have.

...I have zero basis for deciding whether what you say about Global
Temp Tables is useful or not.

Well, Noah presented a pretty good outline of how to make global temp
tables work under Hot Standby.  As Noah already said, making regular
temporary tables work under Hot Standby is far more difficult.  I
think he's right.  I'd rather see us get global temp tables working
under HS than insist we have to have regular temp tables working under
HS and ultimately end up with nothing.  Even getting global temp
tables working under HS is probably going to require an entire
development cycle, maybe two.  So raising the bar still higher seems
rather self-defeating to me.  Half a loaf is better than none.

In the interest of full disclosure, I freely admit that global
temporary tables would also be a neat Oracle compatibility feature,
and I do work for a company that sells Oracle compatibility products
based on PostgreSQL, so there are surely some reasons for me to like
that, but AFAICT they aren't all *that* heavily used by most Oracle
users either, which is why I haven't been able to justify doing this
project before now.

I don't know how GTT play inside the Oracle stack such that they
aren't super popular, but if they work in the standby they will
quickly become a killer feature. IMNSHO it's annoying but acceptable
to be forced to define them into the permanent schema. Lack of temp
tables on the standby is a popular question/complaint on irc and in
most cases the proposal would satisfactorily address the problem.

merlin

#18Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Merlin Moncure (#17)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 5:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

I don't know how GTT play inside the Oracle stack such that they
aren't super popular, but if they work in the standby they will
quickly become a killer feature.  IMNSHO it's annoying but acceptable
to be forced to define them into the permanent schema.  Lack of temp
tables on the standby is a popular question/complaint on irc and in
most cases the proposal would satisfactorily address the problem.

The problem with using GTT for this is, IMHO, that you need to know
what your temp table will look like beforehand.

I have seen applications that use the same name (e.g. temp1, t1, tt, or
t_temp) for all or almost all temp tables and, of course, all those
have different structures.

I have also seen temp tables created dynamically from a query
(which has more or fewer columns depending on some criteria).

In any case, this means that to use GTT on HS, these applications
need to be fixed to ensure all temp tables have different names
throughout the app, and you also need to ensure that all queries that
create temp tables have a fixed set of columns.

Finally, you will need to modify apps to remove all CREATE TEMP TABLE
statements, because the tables already exist. And I have not mentioned
the problem I will have if I need different behaviour for ON COMMIT
(oh! I just did).

So yes, you can work around things to make this usable as a fix for
the problem of "temp tables in HS", but it is not transparent (unless
you come from Oracle; most DBs use local temp tables just as Postgres
does) and certainly not an ideal solution... FWIW, no one that I know
will want to make those "fixes" in their app.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 Support and Training

#19Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#5)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 10:10:31AM +0100, Simon Riggs wrote:

So you are saying it is OK to not be able to *create* them on HS, just
*use* pre-defined tables?

I estimated that much to cover a worthy portion of the need, yes.

That's almost useless IMHO.

Based on the range of assessments spanning your "almost useless" to Merlin's
"killer feature", I gather that its utility is exceptionally site-specific.

#20Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#9)
Re: Temporary tables under hot standby

On Wed, Apr 25, 2012 at 11:49:23AM -0400, Robert Haas wrote:

On Tue, Apr 24, 2012 at 11:55 PM, Noah Misch <noah@leadboat.com> wrote:

Our temporary tables are cataloged and filled like permanent tables.  This has
the major advantage of making most code operate on tables with minimal regard
for their relpersistence.  It also brings disadvantages:

1. High catalog turnover in rapid create/drop workloads.  Heavy temporary
   table users often need to REINDEX relation-oriented catalogs.  Hot standby
   cannot assign OIDs or modify system catalogs at all.
2. Consumption of permanent XIDs for DML on the table.  This increases COMMIT
   cost on the master and is a non-starter under hot standby.
3. autovacuum cannot VACUUM temporary tables, but their relfrozenxid values
   delay pg_clog truncation and can trigger a wraparound-prevention shutdown.
4. sinval traffic from every CREATE TEMP TABLE et al.
5. CREATE TEMP TABLE records AccessExclusiveLock in the global lock table.
6. We don't automatically drop temporary tables that existed at the point of a
   crash, because they look much like permanent tables.

#6 is already fixed in 9.1. Temporary tables now have names like
tBACKENDID_RELFILENODE, and are cleaned up during start-up in exactly
the same way you propose to do it further down.

Ah, so it is. That simplifies things a bit.

Maybe this is a silly idea, but if you're thinking about creating a
local XID space and a global XID space, it might be a good idea to
also make allowance for an "unlogged" XID space - that is, an XID
space that is global to all backends but need not survive crashes.
This would potentially allow unlogged tables to be used in HS mode. I
would expect that you probably don't want to go as far as actually
trying to make this work as part of your current project, but maybe as
you're structuring the code it would be worth considering the
possibility that we'll eventually want >2 XID spaces, either for this
or other reasons.

Agreed. Plenty of the details would change (located in shared memory,
locking, persisted on clean shutdown, etc.), so I'm not sure how much actual
code could remain in common. If I encounter design decisions where one choice
seems to help cover this other use in the future, I'll keep it in mind.
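
For concreteness, an allowance for more than two XID spaces might look
something like this toy model (Python here just for sketching; every name is
invented for illustration, not PostgreSQL's actual API): one counter and one
clog-like status map per space, with the non-permanent spaces discarded
during crash recovery.

```python
from enum import Enum

class XidSpace(Enum):
    PERMANENT = "permanent"   # WAL-logged, survives crashes
    LOCAL = "local"           # per-backend, for temporary tables
    UNLOGGED = "unlogged"     # shared across backends, lost on crash

class XidAllocator:
    """Toy model: an independent counter and 'clog' per XID space."""
    def __init__(self):
        self.counters = {space: 0 for space in XidSpace}
        self.clog = {space: {} for space in XidSpace}

    def assign(self, space):
        self.counters[space] += 1
        xid = self.counters[space]
        self.clog[space][xid] = "in-progress"
        return xid

    def commit(self, space, xid):
        self.clog[space][xid] = "committed"

    def crash_recover(self):
        # Local and unlogged spaces need not survive a crash.
        for space in (XidSpace.LOCAL, XidSpace.UNLOGGED):
            self.counters[space] = 0
            self.clog[space].clear()

alloc = XidAllocator()
x1 = alloc.assign(XidSpace.PERMANENT)  # 1
x2 = alloc.assign(XidSpace.LOCAL)      # also 1: counters are independent
alloc.commit(XidSpace.PERMANENT, x1)
alloc.crash_recover()                  # local state gone, permanent intact
```

The point of the model is only that each space carries its own counter and
commit-status storage, so adding a third space later is a matter of adding an
enum member and its storage rules rather than restructuring callers.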

With respect to problem #5, I've been wondering if we couldn't just
forget about taking AccessExclusiveLock when first creating a table
(temporary or permanent). Unless and until the transaction commits,
nobody can see the catalog entry anyway, so nobody else is going to
attempt to take a conflicting lock. So why bother? Maybe there's
some reason here that's not obvious to me. If not, it might be worth
doing on general principle independent of this project.

Sounds safe, offhand. I do suspect the cost of the lock is peanuts compared
to the cost of inserting catalog entries, though, so I wouldn't anticipate a
measurable improvement from that change in isolation.

On that note, I had a thought in the past that it might be possible to
solve problem #1 by using global temp tables as system catalogs -
that is, for each type of system catalog that relates to table
creation, you'd have a permanent catalog and a global temp catalog.
So if someone wants to create a temporary table of the existing
variety on the standby, you can make all the entries in the
global-temp version of pg_class, pg_attribute, etc. However, this
seems extremely difficult to manage in general - there's a lot of code
churn involved, and also possible temporary -> permanent dependencies;
for example, the temporary table might have a pg_attrdef entry that
needs to depend on a non-temporary pg_proc entry. That's tricky to
solve on the master and even trickier to solve in HS operation. So
I'm inclined to agree with you that it makes more sense to just aim to
support global temp tables in HS mode, and if we want to beat our head
against the brick wall of making regular temp tables work there
eventually, that can be a later project.

Agreed. I hadn't thought of that dependencies problem. Interesting.

Past discussions have raised the issue of interaction between commands like
ALTER TABLE and sessions using the new-variety temporary table.  As a first
cut, let's keep this simple and have ongoing use of the table block operations
requiring AccessExclusiveLock.  Note that you can always just make a new
temporary table with a different name to deploy a change quickly.  Implement
this with a heavyweight lock having a novel life cycle.  When a session first
takes an ordinary relation lock on the table, acquire the longer-term lock and
schedule it for release on transaction abort.  On TRUNCATE, schedule a release
on transaction commit.  Of course, also release the lock at session end.

I'm not sure I believe this will work, but maybe I'm just not understanding it.

Did you have a specific doubt? I did gloss over all the details, having not
worked them out yet.
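
For what it's worth, the life cycle quoted above (acquire on the session's
first ordinary relation lock, release on abort of the acquiring transaction,
on commit after a TRUNCATE, or at session end) can be pinned down as a small
state machine. A speculative Python sketch, with invented names, just to make
the transitions explicit:

```python
class GttSessionLock:
    """Toy model of the proposed longer-term lock on a global temp table."""
    def __init__(self):
        self.held = False
        self.release_on_abort = False   # pending release if xact aborts
        self.release_on_commit = False  # pending release if xact commits

    def on_relation_lock(self):
        # First ordinary relation lock in the session: take the
        # longer-term lock and schedule release on transaction abort.
        if not self.held:
            self.held = True
            self.release_on_abort = True

    def on_truncate(self):
        # TRUNCATE empties this backend's copy, so schedule a release
        # for when the truncating transaction commits.
        self.release_on_commit = True

    def on_commit(self):
        if self.release_on_commit:
            self.held = False
        self.release_on_abort = self.release_on_commit = False

    def on_abort(self):
        if self.release_on_abort:
            self.held = False
        self.release_on_abort = self.release_on_commit = False

    def on_session_end(self):
        self.held = False
```

Under this model the lock persists across commits of ordinary transactions,
which is what would block a concurrent ALTER TABLE for as long as the session
keeps using the table.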

For new-variety temporary tables, change file names from "relfilenode[_fork]"
to "relfilenode[_fork].pid.localnode".  During crash recovery, delete all
files conforming to that pattern for relfilenodes of known temporary tables.
This also lets backends share pg_class.relfilenode.  The "localnode" field
would distinguish multiple generations of a table across VACUUM FULL,
CLUSTER, and TRUNCATE.  We could build on this strategy to safely resolve
point 6 for the existing kind of temporary table, but that entails enough
other details to probably not mix it into the same patch.

In lieu of including localnode in the filename, I would suggest that
for a global temp table, we set relfilenode = 0, and let each backend
assign one (and reassign new ones) whenever it feels like it.

Good call.
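
That is, relfilenode = 0 in the shared catalog entry amounts to a lazy,
backend-local mapping from table OID to a privately assigned relfilenode. A
hypothetical Python sketch (structure and names invented for illustration;
the real storage-manager plumbing is of course far more involved):

```python
import itertools

class BackendLocalFilenodes:
    """Toy model: per-backend relfilenode assignment for global temp
    tables whose shared catalog entry stores relfilenode = 0."""
    def __init__(self, backend_id, start=16384):
        self.backend_id = backend_id
        self._next = itertools.count(start)
        self._map = {}  # table OID -> this backend's relfilenode

    def filenode(self, table_oid):
        # Assign lazily on first access to the table in this backend.
        if table_oid not in self._map:
            self._map[table_oid] = next(self._next)
        return self._map[table_oid]

    def new_generation(self, table_oid):
        # TRUNCATE / CLUSTER / VACUUM FULL: just pick a fresh filenode;
        # the old generation's files become garbage to clean up.
        self._map[table_oid] = next(self._next)
        return self._map[table_oid]

    def filename(self, table_oid):
        # Follows the 9.1-style backend-prefixed temp-file pattern, so
        # startup can delete leftovers after a crash.
        return "t%d_%d" % (self.backend_id, self.filenode(table_oid))

be = BackendLocalFilenodes(backend_id=7)
a = be.filenode(50010)        # first use assigns 16384
b = be.filenode(50010)        # stable on repeat access
c = be.new_generation(50010)  # 16385: table rewritten in this backend
```

This keeps the shared catalog row immutable under hot standby while still
giving each backend private, regenerable storage.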

Thanks,
nm
