Global temporary tables

Started by Konstantin Knizhnik over 6 years ago · 57 messages · pgsql-hackers
#1 Konstantin Knizhnik
k.knizhnik@postgrespro.ru

The current Postgres implementation of temporary tables causes a number of
problems:

1. Catalog bloat: if a client creates and deletes too many temporary
tables, autovacuum gets stuck on the catalog.
2. Parallel queries: right now, using a temporary table in a query
disables parallel plans.
3. It is not possible to use temporary tables on a replica. A hot standby
configuration is frequently used to run OLAP queries on a replica,
and the results of such queries are often saved in temporary tables.
Right now this is not possible (except for hackish solutions such as storing
results in file_fdw).
4. Temporary tables can not be used in prepared transactions.
5. Inefficient memory usage and possible memory overflow: each backend
maintains its own local buffers for working with temporary tables.
The default size of temporary buffers is 8MB. This seems too small for
modern servers with hundreds of gigabytes of RAM, causing extra copying
of data between the OS cache and local buffers. But if there are
thousands of backends, each executing queries with temporary tables,
then the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.
6. A connection pooler can not reschedule a session which has created
temporary tables to some other backend,
because its data is stored in local buffers.

There have been several attempts to address these problems.
For example, Alexandr Alekseev implemented a patch which allows
creating fast temporary tables without accessing the system catalog:
/messages/by-id/20160301182500.2c81c3dc@fujitsu
Unfortunately this patch was too invasive and was rejected by the community.

There was also an attempt to allow, under some conditions, the use of
temporary tables in 2PC transactions:
/messages/by-id/m2d0pllvqy.fsf@dimitris-macbook-pro.home
/messages/by-id/3a4b3c88-4fa5-1edb-a878-1ed76fa1c82b@postgrespro.ru
These were also rejected.

I am making yet another attempt to address these problems, first of all
1), 2), 5) and 6).
To solve these problems I propose the notion of "global temporary" tables,
similar to the ones in Oracle.
The definition of such a table (metadata) is shared by all backends, but its
data is private to the backend. After session termination the data is obviously lost.

Suggested syntax for creation of global temporary tables:

    create global temp table
or
    create session table
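
For illustration, here is a hedged sketch of how such a table might behave
under the proposed syntax (the table and column names are made up, and none
of this syntax exists in vanilla Postgres yet):

```sql
-- Proposed syntax (not in vanilla Postgres): the table definition is
-- shared by all backends, but the rows are private to each session.
create global temp table report_cache (
    id      integer,
    payload text
);

-- Session A:
insert into report_cache values (1, 'visible only in session A');

-- Session B sees the same definition but none of session A's rows,
-- and its own rows disappear when the session terminates.
select count(*) from report_cache;
```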

Once created, it can be used by all backends.
Global temporary tables are accessed through shared buffers (to solve
problem 2).
Cleanup of temporary table data (releasing shared buffers and deleting
relation files) is performed on backend termination.
In case of abnormal server termination, the files of global temporary tables
are cleaned up in the same way as those of local temporary tables.

Certainly there are cases where global temporary tables can not be used,
e.g. when the application dynamically constructs the name and columns of the
temporary table.
Also, access to local buffers is more efficient than access to shared
buffers because it doesn't require any synchronization.
But please note that it is always possible to create old-style (local)
temporary tables, which preserve the current behavior.

The problem with replicas is still not solved, but shared metadata is a
step in this direction.
I am thinking about reimplementing temporary tables using the new table
access method API.
The drawback of such an approach is that it would be necessary to
reimplement a large part of the heapam code.
But this approach would make it possible to eliminate visibility checks for
temporary table tuples and decrease the size of the tuple header.
I am still not sure whether implementing a special table access method for
temporary tables is a good idea.

A patch for global temporary tables is attached to this mail.
A known limitation is that it currently supports only B-Tree indexes.
Any feedback is welcome.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

session_tables.patch (text/x-patch, +338/-91)
#2 Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#1)
Re: Global temporary tables

On Wed, 31 Jul 2019 at 23:05, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

The current Postgres implementation of temporary tables causes a number of
problems:

1. Catalog bloat: if a client creates and deletes too many temporary
tables, autovacuum gets stuck on the catalog.

This also upsets logical decoding a little - AFAICS it still has to treat
transactions that use temporary tables as catalog-modifying transactions,
tracking them in its historic catalog snapshots and doing extra cache
flushes etc when decoding them.

This will become even more important as we work to support eager/optimistic
output plugin processing of in-progress transactions. We'd have to switch
snapshots more, and that can get quite expensive so using temp tables could
really hurt performance. Or we'd have to serialize on catalog-changing
transactions, in which case using temp tables would negate the benefits of
optimistic streaming of in-progress transactions.

3. It is not possible to use temporary tables on a replica.

For physical replicas, yes.

A hot standby
configuration is frequently used to run OLAP queries on a replica,
and the results of such queries are often saved in temporary tables.
Right now this is not possible (except for hackish solutions such as storing
results in file_fdw).

Right. Because we cannot modify pg_class, pg_attribute etc, even though we
could reasonably enough write to local-only relfilenodes on a replica if we
didn't have to change WAL-logged catalog tables.

I've seen some hacks suggested around this where we have an unlogged fork
of each of the needed catalog tables, allowing replicas to write temp table
info to them. We'd scan both the logged and unlogged forks when doing
relcache management etc. But there are plenty of ugly issues with this.
We'd have to reserve oid ranges for them which is ugly; to make it BC
friendly those reservations would probably have to take the form of some
kind of placeholder entry in the real pg_class. And it gets ickier from
there. It hardly seems worth it when we should probably just implement
global temp tables instead.

5. Inefficient memory usage and possible memory overflow: each backend
maintains its own local buffers for working with temporary tables.

Is there any reason that would change with global temp tables? We'd still
be creating a backend-local relfilenode for each backend that actually
writes to the temp table, and I don't see how it'd be useful or practical
to keep those in shared_buffers.

Using local buffers has big advantages too. It saves shared_buffers space
for data where there's actually some possibility of getting cache hits, or
for where we can benefit from lazy/async writeback and write combining. I
wouldn't want to keep temp data there if I had the option.

If you're concerned about the memory use of backend local temp buffers, or
about how we account for and limit those, that's worth looking into. But I
don't think it'd be something that should be affected by global-temp vs
backend-local-temp tables.

The default size of temporary buffers is 8MB. It seems to be too small for
modern servers having hundreds of gigabytes of RAM, causing extra
copying of data between the OS cache and local buffers. But if there are
thousands of backends, each executing queries with temporary tables,
then the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.

Right. But what solution do you propose for this? Putting that in
shared_buffers will do nothing except deprive shared_buffers of space that
can be used for other more useful things. A server-wide temp buffer would
add IPC and locking overheads and AFAICS little benefit. One of the big
appeals of temp tables is that we don't need any of that.

If you want to improve server-wide temp buffer memory accounting and
management that makes sense. I can see it being useful to have things like
a server-wide DSM/DSA pool of temp buffers that backends borrow from and
return to based on memory pressure on a LRU-ish basis, maybe. But I can
also see how that'd be complex and hard to get right. It'd also be prone to
priority inversion problems where an idle/inactive backend must be woken up
to release memory or release locks, depriving an actively executing backend
of runtime. And it'd be as likely to create inefficiencies with copying and
eviction as solve them since backends could easily land up taking turns
kicking each other out of memory and re-reading their own data.

I don't think this is something that should be tackled as part of work on
global temp tables personally.

6. A connection pooler can not reschedule a session which has created
temporary tables to some other backend because its data is stored in local
buffers.

Yeah, if you're using transaction-associative pooling. That's just part of
a more general problem though, there are piles of related issues with temp
tables, session GUCs, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of good
here as the data in them will still be backend-local. If it isn't then you
should just be using unlogged tables.

The definition of such a table (metadata) is shared by all backends, but its
data is private to the backend. After session termination the data is
obviously lost.

+1 that's what a global temp table should be, and it's IIRC pretty much how
the SQL standard specifies temp tables.

I suspect I'm overlooking some complexities here, because to me it seems
like we could implement these fairly simply. A new relkind would identify
it as a global temp table and the relfilenode would be 0. Same for indexes
on temp tables. We'd extend the relfilenode mapper to support a
backend-local non-persistent relfilenode map that's used to track temp
table and index relfilenodes. If no relfilenode is defined for the table,
the mapper would allocate one. We already happily create missing
relfilenodes on write so we don't even have to pre-create the actual file.
We'd register the relfilenode as a tempfile and use existing tempfile
cleanup mechanisms, and we'd use the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.

Global temporary tables are accessed through shared buffers (to solve
problem 2).

I'm far from convinced of the wisdom or necessity of that, but I haven't
spent as much time digging into this problem as you have.

The drawback of such an approach is that it would be necessary to
reimplement a large part of the heapam code.
But this approach would make it possible to eliminate visibility checks for
temporary table tuples and decrease the size of the tuple header.

That sounds potentially cool, but perhaps a "next step" thing? Allow the
creation of global temp tables to specify reloptions, and you can add it as
a reloption later. You can't actually eliminate visibility checks anyway
because they're still MVCC heaps. Savepoints can create invisible tuples
even if you're using temp tables that are cleared on commit, and of course
so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in
practice.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#3 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#2)
Re: Global temporary tables

On 01.08.2019 6:10, Craig Ringer wrote:

3. It is not possible to use temporary tables at replica.

For physical replicas, yes.

Yes, definitely logical replicas (for example our PgPro-EE multimaster,
based on logical replication) do not suffer from this problem.
But in the case of multimaster we have another problem related to
temporary tables: we have to use 2PC for each transaction, and using
temporary tables in prepared transactions is currently prohibited.
This was the motivation for the patch proposed by Stas Kelvich which
allows using temporary tables in prepared transactions under some
conditions.

 5. Inefficient memory usage and possible memory overflow: each backend

maintains its own local buffers for work with temporary tables.

Is there any reason that would change with global temp tables? We'd
still be creating a backend-local relfilenode for each backend that
actually writes to the temp table, and I don't see how it'd be useful
or practical to keep those in shared_buffers.

Yes, my implementation of global temp tables uses shared buffers.
This is not strictly necessary, since the data is local. It is possible to
have shared metadata and private data accessed through local buffers.
But I have done it for three reasons:
1. Make it possible to use parallel plans for temp tables.
2. Eliminate the memory overflow problem.
3. Make it possible to reschedule a session to another backend (connection
pooler).

Using local buffers has big advantages too. It saves shared_buffers
space for data where there's actually some possibility of getting
cache hits, or for where we can benefit from lazy/async writeback and
write combining. I wouldn't want to keep temp data there if I had the
option.

Definitely local buffers have some advantages:
- they do not require synchronization
- they avoid flushing data from shared buffers

But global temp tables do not exclude the use of the original (local) temp
tables.
So you will have a choice: either use local temp tables, which can be
easily created on demand and accessed through local buffers,
or create global temp tables, which eliminate catalog bloat,
allow parallel queries, and whose data is controlled by the same cache
replacement discipline as for normal tables...

The default size of temporary buffers is 8MB. It seems to be too small for
modern servers having hundreds of gigabytes of RAM, causing extra
copying of data between the OS cache and local buffers. But if there are
thousands of backends, each executing queries with temporary tables,
then the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.

Right. But what solution do you propose for this? Putting that in
shared_buffers will do nothing except deprive shared_buffers of space
that can be used for other more useful things. A server-wide temp
buffer would add IPC and locking overheads and AFAICS little benefit.
One of the big appeals of temp tables is that we don't need any of that.

I do not think that parallel execution and efficient connection pooling
are of "little benefit".

If you want to improve server-wide temp buffer memory accounting and
management that makes sense. I can see it being useful to have things
like a server-wide DSM/DSA pool of temp buffers that backends borrow
from and return to based on memory pressure on a LRU-ish basis, maybe.
But I can also see how that'd be complex and hard to get right. It'd
also be prone to priority inversion problems where an idle/inactive
backend must be woken up to release memory or release locks, depriving
an actively executing backend of runtime. And it'd be as likely to
create inefficiencies with copying and eviction as solve them since
backends could easily land up taking turns kicking each other out of
memory and re-reading their own data.

I don't think this is something that should be tackled as part of work
on global temp tables personally.

My assumptions are the following: temporary tables are mostly used in
OLAP queries, and an OLAP workload means that there are few concurrent
queries, each working with large datasets.
So the temporary tables they produce can be quite big. For OLAP it seems
very important to be able to use parallel query execution and to use
the same cache eviction rule for both persistent and temp tables
(otherwise you cause either swapping or extra copying of data
between the OS and Postgres caches).

6. A connection pooler can not reschedule a session which has created
temporary tables to some other backend because its data is stored
in local buffers.

Yeah, if you're using transaction-associative pooling. That's just
part of a more general problem though, there are piles of related
issues with temp tables, session GUCs, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of
good here as the data in them will still be backend-local. If it isn't
then you should just be using unlogged tables.

You cannot use the same unlogged table to save intermediate query
results in two parallel sessions.

The definition of such a table (metadata) is shared by all backends, but
its data is private to the backend. After session termination the data is
obviously lost.

+1 that's what a global temp table should be, and it's IIRC pretty
much how the SQL standard specifies temp tables.

I suspect I'm overlooking some complexities here, because to me it
seems like we could implement these fairly simply. A new relkind would
identify it as a global temp table and the relfilenode would be 0.
Same for indexes on temp tables. We'd extend the relfilenode mapper to
support a backend-local non-persistent relfilenode map that's used to
track temp table and index relfilenodes. If no relfilenode is defined
for the table, the mapper would allocate one. We already happily
create missing relfilenodes on write so we don't even have to
pre-create the actual file. We'd register the relfilenode as a
tempfile and use existing tempfile cleanup mechanisms, and we'd use
the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.

As I already wrote, I tried to kill two birds with one stone: eliminate
catalog bloat and allow access to temp tables from multiple backends
(to be able to perform parallel queries and connection pooling).
This is why I had to use shared buffers for global temp tables.
Maybe it was not such a good idea, but one of my primary intentions
in publishing this patch was to learn the opinions of other people.
In PgPro some of my colleagues think that the most critical problem is the
inability to use temporary tables on a replica.
Others think that it is not a problem at all if you are using logical
replication.
From my point of view the most critical problem is the inability to use
parallel plans for temporary tables.
But it looks like you don't think so.

I see three different activities related to temporary tables:
1. Shared metadata
2. Shared buffers
3. Alternative concurrency control & reduced tuple header size
(a specialized table access method for temporary tables)

In my proposal I combined 1 and 2, leaving 3 for a next step.
I would be interested to hear other suggestions.

One more thing - 1 and 2 are really independent: you can share metadata
without sharing buffers.
But introducing yet another kind of temporary table seems like real
overkill:
- local temp tables (private namespace and local buffers)
- tables with shared metadata but local buffers
- tables with shared metadata and buffers

The drawback of such an approach is that it would be necessary to
reimplement a large part of the heapam code.
But this approach would make it possible to eliminate visibility checks for
temporary table tuples and decrease the size of the tuple header.

That sounds potentially cool, but perhaps a "next step" thing? Allow
the creation of global temp tables to specify reloptions, and you can
add it as a reloption later. You can't actually eliminate visibility
checks anyway because they're still MVCC heaps.

Sorry?
I mean elimination of MVCC overhead (visibility checks) for temp tables
only.
I am not sure that we can really fully eliminate it if we support the use of
temp tables in prepared transactions and autonomous transactions (yet
another awful feature we have in PgPro-EE).
Also, it looks like we need some analogue of CID to be able to
correctly execute queries like "insert into T (select from T ...)"
where T is a global temp table.
I haven't thought much about it, but I am really considering a new table
access method API for reducing per-tuple storage overhead for temporary and
append-only tables.

Savepoints can create invisible tuples even if you're using temp
tables that are cleared on commit, and of course so can DELETEs or
UPDATEs. So I'm not sure how much use it'd really be in practice.

Yeah, subtransactions can also be a problem for eliminating xmin/xmax
for temp tables. Thanks for noticing it.

I noticed that I had not patched some extensions - a fixed and rebased
version of the patch is attached.
Also you can find this version in our github repository:
https://github.com/postgrespro/postgresql.builtin_pool.git
branch global_temp.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

session_tables-1.patch (text/x-patch, +344/-97)
#4 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#3)
Re: Global temporary tables

New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

session_tables-2.patch (text/x-patch, +786/-120)
#5 Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#4)
Re: Global temporary tables

On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using
shared_buffers for temp tables having connection pooling benefits. Are you
assuming the presence of some other extension in your extended version of
PostgreSQL? In community PostgreSQL a temp table's contents in one backend
will not be visible in another backend. So if your connection pooler in
transaction pooling mode runs txn 1 on backend 42 and it populates temp
table X, then the pooler runs the same app session's txn 2 on backend 45,
the contents of temp table X are not visible anymore.

Can you explain? Because AFAICS so long as temp table contents are
backend-private there's absolutely no point ever using shared buffers for
their contents.

Perhaps you mean that in a connection pooling case, each backend may land
up filling up temp buffers with contents from *multiple different temp
tables*? If so, sure, I get that, but the answer there seems to be to
improve eviction and memory accounting, not make backends waste precious
shared_buffers space on non-shareable data.

Anyhow, I strongly suggest you simplify the feature to add the basic global
temp table feature so the need to change pg_class, pg_attribute etc to use
temp tables is removed, but separate changes to temp table memory handling
etc into a follow-up patch. That'll make it smaller and easier to review
and merge too. The two changes are IMO logically quite separate anyway.

Come to think of it, I think connection poolers might benefit from an
extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which evicts
temp table buffers from memory *without* dropping the temp tables. If
they're currently in-memory tuplestores they'd be written out and evicted.
That way a connection pooler could "clean" the backend, at the cost of some
probably pretty cheap buffered writes to the system buffer cache. The
kernel might not even bother to write out the buffercache and it won't be
forced to do so by fsync, checkpoints, etc, nor will the writes go via WAL
so such evictions could be pretty cheap - and if not under lots of memory
pressure the backend could often read the temp table back in from system
buffer cache without disk I/O.

That's my suggestion for how to solve your pooler problem, assuming I've
understood it correctly.

Along these lines I suggest adding the following to DISCARD at some point,
obviously not as part of your patch:

* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS

where obviously DISCARD SHARED_BUFFERS would be superuser-only and evict
only clean buffers.

(Also, if we extend DISCARD, let's also allow it to be written as DISCARD
(LIST, OF, THINGS, TO, DISCARD) so that we can make the syntax extensible for
plugins in future.)
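
To sketch the proposal (none of these DISCARD targets exist today; the names
and the parenthesized list form are purely the suggestion above):

```sql
-- Hypothetical syntax: evict this session's temp-table pages from
-- temp buffers without dropping the temp tables themselves.
DISCARD TEMP_BUFFERS;

-- Hypothetical extensible list form, so plugins could add targets:
DISCARD (TEMP_BUFFERS, TEMP_FILES, ADVISORY_LOCKS);
```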

Thoughts?

Would DISCARD TEMP_BUFFERS meet your needs?

#6 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#5)
Re: Global temporary tables

On 08.08.2019 5:40, Craig Ringer wrote:

On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using
shared_buffers for temp tables having connection pooling benefits. Are
you assuming the presence of some other extension in your extended 
version of PostgreSQL? In community PostgreSQL a temp table's
contents in one backend will not be visible in another backend. So if
your connection pooler in transaction pooling mode runs txn 1 on
backend 42 and it populates temp table X, then the pooler runs the
same app session's txn 2 on backend 45, the contents of temp table X
are not visible anymore.

Certainly, here I mean the built-in connection pooler, which is not currently
present in Postgres;
it is part of PgPRO-EE, and there is my patch for vanilla at the commitfest:
https://commitfest.postgresql.org/24/2067

Can you explain? Because AFAICS so long as temp table contents are
backend-private there's absolutely no point ever using shared buffers
for their contents.

Sure, there is no such problem with temporary tables now.
There is another problem: you can not use temporary tables with any
existing connection pooler (pgbouncer, ...) at a pooling level other than
session, unless the temporary table is used inside one transaction.
One of the advantages of a built-in connection pooler is that it can
provide session semantics (GUCs, prepared statements, temporary
tables, ...) with a limited number of backends (smaller than the number of
sessions).

In PgPRO-EE this problem was solved by binding a session to a backend, i.e.
one backend can manage multiple sessions,
but a session can not migrate to another backend. The drawback of such a
solution is obvious: one long-living transaction can block the transactions
of all other sessions scheduled to this backend.
The possibility of migrating a session to another backend is one of the
obvious solutions to the problem. But the main showstopper for it is
temporary tables.
This is why I consider moving temporary tables to shared buffers a
very important step.

In the vanilla version of the built-in connection pooler the situation is
slightly different.
Right now, if a client uses temporary tables without the "ON COMMIT DROP"
clause, the backend is marked as "tainted" and is pinned to this session.
So it is actually excluded from the connection pool and serves only this
session. Once again - if I were able to access temporary table data
from another backend, there would be no need to mark the backend as tainted
in this case.
Certainly this also requires shared metadata. And here we come to the
concept of global temp tables (if we forget for a moment that global
temp tables were "invented" a long time ago by Oracle and many other DBMSes :)
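
For reference, Oracle's long-standing variant of this feature (real Oracle
syntax; the table name is illustrative) declares the shared definition once,
while each session sees only its own rows:

```sql
-- Oracle: metadata is created once and shared; rows are session-private.
CREATE GLOBAL TEMPORARY TABLE session_scratch (
    id      NUMBER,
    payload VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;  -- keep rows across commits, until session end
```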

Perhaps you mean that in a connection pooling case, each backend may
land up filling up temp buffers with contents from *multiple different
temp tables*? If so, sure, I get that, but the answer there seems to
be to improve eviction and memory accounting, not make backends waste
precious shared_buffers space on non-shareable data.

Anyhow, I strongly suggest you simplify the feature to add the basic
global temp table feature so the need to change pg_class, pg_attribute
etc to use temp tables is removed, but separate changes to temp table
memory handling etc into a follow-up patch. That'll make it smaller
and easier to review and merge too. The two changes are IMO logically
quite separate anyway.

I agree that they are separate.
But even if we forget about the built-in connection pooler, don't you think
that the possibility of using parallel query plans for temporary tables is
itself strong enough motivation to access global temp tables through
shared buffers
(while still supporting a private page pool for local temp tables)? So
both approaches (shared vs. private buffers) have their pros and
cons. This is why it seems reasonable to support both of them
and let the user make the choice most suitable for a concrete application.
Certainly it is possible to provide "global shared temp tables" and
"global private temp tables". But IMHO that is overkill.

Come to think of it, I think connection poolers might benefit from an
extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which
evicts temp table buffers from memory *without* dropping the temp
tables. If they're currently in-memory tuplestores they'd be written
out and evicted. That way a connection pooler could "clean" the
backend, at the cost of some probably pretty cheap buffered writes to
the system buffer cache. The kernel might not even bother to write out
the buffercache and it won't be forced to do so by fsync, checkpoints,
etc, nor will the writes go via WAL so such evictions could be pretty
cheap - and if not under lots of memory pressure the backend could
often read the temp table back in from system buffer cache without
disk I/O.

Yes, this is one of the possible solutions for session migration. But
frankly speaking, flushing local buffers on each session reschedule seems
not such a good solution, even if the OS file cache is large enough and the
flushed buffers are still present in memory (they will be written
to the disk in this case, even though the data of a temp table is not
intended to be persisted).

That's my suggestion for how to solve your pooler problem, assuming
I've understood it correctly.

Along these lines I suggest adding the following to DISCARD at some
point, obviously not as part of your patch:

* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS

where obviously DISCARD SHARED_BUFFERS would be superuser-only and
evict only clean buffers.

(Also, if we extend DISCARD lets also it to be written as DISCARD
(LIST, OF, THINGS, TO, DISCARD) so that we can make the syntax
extensible for plugins in future).

Thoughts?

Would DISCARD TEMP_BUFFERS meet your needs?

Actually I have already implemented a DropLocalBuffers function (three
lines of code):

void
DropLocalBuffers(void)
{
    RelFileNode rnode;

    rnode.relNode = InvalidOid; /* InvalidOid means: drop all local buffers */
    DropRelFileNodeAllLocalBuffers(rnode);
}

for yet another Postgres extension, not yet included even in
PgPRO-EE: SnapFS, which supports database snapshots.
I do not think that we need such a command at the user level (i.e. a
corresponding SQL command).
But, as I already wrote above, I do not consider flushing all buffers on
session reschedule an acceptable solution.
And moreover, just flushing buffers is not enough. There is still some
smgr state associated with the relation which is local to the backend.
We in any case have to make some changes to be able to access temporary
data from another backend, even if the data is flushed to the file system.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7 Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#6)
Re: Global temporary tables

On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

On 08.08.2019 5:40, Craig Ringer wrote:

On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> wrote:

New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using
shared_buffers for temp tables having connection pooling benefits. Are you
assuming the presence of some other extension in your extended version of
PostgreSQL ? In community PostgreSQL a temp table's contents in one backend
will not be visible in another backend. So if your connection pooler in
transaction pooling mode runs txn 1 on backend 42 and it populates temp
table X, then the pooler runs the same app session's txn 2 on backend 45,
the contents of temp table X are not visible anymore.

Certainly here I mean the built-in connection pooler, which is not currently
present in Postgres;
it is part of PgPRO-EE, and there is my patch for vanilla at the commitfest:
https://commitfest.postgresql.org/24/2067

OK, that's what I assumed.

You're trying to treat this change as if it's a given that the other
functionality you want/propose is present in core or will be present in
core. That's far from given. My suggestion is to split it up so that the
parts can be reviewed and committed separately.

In PgPRO-EE this problem was solved by binding a session to a backend, i.e.
one backend can manage multiple sessions,
but a session cannot migrate to another backend. The drawback of such a
solution is obvious: one long-living transaction can block transactions of
all other sessions scheduled to this backend.
The possibility to migrate a session to another backend is one of the
obvious solutions to the problem. But the main showstopper for it is
temporary tables.
This is why I consider moving temporary tables to shared buffers a very
important step.

I can see why it's important for your use case.

I am not disagreeing.

I am however strongly suggesting that your patch has two fairly distinct
functional changes in it, and you should separate them out.

* Introduce global temp tables, a new relkind that works like a temp table
but doesn't require catalog changes. Uses per-backend relfilenode and
cleanup like existing temp tables. You could extend the relmapper to handle
the mapping of relation oid to per-backend relfilenode.
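To make the relmapper idea concrete, a backend-local oid-to-relfilenode map could look roughly like the sketch below. Everything here (LocalRelMapEntry, local_relmap_set/get, the toy fixed capacity) is invented for illustration; none of it comes from the patch or from the existing relmapper code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)
#define LOCAL_RELMAP_SIZE 64    /* toy capacity; a real map would grow */

typedef struct LocalRelMapEntry
{
    Oid relid;          /* shared catalog oid of the global temp table */
    Oid relfilenode;    /* this backend's private relfilenode */
} LocalRelMapEntry;

static LocalRelMapEntry local_relmap[LOCAL_RELMAP_SIZE];

/* Remember (or replace) this backend's relfilenode for a relation oid. */
static void
local_relmap_set(Oid relid, Oid relfilenode)
{
    for (int i = 0; i < LOCAL_RELMAP_SIZE; i++)
    {
        if (local_relmap[i].relid == relid ||
            local_relmap[i].relid == InvalidOid)
        {
            local_relmap[i].relid = relid;
            local_relmap[i].relfilenode = relfilenode;
            return;
        }
    }
}

/* Resolve an oid; InvalidOid means "no storage created in this backend". */
static Oid
local_relmap_get(Oid relid)
{
    for (int i = 0; i < LOCAL_RELMAP_SIZE; i++)
    {
        if (local_relmap[i].relid == relid)
            return local_relmap[i].relfilenode;
    }
    return InvalidOid;
}
```

The point is that the shared pg_class row never changes; each backend resolves the oid to its own private storage at access time, the same way the relmapper resolves relfilenode 0 for mapped catalogs.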

* Associate global temp tables with session state and manage them in
shared_buffers so they can work with the in-core connection pooler (if
committed)

Historically we've had a few efforts to get in-core connection pooling that
haven't gone anywhere. Without your pooler patch the changes you make to
use shared_buffers etc are going to be unhelpful at best, if not actively
harmful to performance, and will add unnecessary complexity. So I think
there's a logical series of patches here:

* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers

Make sense?

But even if we forget about the built-in connection pooler, don't you think
that the possibility to use parallel query plans for temporary tables is by
itself strong enough motivation to access global temp tables through shared
buffers?

I can see a way to share temp tables across parallel query backends being
very useful for DW/OLAP workloads, yes. But I don't know if putting them in
shared_buffers is the right answer for that. We have DSM/DSA, we have
shm_mq, various options for making temp buffers share-able with parallel
worker backends.

I'm suggesting that you not tie the whole (very useful) global temp tables
feature to this, but instead split it up into logical units that can be
understood, reviewed and committed separately.

I would gladly participate in review.

Would DISCARD TEMP_BUFFERS meet your needs?

Actually I have already implemented DropLocalBuffers function (three line
of code:)

[...]

I do not think that we need such command at user level (i.e. have

correspondent SQL command).

I'd be very happy to have it personally, but don't think it needs to be
tied in with your patch set here. Maybe I can cook up a patch soon.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#8 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#7)
Re: Global temporary tables

On 09.08.2019 8:34, Craig Ringer wrote:

On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

On 08.08.2019 5:40, Craig Ringer wrote:

On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using
shared_buffers for temp tables having connection pooling
benefits. Are you assuming the presence of some other extension
in your extended  version of PostgreSQL ? In community PostgreSQL
a temp table's contents in one backend will not be visible in
another backend. So if your connection pooler in transaction
pooling mode runs txn 1 on backend 42 and it populates temp table
X, then the pooler runs the same app session's txn 2 on backend
45, the contents of temp table X are not visible anymore.

Certainly here I mean the built-in connection pooler, which is not
currently present in Postgres;
it is part of PgPRO-EE, and there is my patch for vanilla at the
commitfest:
https://commitfest.postgresql.org/24/2067

OK, that's what I assumed.

You're trying to treat this change as if it's a given that the other
functionality you want/propose is present in core or will be present
in core. That's far from given. My suggestion is to split it up so
that the parts can be reviewed and committed separately.

In PgPRO-EE this problem was solved by binding a session to a backend,
i.e. one backend can manage multiple sessions,
but a session cannot migrate to another backend. The drawback of
such a solution is obvious: one long-living transaction can block
transactions of all other sessions scheduled to this backend.
The possibility to migrate a session to another backend is one of the
obvious solutions to the problem. But the main showstopper for it
is temporary tables.
This is why I consider moving temporary tables to shared buffers
a very important step.

I can see why it's important for your use case.

I am not disagreeing.

I am however strongly suggesting that your patch has two fairly
distinct functional changes in it, and you should separate them out.

* Introduce global temp tables, a new relkind that works like a temp
table but doesn't require catalog changes. Uses per-backend
relfilenode and cleanup like existing temp tables. You could extend
the relmapper to handle the mapping of relation oid to per-backend
relfilenode.

* Associate global temp tables with session state and manage them in
shared_buffers so they can work with the in-core connection pooler (if
committed)

Historically we've had a few efforts to get in-core connection pooling
that haven't gone anywhere. Without your pooler patch the changes you
make to use shared_buffers etc are going to be unhelpful at best, if
not actively harmful to performance, and will add unnecessary
complexity. So I think there's a logical series of patches here:

* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers

Make sense?

But even if we forget about the built-in connection pooler, don't you
think that the possibility to use parallel query plans for temporary
tables is by itself strong enough motivation to access global temp
tables through shared buffers?

I can see a way to share temp tables across parallel query backends
being very useful for DW/OLAP workloads, yes. But I don't know if
putting them in shared_buffers is the right answer for that. We have
DSM/DSA, we have shm_mq, various options for making temp buffers
share-able with parallel worker backends.

I'm suggesting that you not tie the whole (very useful) global temp
tables feature to this, but instead split it up into logical units
that can be understood, reviewed and committed separately.

I would gladly participate in review.

Ok, here it is: global_private_temp-1.patch
Also I have attached an updated version of the global temp tables with
shared buffers - global_shared_temp-1.patch
It is certainly larger (~2k lines vs. 1.5k lines) because it changes
BufferTag and related functions.
But I do not think that this difference is so critical.
I still have a wish to kill two birds with one stone :)
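The BufferTag change can be pictured roughly as below. This is a simplified standalone model, not the patch code: the exact field layout and the use of a BackendId field as the discriminator are assumptions based on the discussion, and buffertag_equal is an invented stand-in for the real tag comparison.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef int BackendId;

/* Upstream identifies a buffered page by relation, fork and block. */
typedef struct RelFileNode
{
    Oid spcNode;    /* tablespace */
    Oid dbNode;     /* database */
    Oid relNode;    /* relation */
} RelFileNode;

/*
 * With the patch the tag also carries a backend id, so the same block of
 * the same global temp table hashes to a *different* shared buffer for
 * each backend, keeping per-backend contents separate.
 */
typedef struct BufferTag
{
    RelFileNode rnode;
    BackendId   backend;    /* -1 (invalid) for ordinary relations */
    int         forkNum;
    BlockNumber blockNum;
} BufferTag;

static int
buffertag_equal(const BufferTag *a, const BufferTag *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.relNode == b->rnode.relNode &&
           a->backend == b->backend &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}
```

Widening the tag is exactly why the patch has to touch every place that builds or compares buffer tags, which explains most of the extra ~500 lines.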

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

global_shared_temp-1.patch (text/x-patch, +1081/-121)
global_private_temp-1.patch (text/x-patch, +990/-48)
#9 Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#8)
Re: Global temporary tables

On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

Ok, here it is: global_private_temp-1.patch

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on
physical read replicas, though I know there are plenty of other barriers
there (not least of which being temp tables using persistent txns not
vtxids)

Does it have a CF entry?

Also I have attached updated version of the global temp tables with shared
buffers - global_shared_temp-1.patch

Nice to see that split out. In addition to giving the first patch more hope
of being committed this time around, it'll help with readability and
testability too.

To be clear, I have long wanted to see PostgreSQL have the "session" state
abstraction you have implemented. I think it's really important for high
client count OLTP workloads, working with the endless collection of ORMs
out there, etc. So I'm all in favour of it in principle so long as it can
be made to work reliably with limited performance impact on existing
workloads and without making life lots harder when adding new core
functionality, for extension authors etc. The same goes for built-in
pooling. I think PostgreSQL has needed some sort of separation of
"connection", "backend", "session" and "executor" for a long time and I'm
glad to see you working on it.

With that said: How do you intend to address the likelihood that this will
cause performance regressions for existing workloads that use temp tables
*without* relying on your session state and connection pooler? Consider
workloads that use temp tables for mid-long txns where txn pooling is
unimportant, where they also do plenty of read and write activity on
persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level connections
that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent
table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed
window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data
for its medium-length transactions
* transactions also write to some persistent relations, say to record their
summarised results

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of temp
tables
* the temp table data in shared_buffers may put pressure on shared_buffers
space, cached pages for persistent tables all sessions are sharing;
* the temp table data in shared_buffers may put pressure on shared_buffers
space for dirty buffers, forcing writes of persistent tables out earlier
therefore reducing write-combining opportunities;

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#10 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#9)
Re: Global temporary tables

On 10.08.2019 5:12, Craig Ringer wrote:

On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

Ok, here it is: global_private_temp-1.patch

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on
physical read replicas, though I know there are plenty of other
barriers there (not least of which being temp tables using persistent
txns not vtxids)

Does it have a CF entry?

https://commitfest.postgresql.org/24/2233/

Also I have attached updated version of the global temp tables
with shared buffers - global_shared_temp-1.patch

Nice to see that split out. In addition to giving the first patch more
hope of being committed this time around, it'll help with readability
and testability too.

To be clear, I have long wanted to see PostgreSQL have the "session"
state abstraction you have implemented. I think it's really important
for high client count OLTP workloads, working with the endless
collection of ORMs out there, etc. So I'm all in favour of it in
principle so long as it can be made to work reliably with limited
performance impact on existing workloads and without making life lots
harder when adding new core functionality, for extension authors etc.
The same goes for built-in pooling. I think PostgreSQL has needed some
sort of separation of "connection", "backend", "session" and
"executor" for a long time and I'm glad to see you working on it.

With that said: How do you intend to address the likelihood that this
will cause performance regressions for existing workloads that use
temp tables *without* relying on your session state and connection
pooler? Consider workloads that use temp tables for mid-long txns
where txn pooling is unimportant, where they also do plenty of read
and write activity on persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level
connections that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large
persistent table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed
window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate
data for its medium-length transactions
* transactions also write to some persistent relations, say to record
their summarised results

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of
temp tables
* the temp table data in shared_buffers may put pressure on
shared_buffers space, cached pages for persistent tables all sessions
are sharing;
* the temp table data in shared_buffers may put pressure on
shared_buffers space for dirty buffers, forcing writes of persistent
tables out earlier therefore reducing write-combining opportunities;

I agree that access to local buffers is cheaper than to shared buffers
because there is no locking overhead.
And the fact that access to local tables cannot affect cached data of
persistent tables is also important.
But most Postgres tables are still normal (persistent) tables accessed
through shared buffers,
and a huge amount of effort was made to make this access as efficient as
possible (the clock algorithm which doesn't require a global lock,
atomic operations, ...). Also, using the same replacement discipline for
all tables may be preferable for some workloads.
So it is not so obvious to me that in the described scenario a local
buffer cache for temporary tables will really provide significant advantages.
It will be interesting to perform some benchmarking - I am going to do it.

What I have observed right now is that in a typical scenario - dumping the
results of a huge query to a temporary table with a subsequent traversal of
this table - old (local) temporary tables provide better performance (maybe
because of the small size of the local buffer cache and a different
eviction policy). But subsequent accesses to the global shared table are
faster (because it completely fits in the large shared buffer cache).

There is one more problem with global temporary tables for which I do
not know a good solution yet: collecting statistics.
Since each backend has its own data, the backends may generally need
different query plans.
Right now if you perform "analyze table" in one backend, it will
affect plans in all backends.
It can be considered not a bug but a feature if we assume that the
distribution of data in all backends is similar.
But if this assumption is not true, it can be a problem.

#11 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#10)
Re: Global temporary tables

Hi

There is one more problem with global temporary tables for which I do not
know a good solution yet: collecting statistics.
Since each backend has its own data, the backends may generally need
different query plans.
Right now if you perform "analyze table" in one backend, it will
affect plans in all backends.
It can be considered not a bug but a feature if we assume that the
distribution of data in all backends is similar.
But if this assumption is not true, it can be a problem.

The last point is probably the most difficult issue, and I have been
thinking about it for years.

My experience with customers is that 99% of temp table usage is without
statistics - with good information only about row counts. Only a few
customers know that a manual ANALYZE is necessary for temp tables (when it
is really necessary).

Sharing metadata about global temporary tables can be a real problem -
probably not the statistics, but surely the number of pages and the number
of rows.

There are two requirements:

a) we need some special metadata for every (per-session) instance of a
global temporary table (rows, pages, statistics, maybe multicolumn
statistics, ...)

b) we do not want to use the persistent global catalog (to avoid catalog
bloat)

I see two possible solutions:

1. hold these data only in memory, in special buffers

2. hold these data in global temporary tables - analogous to normal
tables: we can use global temp tables for this metadata just like classic
persistent tables are used for the metadata of classic persistent tables.
Next, the syscache can be enhanced to work with the union of two system
tables.

I prefer @2 because the changes can be implemented at a deeper level.

Sharing metadata for global temp tables (the current state, if I understand
well) is good enough for the development stage, but it is hard to expect it
to work generally in a production environment.

Regards

P.S. I am very happy that you are working on this topic. It is an
interesting and important problem.

Pavel

#12 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#11)
Re: Global temporary tables

Hi,

On 11.08.2019 10:14, Pavel Stehule wrote:

Hi

There is one more problem with global temporary tables for which I
do not know a good solution yet: collecting statistics.
Since each backend has its own data, the backends may generally
need different query plans.
Right now if you perform "analyze table" in one backend, it
will affect plans in all backends.
It can be considered not a bug but a feature if we assume that the
distribution of data in all backends is similar.
But if this assumption is not true, it can be a problem.

The last point is probably the most difficult issue, and I have been
thinking about it for years.

My experience with customers is that 99% of temp table usage is
without statistics - with good information only about row counts. Only
a few customers know that a manual ANALYZE is necessary for temp tables
(when it is really necessary).

Sharing metadata about global temporary tables can be a real problem -
probably not the statistics, but surely the number of pages and the
number of rows.

But Postgres does not currently store this information anywhere else
except the statistics, does it?
There was a proposal to cache relation size, but it is not implemented
yet. If such a cache existed, we could use it to store local information
about global temporary tables.
So if 99% of users do not perform analyze for temporary tables, they
will not be faced with this problem, right?

There are two requirements:

a) we need some special metadata for every (per-session) instance of a
global temporary table (rows, pages, statistics, maybe multicolumn
statistics, ...)

b) we do not want to use the persistent global catalog (to avoid catalog
bloat)

I see two possible solutions:

1. hold these data only in memory, in special buffers

2. hold these data in global temporary tables - analogous to normal
tables: we can use global temp tables for this metadata just like
classic persistent tables are used for the metadata of classic
persistent tables. Next, the syscache can be enhanced to work with the
union of two system tables.

I prefer @2 because the changes can be implemented at a deeper level.

Sharing metadata for global temp tables (the current state, if I
understand well) is good enough for the development stage, but it is
hard to expect it to work generally in a production environment.

I think it is not possible to assume that temporary data will always fit
in memory.
So 1) does not seem to be an acceptable solution.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#13 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#12)
Re: Global temporary tables

po 12. 8. 2019 v 18:19 odesílatel Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> napsal:

Hi,

On 11.08.2019 10:14, Pavel Stehule wrote:

Hi

There is one more problem with global temporary tables for which I do not
know a good solution yet: collecting statistics.
Since each backend has its own data, the backends may generally need
different query plans.
Right now if you perform "analyze table" in one backend, it will
affect plans in all backends.
It can be considered not a bug but a feature if we assume that the
distribution of data in all backends is similar.
But if this assumption is not true, it can be a problem.

The last point is probably the most difficult issue, and I have been
thinking about it for years.

My experience with customers is that 99% of temp table usage is without
statistics - with good information only about row counts. Only a few
customers know that a manual ANALYZE is necessary for temp tables (when it
is really necessary).

Sharing metadata about global temporary tables can be a real problem -
probably not the statistics, but surely the number of pages and the number
of rows.

But Postgres does not currently store this information anywhere else
except the statistics, does it?

not only - the critical numbers are reltuples and relpages from pg_class

There was a proposal to cache relation size, but it is not implemented yet.

If such a cache existed, we could use it to store local information about
global temporary tables.
So if 99% of users do not perform analyze for temporary tables, they will
not be faced with this problem, right?

They use the default statistics based on relpages. But for 1% of
applications statistics are critical - almost always OLAP applications.

There are two requirements:

a) we need some special metadata for every (per-session) instance of a global
temporary table (rows, pages, statistics, maybe multicolumn statistics, ...)

b) we do not want to use the persistent global catalog (to avoid catalog
bloat)

I see two possible solutions:

1. hold these data only in memory, in special buffers

2. hold these data in global temporary tables - analogous to normal
tables: we can use global temp tables for this metadata just like classic
persistent tables are used for the metadata of classic persistent tables.
Next, the syscache can be enhanced to work with the union of two system
tables.

I prefer @2 because the changes can be implemented at a deeper level.

Sharing metadata for global temp tables (the current state, if I understand
well) is good enough for the development stage, but it is hard to expect it
to work generally in a production environment.

I think it is not possible to assume that temporary data will always fit
in memory.
So 1) does not seem to be an acceptable solution.

I spoke only about metadata. The data itself should be stored in temp
buffers (and possibly in temp files).

Pavel


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#14 Craig Ringer
craig@2ndquadrant.com
In reply to: Pavel Stehule (#13)
Re: Global temporary tables

On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But Postgres does not currently store this information anywhere else
except the statistics, does it?

not only - the critical numbers are reltuples and relpages from pg_class

That's a very good point. relallvisible too. How's the global temp table
impl handling that right now, since you won't be changing the pg_class row?
AFAICS relpages doesn't need to be up to date (and reltuples certainly
doesn't) so presumably you're just leaving them as zero?

What happens right now if you ANALYZE or VACUUM ANALYZE a global temp
table? Is it just disallowed?

I'll need to check, but I wonder if periodically updating those fields in
pg_class impacts logical decoding too. Logical decoding must treat
transactions with catalog changes as special cases where it creates custom
snapshots and does other expensive additional work.
(See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its
callsites). We don't actually need to know relpages and reltuples during
logical decoding. It can probably ignore relfrozenxid and relminmxid
changes too, maybe even pg_statistic changes though I'd be less confident
about that one.

At some point I need to patch in a bunch of extra tracepoints and do some
perf tracing to see how often we do potentially unnecessary snapshot
related work in logical decoding.

There was a proposal to cache relation size, but it is not implemented yet.
If such a cache existed, we could use it to store local information about
global temporary tables.
So if 99% of users do not perform analyze for temporary tables, they will
not be faced with this problem, right?

They use the default statistics based on relpages. But for 1% of
applications statistics are critical - almost always OLAP applications.

Agreed. It's actually quite a common solution to user problem reports /
support queries about temp table performance: "Run ANALYZE. Consider
creating indexes too."

Which reminds me - if global temp tables do get added, it'll further
increase the desirability of exposing a feature to let users
disable+invalidate and then later reindex+enable indexes without icky
catalog hacking. So they can disable indexes for their local copy, load
data, re-enable indexes. That'd be "interesting" to implement for global
temp tables given that index state is part of the pg_index row associated
with an index rel though.

1. hold these data only in memory in special buffers

I don't see that working well for pg_statistic or anything else that holds
nontrivial user data though.

2. hold these data in global temporary tables - analogous to normal
tables: we can use global temp tables for this metadata just like classic
persistent tables are used for the metadata of classic persistent tables.
Next, the syscache can be enhanced to work with the union of two system
tables.

Very meta. Syscache and relcache are extremely performance critical but
could probably skip scans entirely on backends that haven't used any global
temp tables.

I don't know the relevant caches well enough to have a useful opinion here.

I think it is not possible to assume that temporary data will always fit
in memory.
So 1) does not seem to be an acceptable solution.

It'd only be the metadata, but if it includes things like column histograms
and most frequent value data that'd still be undesirable to have pinned in
backend memory.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#15 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#14)
Re: Global temporary tables

On 13.08.2019 8:34, Craig Ringer wrote:

On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <pavel.stehule@gmail.com
<mailto:pavel.stehule@gmail.com>> wrote:

But Postgres does not currently store this information anywhere
else except the statistics, does it?

not only - the critical numbers are reltuples and relpages from pg_class

That's a very good point. relallvisible too. How's the global temp
table impl handling that right now, since you won't be changing the
pg_class row? AFAICS relpages doesn't need to be up to date (and
reltuples certainly doesn't) so presumably you're just leaving them as
zero?

As far as I understand, relpages and reltuples are set only when you
perform "analyze" on the table.

What happens right now if you ANALYZE or VACUUM ANALYZE a global temp
table? Is it just disallowed?

No, it is not disallowed now.
It updates the statistics and also the fields in pg_class, which are shared
by all backends.
So all backends will then build plans according to these statistics.
Certainly this may lead to less efficient plans if there are large
differences in the number of tuples stored in this table in different
backends.
But it seems to me that this is critical mostly when there are indexes on
the temporary table, and users create indexes on temporary tables even more
rarely than they analyze them.

I'll need to check, but I wonder if periodically updating those fields
in pg_class impacts logical decoding too. Logical decoding must treat
transactions with catalog changes as special cases where it creates
custom snapshots and does other expensive additional work.
(See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its
callsites). We don't actually need to know relpages and reltuples
during logical decoding. It can probably ignore relfrozenxid
and relminmxid changes too, maybe even pg_statistic changes though I'd
be less confident about that one.

At some point I need to patch in a bunch of extra tracepoints and do
some perf tracing to see how often we do potentially unnecessary
snapshot related work in logical decoding.

Temporary tables (both local and global) as well as unlogged tables are
not subject to logical replication, are they?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#16 Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#8)
Re: Global temporary tables

On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

Ok, here it is: global_private_temp-1.patch

Initial pass review follows.

Relation name "SESSION" is odd. I guess you're avoiding "global" because
the data is session-scoped, not globally temporary. But I'm not sure
"session" fits either - after all regular temp tables are also session
temporary tables. I won't focus on naming further beyond asking that it be
consistent though, right now there's a mix of "global" in some places and
"session" in others.

Why do you need to do all this indirection with changing RelFileNode to
RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly,
your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp .
Did you look into my suggestion of extending the relmapper so that global
temp tables would have a relfilenode of 0 like pg_class etc, and use a
backend-local map of oid-to-relfilenode mappings? I'm guessing you did it
the way you did instead to lay the groundwork for cross-backend sharing,
but if so it should IMO be in that patch. Maybe my understanding of the
existing temp table mechanics is just insufficient as I
see RelFileNodeBackendIsTemp is already used in some aspects of existing
temp relation handling.

Similarly, TruncateSessionRelations probably shouldn't need to exist in
this patch in its current form; there's no shared_buffers use to clean and
the same file cleanup mechanism should handle both session-temp and
local-temp relfilenodes.

A number of places make a change like this:

rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION

and I'd like to see a test macro or inline static for it since it's
repeated so much. Mostly to make the intent clear: "is this a relation with
temporary backend-scoped data?"

This test:

+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) &&
IsSessionRelationBackendId(rel->rd_backend))
+ _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);

seems sensible but I'm wondering if there's a way to channel initialization
of global-temp objects through a bit more of a common path, so it reads
more obviously as a common test applying to all global-temp tables. "Global
temp table not initialized in session yet? Initialize it." So we don't have
to have every object type do an object type specific test for "am I
actually uninitialized?" in all paths it might hit. I guess I expected to
see something more like a

if (RelGlobalTempUninitialized(rel))

but maybe I've been doing too much Java ;)

A similar test reappears here for sequences:

+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION &&
PageIsNew(page))

Why is this written differently?

Sequence initialization ignores sequence startval/firstval settings. Why?

- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)

Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?

In PreCommit_on_commit_actions, in the ONCOMMIT_DELETE_ROWS case, is
there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag if
the oid is for a backend-temp table, not a global-temp table?

+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence ==
RELPERSISTENCE_TEMP;

Won't session-temp tables have local buffers too? Until your next patch
that adds shared_buffers storage for them anyway?

+ * These is no need to separate them at file system level, so just
subtract SessionRelFirstBackendId
+ * to avoid too long file names.

I agree that there's no reason to be able to differentiate between
local-temp and session-temp relfilenodes at the filesystem level.

Also I have attached an updated version of the global temp tables with shared
buffers - global_shared_temp-1.patch
It is certainly larger (~2k lines vs. 1.5k lines) because it changes
BufferTag and related functions.
But I do not think that this difference is so critical.
I still have a wish to kill two birds with one stone :)

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#17Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#15)
Re: Global temporary tables

On Tue, 13 Aug 2019 at 16:19, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

On 13.08.2019 8:34, Craig Ringer wrote:

On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

But Postgres is not storing this information anywhere else except the

statistics, is it?

not only - critical numbers are reltuples, relpages from pg_class

That's a very good point. relallvisible too. How's the global temp table
impl handling that right now, since you won't be changing the pg_class row?
AFAICS relpages doesn't need to be up to date (and reltuples certainly
doesn't) so presumably you're just leaving them as zero?

As far as I understand, relpages and reltuples are set only when you
perform ANALYZE on the table.

Also autovacuum's autoanalyze.

What happens right now if you ANALYZE or VACUUM ANALYZE a global temp

table? Is it just disallowed?

No, it is not disallowed now.
It updates the statistics and also the fields in pg_class which are shared by
all backends.
So all backends will now build plans according to these statistics.
Certainly this may lead to less efficient plans if there are large
differences in the number of tuples stored in this table in different backends.
But that seems critical mostly when indexes are present on the temporary
table, and it seems to me that users create indexes on temporary tables
even more rarely than they analyze them.

That doesn't seem too bad TBH. Hacky but it doesn't seem dangerously wrong
and as likely to be helpful as not if clearly documented.

Temporary tables (both local and global) as well as unlogged tables are
not subject to logical replication, are they?

Right. But in the same way that they're still present in the catalogs, I
think they still affect catalog snapshots maintained by logical decoding's
historic snapshot manager as temp table creation/drop will still AFAIK
cause catalog invalidations to be written on commit. I need to double check
that.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#18Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#16)
Re: Global temporary tables

On 13.08.2019 11:21, Craig Ringer wrote:

On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

Ok, here it is: global_private_temp-1.patch

Initial pass review follows.

Relation name "SESSION" is odd. I guess you're avoiding "global"
because the data is session-scoped, not globally temporary. But I'm
not sure "session" fits either - after all regular temp tables are
also session temporary tables. I won't focus on naming further beyond
asking that it be consistent though, right now there's a mix of
"global" in some places and "session" in others.

I have supported both forms: "create session table" and "create global temp".
Both "session" and "global" are existing PostgreSQL keywords, so we do
not need to introduce a new one (unlike "public" or "shared").
The form "global temp" is used in Oracle, so it seems natural to
provide similar syntax.

I am not insisting on this syntax and will consider all other suggestions.
But IMHO almost any SQL keyword is overloaded and has different meanings.
Temporary tables have the session as their lifetime (or the transaction, if
created with the "ON COMMIT DROP" clause).
So calling them "session tables" is actually clearer than just
"temporary tables".
But "local temp tables" can also be considered session tables, so maybe
it is really not such a good idea
to use the "session" keyword for them.
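For concreteness, the two spellings under discussion would look like this (syntax from the patch, not core PostgreSQL):

```sql
-- Both forms supported by the patch; they are equivalent:
CREATE SESSION TABLE s1 (x int);
CREATE GLOBAL TEMP TABLE g1 (x int);
```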

Why do you need to do all this indirection with changing RelFileNode
to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc?
Similarly, your changes of RelFileNodeBackendIsTemp
to RelFileNodeBackendIsLocalTemp . Did you look into my suggestion of
extending the relmapper so that global temp tables would have a
relfilenode of 0 like pg_class etc, and use a backend-local map of
oid-to-relfilenode mappings? I'm guessing you did it the way you did
instead to lay the groundwork for cross-backend sharing, but if so it
should IMO be in that patch. Maybe my understanding of the existing
temp table mechanics is just insufficient as I
see RelFileNodeBackendIsTemp is already used in some aspects of
existing temp relation handling.

Sorry, are you really speaking about global_private_temp-1.patch?
This patch doesn't change the bufmgr file at all.
Maybe you looked at the other patch - global_shared_temp-1.patch -
which accesses shared tables through shared buffers and so has to
change the buffer tag to include the backend ID in it.

Similarly, TruncateSessionRelations probably shouldn't need to exist
in this patch in its current form; there's no shared_buffers use to
clean and the same file cleanup mechanism should handle both
session-temp and local-temp relfilenodes.

In global_private_temp-1.patch, TruncateSessionRelations does nothing
with shared buffers; it just deletes relation files.

A number of places make a change like this:
rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION

and I'd like to see a test macro or inline static for it since it's
repeated so much. Mostly to make the intent clear: "is this a relation
with temporary backend-scoped data?"

I considered calling such a macro IsSessionRelation() or something like
this, but you do not like the notion "session".
Is IsBackendScopedRelation() a better choice?

This test:

+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) && 
IsSessionRelationBackendId(rel->rd_backend))
+ _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);

seems sensible but I'm wondering if there's a way to channel
initialization of global-temp objects through a bit more of a common
path, so it reads more obviously as a common test applying to all
global-temp tables. "Global temp table not initialized in session yet?
Initialize it." So we don't have to have every object type do an
object type specific test for "am I actually uninitialized?" in all
paths it might hit. I guess I expected to see something more like a

if (RelGlobalTempUninitialized(rel))

but maybe I've been doing too much Java ;)

A similar test reappears here for sequences:

+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION &&
PageIsNew(page))

Why is this written differently?

Just because I wrote them at different moments in time :)
I think that adding RelGlobalTempUninitialized is a really good idea.

Sequence initialization ignores sequence startval/firstval settings. Why?

I am handling only the case of implicitly created sequences for
SERIAL/BIGSERIAL columns.
Is it possible to explicitly specify the initial value and step for them?
If so, this place should definitely be rewritten.

- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)

Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?

The RELPERSISTENCE_UNLOGGED case is handled in the previous IF branch.

In PreCommit_on_commit_actions, in the ONCOMMIT_DELETE_ROWS case,
is there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE
flag if the oid is for a backend-temp table, not a global-temp table?

If it is a local temp table, then XACT_FLAGS_ACCESSEDTEMPNAMESPACE is set,
so there is no need to check this flag.
And since XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set now for
global temp tables, I had to remove this check.
So for local temp tables the original behavior is preserved.

The question is why I do not set XACT_FLAGS_ACCESSEDTEMPNAMESPACE for
global temp tables.
I wanted to avoid the current limitation on using temp tables in prepared
transactions.
Global metadata eliminates some of the problems with using
temp tables in 2PC.
But I am not sure that it eliminates ALL problems and that there are no
strange effects from combining
prepared transactions with global temp tables.

+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence ==
RELPERSISTENCE_TEMP;

Won't session-temp tables have local buffers too? Until your next
patch that adds shared_buffers storage for them anyway?

Once again, this is a change from global_shared_temp-1.patch, not from
global_private_temp-1.patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#19Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#17)
Re: Global temporary tables

On 13.08.2019 11:27, Craig Ringer wrote:

On Tue, 13 Aug 2019 at 16:19, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

On 13.08.2019 8:34, Craig Ringer wrote:

On Tue, 13 Aug 2019 at 00:47, Pavel Stehule
<pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote:

But Postgres is not storing this information anywhere
else except the statistics, is it?

not only - critical numbers are reltuples, relpages from pg_class

That's a very good point. relallvisible too. How's the global
temp table impl handling that right now, since you won't be
changing the pg_class row? AFAICS relpages doesn't need to be up
to date (and reltuples certainly doesn't) so presumably you're
just leaving them as zero?

As far as I understand, relpages and reltuples are set only when
you perform ANALYZE on the table.

Also autovacuum's autoanalyze.

When does that happen?
I created a normal table, populated it with some data, and then waited
several hours, but pg_class was not updated for this table.

I attach to this mail slightly refactored versions of these patches with
fixes for the issues reported in your review.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

global_shared_temp-2.patch (text/x-patch) +1095 −121
global_private_temp-2.patch (text/x-patch) +1005 −49
#20Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#19)
Re: Global temporary tables

On Tue, 13 Aug 2019 at 21:50, Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
wrote:

As far as I understand, relpages and reltuples are set only when you

perform ANALYZE on the table.

Also autovacuum's autoanalyze.

When does that happen?
I created a normal table, populated it with some data, and then waited
several hours, but pg_class was not updated for this table.

heap_vacuum_rel() in src/backend/access/heap/vacuumlazy.c below

* Update statistics in pg_class.

which I'm pretty sure is common to explicit vacuum and autovacuum. I
haven't run up a test to verify 100% but most DBs would never have relpages
etc set if autovac didn't do it since most aren't explicitly VACUUMed at
all.

I thought it was done when autovac ran an analyze, but it looks like it's
all autovac. Try setting very aggressive autovac thresholds and inserting +
deleting a bunch of tuples maybe.
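One way to set such aggressive thresholds for a test (table name is illustrative; the storage parameters and GUCs are standard PostgreSQL):

```sql
-- Make autovacuum fire after ~50 changed tuples on this table:
ALTER TABLE test_tab SET (
    autovacuum_vacuum_threshold = 50,
    autovacuum_vacuum_scale_factor = 0.0,
    autovacuum_analyze_threshold = 50,
    autovacuum_analyze_scale_factor = 0.0);

-- Shorten the autovacuum launcher's sleep (requires a config reload):
ALTER SYSTEM SET autovacuum_naptime = '10s';
SELECT pg_reload_conf();

-- After inserting and deleting a bunch of tuples, check:
SELECT relpages, reltuples FROM pg_class WHERE relname = 'test_tab';
```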

I attach to this mail slightly refactored versions of these patches with

fixes for the issues reported in your review.

Thanks.

Did you have a chance to consider my questions too? I see a couple of
things where there's no patch change, which is fine, but I'd be interested
in your thoughts on the question/issue in those cases.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#21Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#20)
#22Craig Ringer
craig@2ndquadrant.com
In reply to: Konstantin Knizhnik (#21)
#23Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#22)
#24Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#23)
#25Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Craig Ringer (#22)
#26Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#25)
#27Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#26)
#28Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#27)
#29Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#28)
#30Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#29)
#31Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#30)
#32Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#31)
#33Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#32)
#34Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#33)
#35Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#34)
#36Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#35)
#37Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#36)
#38Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#37)
#39Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#38)
#40Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#39)
#41Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#40)
#42Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#40)
#43Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#42)
#44Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#43)
#45Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Konstantin Knizhnik (#41)
#46Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Alvaro Herrera (#45)
#47Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#46)
#48Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#47)
#49Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: Konstantin Knizhnik (#49)
#51Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Michael Paquier (#50)
#52phb07
phb07@apra.asso.fr
In reply to: Konstantin Knizhnik (#51)
#53Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: phb07 (#52)
#54Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Tomas Vondra (#53)
#55Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#54)
#56Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#55)
#57Phil Florent
philflorent@hotmail.com
In reply to: Konstantin Knizhnik (#56)