Implementation of global temporary tables?

Started by Pavel Stehulealmost 11 years ago11 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

Six years ago we did discuss about global temporary tables - persistent
schema, ephemeral data.

http://postgresql.nabble.com/idea-global-temp-tables-td2007217.html

I am thinking so some reasons why implement this feature are valid:

* we can get some performance benefit against current temp tables - less
the catalogue bloating,

* we can simplify a static validation of plpgsql functions when temp tables
are used,
more global temp tables are little bit comfortable for developers,

* we can simplify migration from some other databases, where global temp
tables are default.

Some was changed from 2009:

* We have updatable CTE

* We have unlogged tables

Opened questions:

1. Name and conformance with standard of this feature - because we doesn't
support modules, a mapping ANSI -> PG should not be trivial

2. Implementation

I see three possible ways how to implement it:

2.a - using on demand created temp tables - most simple solution, but
doesn't help with catalogue bloating

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

3.c - store ephemeral metadata only in memory without MVCC

Is there still interest about this feature?

Comments, notes?

Pavel

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#1)
Re: Implementation of global temporary tables?

2015-02-02 11:15 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hello

Six years ago we did discuss about global temporary tables - persistent
schema, ephemeral data.

http://postgresql.nabble.com/idea-global-temp-tables-td2007217.html

I am thinking so some reasons why implement this feature are valid:

* we can get some performance benefit against current temp tables - less
the catalogue bloating,

* we can simplify a static validation of plpgsql functions when temp
tables are used,
more global temp tables are little bit comfortable for developers,

* we can simplify migration from some other databases, where global temp
tables are default.

I forgot other possible benefit:

* using temp tables on slaves - (needs 3c implementation)

Show quoted text

Some was changed from 2009:

* We have updatable CTE

* We have unlogged tables

Opened questions:

1. Name and conformance with standard of this feature - because we doesn't
support modules, a mapping ANSI -> PG should not be trivial

2. Implementation

I see three possible ways how to implement it:

2.a - using on demand created temp tables - most simple solution, but
doesn't help with catalogue bloating

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

3.c - store ephemeral metadata only in memory without MVCC

Is there still interest about this feature?

Comments, notes?

Pavel

#3Atri Sharma
atri.jiit@gmail.com
In reply to: Pavel Stehule (#1)
Re: Implementation of global temporary tables?

Some was changed from 2009:

* We have updatable CTE

* We have unlogged tables

Opened questions:

1. Name and conformance with standard of this feature - because we doesn't
support modules, a mapping ANSI -> PG should not be trivial

2. Implementation

I see three possible ways how to implement it:

2.a - using on demand created temp tables - most simple solution, but
doesn't help with catalogue bloating

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

3.c - store ephemeral metadata only in memory without MVCC

With 2.a, essentially, we are defining a global definition of a temp table,
but the actual per session objects still follow the same rules are our
current temp tables do?

2.b seems like a lot of new data, and if we are defining new paths for e.g.
statistics for global temp tables, we might end up adding new logic in
planner to use those tables. I am not seeing how this will work.

Could you elaborate a bit on 3.c please?

Something that really bothers me here, on a different note, is the catalog
churn this could cause. Of course, you mentioned in as a point in your
email, but I feel that 2.a's showstopper could be the massive catalog churn
it causes. Maybe have a way to manage such tables without getting pg_class
to bloat pretty quickly (I am assuming you do not mean metadata as the
catalog metadata in 3.c).

Regards,

Atri

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Atri Sharma (#3)
Re: Implementation of global temporary tables?

2015-02-02 11:51 GMT+01:00 Atri Sharma <atri.jiit@gmail.com>:

Some was changed from 2009:

* We have updatable CTE

* We have unlogged tables

Opened questions:

1. Name and conformance with standard of this feature - because we
doesn't support modules, a mapping ANSI -> PG should not be trivial

2. Implementation

I see three possible ways how to implement it:

2.a - using on demand created temp tables - most simple solution, but
doesn't help with catalogue bloating

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

3.c - store ephemeral metadata only in memory without MVCC

With 2.a, essentially, we are defining a global definition of a temp
table, but the actual per session objects still follow the same rules are
our current temp tables do?

yes .. it means "global temp table is template for local temp table"

2.b seems like a lot of new data, and if we are defining new paths for
e.g. statistics for global temp tables, we might end up adding new logic in
planner to use those tables. I am not seeing how this will work.

The advantages of this method is transactional behave and moving some
bloating content to specific smaller and unlogged tables.

Could you elaborate a bit on 3.c please?

Something that really bothers me here, on a different note, is the catalog
churn this could cause. Of course, you mentioned in as a point in your
email, but I feel that 2.a's showstopper could be the massive catalog churn
it causes. Maybe have a way to manage such tables without getting pg_class
to bloat pretty quickly (I am assuming you do not mean metadata as the
catalog metadata in 3.c).

3.c is good protection against catalog bloating - on second hand -
implementation will be probably more complex.

Show quoted text

Regards,

Atri

#5Andres Freund
andres@2ndquadrant.com
In reply to: Pavel Stehule (#1)
Re: Implementation of global temporary tables?

Hi,

On 2015-02-02 11:15:22 +0100, Pavel Stehule wrote:

Six years ago we did discuss about global temporary tables - persistent
schema, ephemeral data.

http://postgresql.nabble.com/idea-global-temp-tables-td2007217.html

I am thinking so some reasons why implement this feature are valid:

* we can get some performance benefit against current temp tables - less
the catalogue bloating,

* we can simplify a static validation of plpgsql functions when temp tables
are used,
more global temp tables are little bit comfortable for developers,

* we can simplify migration from some other databases, where global temp
tables are default.

I agree that the feature would be interesting.

2. Implementation

I see three possible ways how to implement it:

2.a - using on demand created temp tables - most simple solution, but
doesn't help with catalogue bloating

Yea, that's no good.

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

I can't follow why that'd achieve anything?

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I think it's more realistic way to implement is to have a separate
'relpersistence' setting for global temp tables. The first access to
such one in a session (or xact if truncate on commit) copies the table
from the _init fork. By having the backend id in all filenames (besides
the init fork) they're unique between sessions.

Or something roughly like that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andres Freund (#5)
Re: Implementation of global temporary tables?

2015-02-02 12:04 GMT+01:00 Andres Freund <andres@2ndquadrant.com>:

Hi,

On 2015-02-02 11:15:22 +0100, Pavel Stehule wrote:

Six years ago we did discuss about global temporary tables - persistent
schema, ephemeral data.

http://postgresql.nabble.com/idea-global-temp-tables-td2007217.html

I am thinking so some reasons why implement this feature are valid:

* we can get some performance benefit against current temp tables - less
the catalogue bloating,

* we can simplify a static validation of plpgsql functions when temp

tables

are used,
more global temp tables are little bit comfortable for developers,

* we can simplify migration from some other databases, where global temp
tables are default.

I agree that the feature would be interesting.

2. Implementation

I see three possible ways how to implement it:

2.a - using on demand created temp tables - most simple solution, but
doesn't help with catalogue bloating

Yea, that's no good.

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

I can't follow why that'd achieve anything?

1. Main catalogue will be stable.
2. There is not necessary to implement new storage and it can helps with
transaction support.

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I don't plan to implement a storage - I expect only few functions for
store/read data from session memory context

I think it's more realistic way to implement is to have a separate
'relpersistence' setting for global temp tables. The first access to
such one in a session (or xact if truncate on commit) copies the table
from the _init fork. By having the backend id in all filenames (besides
the init fork) they're unique between sessions.

If I understand well, it is similar to my fast implementation from 2008. It
works partially, because it doesn't solve other (session) property - like
relpages, reltuples and related data from pg_statistics

Show quoted text

Or something roughly like that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Andres Freund
andres@2ndquadrant.com
In reply to: Pavel Stehule (#6)
Re: Implementation of global temporary tables?

On 2015-02-02 12:24:44 +0100, Pavel Stehule wrote:

2015-02-02 12:04 GMT+01:00 Andres Freund <andres@2ndquadrant.com>:

2.b - using unlogged tables for holding statistics, relfilenode, and all
necessary data

I can't follow why that'd achieve anything?

1. Main catalogue will be stable.
2. There is not necessary to implement new storage and it can helps with
transaction support.

The amount of complexity that'd be involved to store catalog data in a
separate relation around the caches and accesses would be significant. I
don't think that's a realistic option.

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I don't plan to implement a storage - I expect only few functions for
store/read data from session memory context

What does it have to do with temp tables then?

I think it's more realistic way to implement is to have a separate
'relpersistence' setting for global temp tables. The first access to
such one in a session (or xact if truncate on commit) copies the table
from the _init fork. By having the backend id in all filenames (besides
the init fork) they're unique between sessions.

If I understand well, it is similar to my fast implementation from 2008. It
works partially, because it doesn't solve other (session) property - like
relpages, reltuples and related data from pg_statistics

I'm honestly not particularly concerned about that problem. For one, we
don't auto-analyze/vacuum temp tables. For another, it'd be
comparatively easy to gather reltuples/relpages/stats from session local
state if necessary. Those are all only accessed from a few places.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Atri Sharma
atri.jiit@gmail.com
In reply to: Andres Freund (#7)
Re: Implementation of global temporary tables?

1. Main catalogue will be stable.
2. There is not necessary to implement new storage and it can helps with
transaction support.

The amount of complexity that'd be involved to store catalog data in a
separate relation around the caches and accesses would be significant. I
don't think that's a realistic option.

Not to mention the problems we might end up in. We still have corner cases
in our cache code, and a new heap on top of it all might be just too
painful.

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I don't plan to implement a storage - I expect only few functions for
store/read data from session memory context

What does it have to do with temp tables then?

I think what Pavel means here is that we do not need a full fledged heap
layer and rather only a minimal API from a per session memory context.
However, that might be still as painful because we will eventually end up
inventing mechanisms for syscache and typcache to work with this storage,
which IMO is the biggest pain point around this idea.

Regards,

Atri

Regards,

Atri
*l'apprenant*

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andres Freund (#7)
Re: Implementation of global temporary tables?

2015-02-02 13:15 GMT+01:00 Andres Freund <andres@2ndquadrant.com>:

On 2015-02-02 12:24:44 +0100, Pavel Stehule wrote:

2015-02-02 12:04 GMT+01:00 Andres Freund <andres@2ndquadrant.com>:

2.b - using unlogged tables for holding statistics, relfilenode, and

all

necessary data

I can't follow why that'd achieve anything?

1. Main catalogue will be stable.
2. There is not necessary to implement new storage and it can helps with
transaction support.

The amount of complexity that'd be involved to store catalog data in a
separate relation around the caches and accesses would be significant. I
don't think that's a realistic option.

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I don't plan to implement a storage - I expect only few functions for
store/read data from session memory context

What does it have to do with temp tables then?

it is mechanism how to store a session metadata related to global temp
tables

I think it's more realistic way to implement is to have a separate
'relpersistence' setting for global temp tables. The first access to
such one in a session (or xact if truncate on commit) copies the table
from the _init fork. By having the backend id in all filenames (besides
the init fork) they're unique between sessions.

If I understand well, it is similar to my fast implementation from 2008.

It

works partially, because it doesn't solve other (session) property -

like

relpages, reltuples and related data from pg_statistics

I'm honestly not particularly concerned about that problem. For one, we
don't auto-analyze/vacuum temp tables. For another, it'd be
comparatively easy to gather reltuples/relpages/stats from session local
state if necessary. Those are all only accessed from a few places.

so I don't see a big differences from 3.c - all session metadata will
stored in session memory.

I didn't write code, so I have not a knowledge about details.

Show quoted text

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Atri Sharma (#8)
Re: Implementation of global temporary tables?

2015-02-02 13:36 GMT+01:00 Atri Sharma <atri.jiit@gmail.com>:

1. Main catalogue will be stable.
2. There is not necessary to implement new storage and it can helps with
transaction support.

The amount of complexity that'd be involved to store catalog data in a
separate relation around the caches and accesses would be significant. I
don't think that's a realistic option.

Not to mention the problems we might end up in. We still have corner cases
in our cache code, and a new heap on top of it all might be just too
painful.

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I don't plan to implement a storage - I expect only few functions for
store/read data from session memory context

What does it have to do with temp tables then?

I think what Pavel means here is that we do not need a full fledged heap
layer and rather only a minimal API from a per session memory context.
However, that might be still as painful because we will eventually end up
inventing mechanisms for syscache and typcache to work with this storage,
which IMO is the biggest pain point around this idea.

It should be solvable - I see another risk - if we accelerate a work with
temp tables, then 4 byte oid should not be enough.

Show quoted text

Regards,

Atri

Regards,

Atri
*l'apprenant*

#11Atri Sharma
atri.jiit@gmail.com
In reply to: Pavel Stehule (#10)
Re: Implementation of global temporary tables?

On Mon, Feb 2, 2015 at 6:34 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

2015-02-02 13:36 GMT+01:00 Atri Sharma <atri.jiit@gmail.com>:

1. Main catalogue will be stable.
2. There is not necessary to implement new storage and it can helps

with

transaction support.

The amount of complexity that'd be involved to store catalog data in a
separate relation around the caches and accesses would be significant. I
don't think that's a realistic option.

Not to mention the problems we might end up in. We still have corner
cases in our cache code, and a new heap on top of it all might be just too
painful.

3.c - store ephemeral metadata only in memory without MVCC

I think that's not an option. That'd end up being a massive amount of
duplication at a low rate of functionality.

I don't plan to implement a storage - I expect only few functions for
store/read data from session memory context

What does it have to do with temp tables then?

I think what Pavel means here is that we do not need a full fledged heap
layer and rather only a minimal API from a per session memory context.
However, that might be still as painful because we will eventually end up
inventing mechanisms for syscache and typcache to work with this storage,
which IMO is the biggest pain point around this idea.

It should be solvable - I see another risk - if we accelerate a work with
temp tables, then 4 byte oid should not be enough.

Hrm, that might well be true. It might be worth the effort to find a better
way to materialize global temp tables then, like having a single OID and
only materializing a relfilenode for a session when the session inserts
into the temp table. Not sure here at all...

--
Regards,

Atri
*l'apprenant*