[Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Started by Aleksander Alekseevover 9 years ago63 messages
Jump to latest
#1Aleksander Alekseev
aleksander@timescale.com

Hello

Some time ago we discussed an idea of "fast temporary tables":

/messages/by-id/20160301182500.2c81c3dc@fujitsu

In two words the idea is following.

<The Idea>

PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.

</The Idea>

I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.

Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!

--
Best regards,
Aleksander Alekseev

Attachments:

fast-temporary-tables-v1.patchtext/x-patchDownload+2750-121
#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Aleksander Alekseev (#1)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Hi,

On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:

Hello

Some time ago we discussed an idea of "fast temporary tables":

/messages/by-id/20160301182500.2c81c3dc@fujitsu

In two words the idea is following.

<The Idea>

PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.

</The Idea>

Great! Thanks for the patch, this is definitely an annoying issue worth
fixing. I've spent a bit of time looking at the patch today, comments
below ...

I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.

Well, jokes aside, that's a pretty lousy excuse for not writing any
docs, and you're pretty much asking the reviewers to reverse-engineer
your reasoning. So I doubt you'll get many serious reviews without
fixing this gap.

Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!

1) I wonder whether the FAST makes sense - does this really change the
performance significantly? IMHO you only move the catalog rows to
memory, so why should the tables be any faster? I also believe this
conflicts with SQL standard specification of CREATE TABLE.

2) Why do we need the new relpersistence value? ISTM we could easily got
with just RELPERSISTENCE_TEMP, which would got right away of many
chances as the steps are exactly the same.

IMHO if this patch gets in, we should use it as the only temp table
implementation (Or can you think of cases where keeping rows in pg_class
has advantages?). That'd also eliminate the need for FAST keyword in the
CREATE TABLE command.

The one thin I'm not sure about is that our handling of temporary tables
is not standard compliant - we require each session to create it's own
private temporary table. Moving the rows from pg_class into backend
memory seems to go in the opposite direction, but as no one was planning
to fix this, I don't think it matters much.

3) I think the heapam/indexam/xact and various other places needs a
major rework. You've mostly randomly sprinkled the code with function
calls to make the patch work - that's fine for an initial version, but a
more principled approach is needed.

4) I'm getting failures in the regression suite - apparently the patch
somehow affects costing of index only scans, so that a several queries
switch from index only scans to bitmap index scans etc. I haven't
investigated this more closely, but it seems quite consistent (and I
don't see it without the patch). It seems the patch delays building of
visibility map, or something like that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#2)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-07-30 1:46 GMT+02:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:

Hi,

On 07/29/2016 01:15 PM, Aleksander Alekseev wrote:

Hello

Some time ago we discussed an idea of "fast temporary tables":

/messages/by-id/20160301182500.2c81c3dc@fujitsu

In two words the idea is following.

<The Idea>

PostgreSQL stores information about all relations in pg_catalog. Some
applications create and delete a lot of temporary tables. It causes a
bloating of pg_catalog and running auto vacuum on it. It's quite an
expensive operation which affects entire database performance.

We could introduce a new type of temporary tables. Information about
these tables is stored not in a catalog but in backend's memory. This
way user can solve a pg_catalog bloating problem and improve overall
database performance.

</The Idea>

Great! Thanks for the patch, this is definitely an annoying issue worth
fixing. I've spent a bit of time looking at the patch today, comments below
...

Yes, it some what we need long time

I took me a few months but eventually I made it work. Attached patch
has some flaws. I decided not to invest a lot of time in documenting
it or pgindent'ing all files yet. In my experience it will be rewritten
entirely 3 or 4 times before merging anyway :) But it _works_ and
passes all tests I could think of, including non-trivial cases like
index-only or bitmap scans of catalog tables.

Well, jokes aside, that's a pretty lousy excuse for not writing any docs,
and you're pretty much asking the reviewers to reverse-engineer your
reasoning. So I doubt you'll get many serious reviews without fixing this
gap.

Usage example:

```
CREATE FAST TEMP TABLE fasttab_test1(x int, s text);

INSERT INTO fasttab_test1 VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');

UPDATE fasttab_test1 SET s = 'ddd' WHERE x = 2;

DELETE FROM fasttab_test1 WHERE x = 3;

SELECT * FROM fasttab_test1 ORDER BY x;

DROP TABLE fasttab_test1;
```

More sophisticated examples could be find in regression tests:

./src/test/regress/sql/fast_temp.sql

Any feedback on this patch will be much appreciated!

1) I wonder whether the FAST makes sense - does this really change the
performance significantly? IMHO you only move the catalog rows to memory,
so why should the tables be any faster? I also believe this conflicts with
SQL standard specification of CREATE TABLE.

Probably has zero value to have slow and fast temp tables (from catalogue
cost perspective). So the FAST implementation should be used everywhere.
But there are some patterns used with work with temp tables,that should not
working, and we would to decide if we prepare workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
CREATE TEMP TABLE xxx()
ELSE
TRUNCATE TABLE xxx;
END IF;

-- modern patter (new code)
BEGIN
TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
CREATE TEMP TABLE(...)
END;

In this case we can use GUC, because visible behave should be same.

The benefit of zero catalogue cost temp tables is significant - and for
some larger applications the temp tables did hard performance issues.

2) Why do we need the new relpersistence value? ISTM we could easily got
with just RELPERSISTENCE_TEMP, which would got right away of many chances
as the steps are exactly the same.

IMHO if this patch gets in, we should use it as the only temp table
implementation (Or can you think of cases where keeping rows in pg_class
has advantages?). That'd also eliminate the need for FAST keyword in the
CREATE TABLE command.

The one thin I'm not sure about is that our handling of temporary tables
is not standard compliant - we require each session to create it's own
private temporary table. Moving the rows from pg_class into backend memory
seems to go in the opposite direction, but as no one was planning to fix
this, I don't think it matters much.

3) I think the heapam/indexam/xact and various other places needs a major
rework. You've mostly randomly sprinkled the code with function calls to
make the patch work - that's fine for an initial version, but a more
principled approach is needed.

4) I'm getting failures in the regression suite - apparently the patch
somehow affects costing of index only scans, so that a several queries
switch from index only scans to bitmap index scans etc. I haven't
investigated this more closely, but it seems quite consistent (and I don't
see it without the patch). It seems the patch delays building of visibility
map, or something like that.

Some other random notes:

1. With this code should not be hard to implement global temp tables -
shared persistent structure, temp local data - significant help for any who
have to migrate from Oracle.

2. This should to work on slaves - it is one of ToDo

3. I didn't see support for memory store for column's statistics. Some
separate questions is about production statistics - pg_stat_user_table, ..

Great and important work, thank you

Pavel

Show quoted text

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Pavel Stehule (#3)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 07/30/2016 06:49 AM, Pavel Stehule wrote:

1) I wonder whether the FAST makes sense - does this really change
the performance significantly? IMHO you only move the catalog rows
to memory, so why should the tables be any faster? I also believe
this conflicts with SQL standard specification of CREATE TABLE.

Probably has zero value to have slow and fast temp tables (from
catalogue cost perspective). So the FAST implementation should be used
everywhere. But there are some patterns used with work with temp
tables,that should not working, and we would to decide if we prepare
workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
CREATE TEMP TABLE xxx()
ELSE
TRUNCATE TABLE xxx;
END IF;

I'd argue that if you mess with catalogs directly, you're on your own.
Not only it's fragile, but this pattern is also prone to race conditions
(although a concurrent session can't create a conflicting temporary table).

-- modern patter (new code)
BEGIN
TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
CREATE TEMP TABLE(...)
END;

In this case we can use GUC, because visible behave should be same.

What GUC?

The benefit of zero catalogue cost temp tables is significant - and for
some larger applications the temp tables did hard performance issues.

Yeah, catalog bloat is a serious issue in such cases, and it's amplified
by indexes created on the temporary tables.

Some other random notes:

1. With this code should not be hard to implement global temp tables -
shared persistent structure, temp local data - significant help for any
who have to migrate from Oracle.

The patch moves in pretty much the opposite direction - if anything,
it'll make it more difficult to implement global temporary tables,
because it removes the definitions from the catalog, thus impossible to
share by catalogs. To get global temporary tables, I think the best
approach would be to share the catalog definition and only override the
filename. Or something like that.

2. This should to work on slaves - it is one of ToDo

No, it does not work on slaves, because it still does a read-write
transaction.

test=# begin read only;
BEGIN
test=# create fast temporary table x (id int);
ERROR: cannot execute CREATE TABLE in a read-only transaction

No idea how difficult it'd be to make it work.

3. I didn't see support for memory store for column's statistics. Some
separate questions is about production statistics - pg_stat_user_table, ..

That seems to work (both analyze and pg_stat_user_tables). Not sure
where it's in the code, and I'm not willing to reverse engineer it.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#3)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Pavel Stehule <pavel.stehule@gmail.com> writes:

But there are some patterns used with work with temp tables,that should not
working, and we would to decide if we prepare workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
CREATE TEMP TABLE xxx()
ELSE
TRUNCATE TABLE xxx;
END IF;

-- modern patter (new code)
BEGIN
TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
CREATE TEMP TABLE(...)
END;

If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough. The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one. We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".

In short, I think that the way to make something like this work is to
figure out how to have "virtual" catalog rows describing a temp table.
Or maybe to partition the catalogs so that vacuuming away temp-table
rows is easier/cheaper than today.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6David Steele
david@pgmasters.net
In reply to: Tom Lane (#5)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 7/30/16 10:47 AM, Tom Lane wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

But there are some patterns used with work with temp tables,that should not
working, and we would to decide if we prepare workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
CREATE TEMP TABLE xxx()
ELSE
TRUNCATE TABLE xxx;
END IF;

-- modern patter (new code)
BEGIN
TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
CREATE TEMP TABLE(...)
END;

If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough. The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one. We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".

In short, I think that the way to make something like this work is to
figure out how to have "virtual" catalog rows describing a temp table.
Or maybe to partition the catalogs so that vacuuming away temp-table
rows is easier/cheaper than today.

In addition the latter pattern burns an xid which can be a problem for
high-volume databases.

How about CREATE TEMP TABLE IF NOT EXISTS...?

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 07/30/2016 04:47 PM, Tom Lane wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

But there are some patterns used with work with temp tables,that should not
working, and we would to decide if we prepare workaround or not.

-- problematic pattern (old code)
IF NOT EXISTS(SELECT * FROM pg_class WHERE ....) THEN
CREATE TEMP TABLE xxx()
ELSE
TRUNCATE TABLE xxx;
END IF;

-- modern patter (new code)
BEGIN
TRUNCATE TABLE xxx;
EXCEPTION WHEN ..... THEN
CREATE TEMP TABLE(...)
END;

If the former stops working, that's a sufficient reason to reject the
patch: it hasn't been thought through carefully enough. The key reason
why I don't think that's negotiable is that if there aren't (apparently)
catalog entries corresponding to the temp tables, that will almost
certainly break many things in the backend and third-party extensions,
not only user code patterns like this one. We'd constantly be fielding
bug reports that "feature X doesn't work with temp tables anymore".

Agreed - breaking internal features for temporary tables is not
acceptable. I was thinking more about external code messing with
catalogs, but on second thought we probably need to keep the records in
pg_class anyway.

In short, I think that the way to make something like this work is
to figure out how to have "virtual" catalog rows describing a temp
table. Or maybe to partition the catalogs so that vacuuming away
temp-table rows is easier/cheaper than today.

Yeah, and I think the patch tries to do that, although in a rather
invasive / unprincipled way. But this will only work for the current
behavior (i.e. mostly what SQL standard means by LOCAL). For GLOBAL
temporary tables I think we need to keep physical catalog row, and only
override the storage filename.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Aleksander Alekseev
aleksander@timescale.com
In reply to: Tomas Vondra (#7)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Hello.

Thanks everyone for great comments!

Well, jokes aside, that's a pretty lousy excuse for not writing any
docs

I think maybe I put it in a wrong way. There are currently a lot of
comments in a code, more then enough to understand how this feature
works. What I meant is that this is not a final version of a patch and
a few paragraphs are yet to be written. At least it's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please name it and I will be happy
to fix it.

IMHO if this patch gets in, we should use it as the only temp table
implementation (Or can you think of cases where keeping rows in
pg_class has advantages?). That'd also eliminate the need for FAST
keyword in the CREATE TABLE command.

Probably has zero value to have slow and fast temp tables (from
catalogue cost perspective). So the FAST implementation should be used
everywhere.

If there are no objections I see no reason no to do it in a next
version of a patch.

I'm getting failures in the regression suite

I've run regression suite like 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also our DBAs are testing this feature for weeks now on real-world
applications and they didn't report anything like this. Could you
please describe how to reproduce this issue?

This should to work on slaves - it is one of ToDo

Glad you noticed! In fact I'm currently researching a possibility of
using the same approach for creating writable temporary tables on
replicas.

The key reason why I don't think that's negotiable is that if there
aren't (apparently) catalog entries corresponding to the temp tables,
that will almost certainly break many things in the backend and
third-party extensions, not only user code patterns like this one.

In short, I think that the way to make something like this work is to
figure out how to have "virtual" catalog rows describing a temp table.

I'm afraid once again I put it in a wrong way. What I meant by
"Information about these tables is stored not in a catalog but in
backend's memory" is in fact that _records_ of pg_class, pg_type and
other catalog relations are stored in-memory. Naturally this records
are visible to the user (otherwise \d or \d+ would not work) and you
can do queries like ` select * from pg_class where relname = 'tt1' `.
In other words part of the catalog is indeed "virtual".

I didn't see support for memory store for column's statistics. Some
separate questions is about production statistics -
pg_stat_user_table, ..

That seems to work (both analyze and pg_stat_user_tables). Not sure
where it's in the code, and I'm not willing to reverse engineer it.

Right, `ANALYZE temp_table;` and everything else works. Besides
pg_class, pg_type, pg_attribute and other relations pg_statistic
records for temp tables are stored in-memory as well. IIRC a lot of
pg_stat* relations are in fact views and thus don't require any special
support. If you see that some statistics are broken please don't
hesitate to report it and I will fix it.

Hope I answered all questions so far. I look forward to receive more
comments and questions regarding this patch!

--
Best regards,
Aleksander Alekseev

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Aleksander Alekseev (#8)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 08/01/2016 11:45 AM, Aleksander Alekseev wrote:

Hello.

Thanks everyone for great comments!

Well, jokes aside, that's a pretty lousy excuse for not writing any
docs

I think maybe I put it in a wrong way. There are currently a lot of
comments in a code, more then enough to understand how this feature
works. What I meant is that this is not a final version of a patch and
a few paragraphs are yet to be written. At least it's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please name it and I will be happy
to fix it.

I don't think there's "a lot of comments in the code", not even
remotely. At least not in the files I looked into - heapam, indexam,
xact etc. There are a few comments in general, and most of them only
comment obvious facts, like "ignore in-memory tuples" right before a
trivial if statement.

What is needed is an overview of the approach, so that the reviewers can
read that first, instead of assembling the knowledge from pieces
scattered over comments in many pieces. But I see the fasttab.c contains
this:

/* TODO TODO comment the general idea - in-memory tuples and indexes,
hooks principle, FasttabSnapshots, etc */

The other thing that needs to happen is you need to modify comments in
front of some of the modified methods - e.g. the comments may need a
paragraph "But when the table is fast temporary, what happens is ..."

IMHO if this patch gets in, we should use it as the only temp table
implementation (Or can you think of cases where keeping rows in
pg_class has advantages?). That'd also eliminate the need for FAST
keyword in the CREATE TABLE command.

Probably has zero value to have slow and fast temp tables (from
catalogue cost perspective). So the FAST implementation should be used
everywhere.

If there are no objections I see no reason no to do it in a next
version of a patch.

I believe there will be a lot of discussion about this.

I'm getting failures in the regression suite

I've run regression suite like 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also our DBAs are testing this feature for weeks now on real-world
applications and they didn't report anything like this. Could you
please describe how to reproduce this issue?

Nothing special:

$ ./configure --prefix=/home/user/pg-temporary --enable-debug \
--enable-cassert

$ make -s clean && make -s -j4 install

$ export PATH=/home/user/pg-temporary/bin:$PATH

$ pg_ctl -D ~/tmp/data-temporary init

$ pg_ctl -D ~/tmp/data-temporary -l ~/temporary.log start

$ make installcheck

I get the failures every time - regression diff attached. The first
failure in "rolenames" is expected, because of clash with existing user
name. The remaining two failures are not.

I only get the failure for "installcheck" but not "check" for some reason.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

regression.diffstext/plain; charset=UTF-8; name=regression.diffsDownload+2-0
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#9)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

What is needed is an overview of the approach, so that the reviewers can
read that first, instead of assembling the knowledge from pieces
scattered over comments in many pieces. But I see the fasttab.c contains
this:

/* TODO TODO comment the general idea - in-memory tuples and indexes,
hooks principle, FasttabSnapshots, etc */

A fairly common answer when some feature needs an implementation overview
is to create a README file for it, or add a new section in an existing
README file.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#10)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Thanks everyone for your remarks and comments!

Here is an improved version of a patch.

Main changes:
* Patch passes `make installcheck`
* Code is fully commented, also no more TODO's

I wish I sent this version of a patch last time. Now I realize it was
really hard to read and understand. Hope I managed to correct this
flaw. If you believe that some parts of the code are still poorly
commented or could be improved in any other way please let me know.

And as usual, any other comments, remarks or questions are highly
appreciated!

--
Best regards,
Aleksander Alekseev

Attachments:

fast-temporary-tables-v2.patchtext/x-patchDownload+3054-127
#12Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Tomas Vondra (#4)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 30 July 2016 at 13:42, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I'd argue that if you mess with catalogs directly, you're on your own.

Interesting. What would you suggest people use instead?

Geoff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#11)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

Thanks everyone for your remarks and comments!

Here is an improved version of a patch.

Main changes:
* Patch passes `make installcheck`
* Code is fully commented, also no more TODO's

I wish I sent this version of a patch last time. Now I realize it was
really hard to read and understand. Hope I managed to correct this
flaw. If you believe that some parts of the code are still poorly
commented or could be improved in any other way please let me know.

And as usual, any other comments, remarks or questions are highly
appreciated!

Three general comments:

1. It's always seemed to me that a huge problem with anything of this
sort is dependencies. For example, suppose I create a fast temporary
table and then I create a functional index on the fast temporary table
that uses some SQL function defined in pg_proc. Then, another user
drops the function. Then, I try to use the index. With regular
temporary tables, dependencies protect us here, but if there are no
catalog entries, I wonder how this can ever be made safe. Similar
problems exist for triggers, constraints, RLS policies, and attribute
defaults.

2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc. I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer. That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here. Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again. I don't think this code is reasonably maintainable
in this form.

3. Currently, temporary tables are parallel-restricted: a query that
references temporary tables can use parallelism, but access to the
temporary tables is only possible from within the leader. I suspect,
although I'm not entirely sure, that lifting this restriction would be
easier with our current temporary table implementation than with this
one, because the current temporary table implementation mostly relies
on a set of buffers that could be moved from backend-private memory to
DSM. On a quick look, this implementation uses a bunch of new data
structures that are heavy on pointers, so that gets quite a bit more
complicated to make parallel-safe (unless we adopt threads instead of
processes!).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Robert Haas (#13)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

​Hi

I tried to dig into this patch (which seems pretty interesting) to help
bring
it in good shape. Here are few random notes, I hope they can be helpful:

I think we actually need a real API here.

Definitely, there are plenty places in the new code with the same pattern:
* figure out if it's an action related to the fast temporary tables based
on
`ItemPointer`/relation OID/etc
* if it is, add some extra logic or skip something in original
implementation

in `heapam.c`, `indexam.c`, `xact.c`, `dependency.c`. I believe it's
possible to
make it more generic (although it will contain almost the same logic), e.g.
separate regular and fasttable implementations completely, and decide which
one
we should choose in that particular case.

Btw, I'm wondering about the `heap_*` functions in `heapam.c` - some of
them are
wrapped and never used directly, although they're contains in
`INTERFACE_ROUTINES` (like `simple_heap_delete` -> `heap_delete`), some of
them
aren't. It looks like inconsistency in terms of function names, probably it
should be unified?

What is needed is an overview of the approach, so that the reviewers can

read

that first,

I feel lack of such information even in new version of this patch (but, I'm
probably a noob in these matters). I noted that the `fasttab.c` file
contains some
general overview, but in terms of "what are we doing", and "why are we doing
this". I think general overview of "how are we doing this" also may be
useful.
And there are several slightly obvious commentaries like:

```
+ /* Is it a virtual TID? */
+ if (IsFasttabItemPointer(tid))
```

but I believe an introduction of a new API (see the previous note) will
solve
this eventually.

Why do we need the new relpersistence value? ISTM we could easily got with
just RELPERSISTENCE_TEMP, which would got right away of many chances as

the

steps are exactly the same.

I agree, it looks like `RELPERSISTENCE_FAST_TEMP` hasn't any influence on
the
code.

For example, suppose I create a fast temporary table and then I create a
functional index on the fast temporary table that uses some SQL function
defined in pg_proc.

Just to clarify, did you mean something like this?
```
create fast temp table fasttab(x int, s text);
create or replace function test_function_for_index(t text) returns text as
$$
begin
return lower(t);
end;
$$ language plpgsql immutable;
create index fasttab_s_idx on fasttab (test_function_for_index(s));
drop function test_function_for_index(t text);
```
As far as I understand dependencies should protect in case of fasttable too,
because everything is visible as in regular case, isn't it?

And finally one more question, why items of `FasttabIndexMethodsTable[]`
like
this one:
```
+ /* 2187, non-unique */
+ {InheritsParentIndexId, 1,
+ {Anum_pg_inherits_inhparent, 0, 0},
+ {CompareOid, CompareInvalid, CompareInvalid}
+ },
```
have this commentary before them? I assume it's an id and an extra
information,
and I'm concerned that they can easily become outdated inside commentary
block.
#15Robert Haas
robertmhaas@gmail.com
In reply to: Dmitry Dolgov (#14)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On Sat, Aug 6, 2016 at 4:05 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

For example, suppose I create a fast temporary table and then I create a
functional index on the fast temporary table that uses some SQL function
defined in pg_proc.

Just to clarify, did you mean something like this?
```
create fast temp table fasttab(x int, s text);
create or replace function test_function_for_index(t text) returns text as
$$
begin
return lower(t);
end;
$$ language plpgsql immutable;
create index fasttab_s_idx on fasttab (test_function_for_index(s));
drop function test_function_for_index(t text);
```
As far as I understand dependencies should protect in case of fasttable too,
because everything is visible as in regular case, isn't it?

I think the whole idea of a fast temporary table is that there are no
catalog entries. If there are no catalog entries, then dependencies
are not visible. If there ARE catalog entries, to what do they refer?
Without a pg_class entry for the table, there's no table OID upon
which to depend.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Robert Haas <robertmhaas@gmail.com> writes:

I think the whole idea of a fast temporary table is that there are no
catalog entries. If there are no catalog entries, then dependencies
are not visible. If there ARE catalog entries, to what do they refer?
Without a pg_class entry for the table, there's no table OID upon
which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero. Tables have to have OIDs; there is just
too much code that assumes that. And I seriously doubt that it will
work (for any large value of "work") without catalog entries.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Robert Haas (#13)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

05.08.2016 19:41, Robert Haas:

2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc. I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer. That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here. Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again. I don't think this code is reasonably maintainable
in this form.

As I can see, you recommend to clean up the API of storage
management code. I strongly agree that it's high time to do it.

So, I started the discussion about refactoring and improving API
of heapam and heap relations.
You can find it on commitfest:
https://commitfest.postgresql.org/10/700/

I'll be glad to see your thoughts on the thread.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 2016-08-07 14:46:06 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I think the whole idea of a fast temporary table is that there are no
catalog entries. If there are no catalog entries, then dependencies
are not visible. If there ARE catalog entries, to what do they refer?
Without a pg_class entry for the table, there's no table OID upon
which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero. Tables have to have OIDs; there is just
too much code that assumes that. And I seriously doubt that it will
work (for any large value of "work") without catalog entries.

That seems a bit too defeatist. It's obviously not a small change to get
there - and I don't think the patch upthread is really attacking the
relevant problems yet - but saying that we'll never have temp tables
without pg_class/pg_depend bloat seems to be pretty close to just giving
up. Having 8 byte oids (as explicit columns instead of magic? Or just
oid64?) and then reserving ranges for temp objects stored in a local
memory seems to be feasible. The pinning problem could potentially be
solved by "session lifetime" pins in pg_depend, which prevents dependent
objects being dropped. Obviously that's just spitballing; but I think
the problem is too big to just give up.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

Andres Freund <andres@anarazel.de> writes:

On 2016-08-07 14:46:06 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I think the whole idea of a fast temporary table is that there are no
catalog entries. If there are no catalog entries, then dependencies
are not visible. If there ARE catalog entries, to what do they refer?
Without a pg_class entry for the table, there's no table OID upon
which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero. Tables have to have OIDs; there is just
too much code that assumes that. And I seriously doubt that it will
work (for any large value of "work") without catalog entries.

That seems a bit too defeatist.

Huh? I didn't say we shouldn't work on the problem --- I just think that
this particular approach isn't good. Which you seemed to agree with.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

On 2016-08-14 21:04:57 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-08-07 14:46:06 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I think the whole idea of a fast temporary table is that there are no
catalog entries. If there are no catalog entries, then dependencies
are not visible. If there ARE catalog entries, to what do they refer?
Without a pg_class entry for the table, there's no table OID upon
which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero. Tables have to have OIDs; there is just
too much code that assumes that. And I seriously doubt that it will
work (for any large value of "work") without catalog entries.

That seems a bit too defeatist.

Huh? I didn't say we shouldn't work on the problem --- I just think that
this particular approach isn't good. Which you seemed to agree with.

I took your statement to mean that they need a pg_class entry - even if
there were a partial solution to the pg_depend problem allowing to avoid
pg_attribute entries, tha't still not really be a solution. If that's
not what you mean, sorry - and nice that we agree ;)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Aleksander Alekseev
aleksander@timescale.com
In reply to: Andres Freund (#20)
#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Aleksander Alekseev (#8)
#23Aleksander Alekseev
aleksander@timescale.com
In reply to: Pavel Stehule (#22)
#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Aleksander Alekseev (#23)
#25Aleksander Alekseev
aleksander@timescale.com
In reply to: Pavel Stehule (#24)
#26Pavel Stehule
pavel.stehule@gmail.com
In reply to: Aleksander Alekseev (#25)
#27Christoph Berg
myon@debian.org
In reply to: Tom Lane (#5)
#28Christoph Berg
myon@debian.org
In reply to: Christoph Berg (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#8)
#30Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#21)
#32Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#31)
#33Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#31)
#34Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Aleksander Alekseev (#33)
#35Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Pavel Stehule (#26)
#36Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#32)
#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#37)
#39Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#38)
#40Aleksander Alekseev
aleksander@timescale.com
In reply to: Andres Freund (#39)
#41Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#37)
#42Claudio Freire
klaussfreire@gmail.com
In reply to: Tomas Vondra (#41)
#43Andres Freund
andres@anarazel.de
In reply to: Claudio Freire (#42)
#44Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Claudio Freire (#42)
#45Claudio Freire
klaussfreire@gmail.com
In reply to: Andres Freund (#43)
#46Andres Freund
andres@anarazel.de
In reply to: Claudio Freire (#45)
#47Claudio Freire
klaussfreire@gmail.com
In reply to: Tomas Vondra (#44)
#48Bruce Momjian
bruce@momjian.us
In reply to: Aleksander Alekseev (#40)
#49Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Claudio Freire (#47)
#50Claudio Freire
klaussfreire@gmail.com
In reply to: Bruce Momjian (#48)
#51Claudio Freire
klaussfreire@gmail.com
In reply to: Tomas Vondra (#49)
#52Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Claudio Freire (#51)
#53Claudio Freire
klaussfreire@gmail.com
In reply to: Alvaro Herrera (#52)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#41)
#55Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#43)
#56Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#55)
#57Corey Huinker
corey.huinker@gmail.com
In reply to: Andres Freund (#56)
#58Vik Fearing
vik@postgresfriends.org
In reply to: Robert Haas (#54)
#59Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Vik Fearing (#58)
#60Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#59)
#61Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#60)
#62Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#61)
#63Corey Huinker
corey.huinker@gmail.com
In reply to: Andres Freund (#62)