statistics about tamp tables ...

Started by Hans-Jürgen Schönigabout 22 years ago5 messages
#1Hans-Jürgen Schönig
postgres@cybertec.at

Recently I have come across a simple issue which made me think about it.
When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner
won't know anything about its content after creating it.
Many people use temp tables heavy when the amount of data for a certain
analysis has to be reduced significantly. Frequently the same tmp table
is queried quite frequently. In order to speed those scenarios up it can
be useful to vacuum those tmp tables so that the planner will find more
clever joins.
Is it possible and does it make sense to generate those statistics on
the fly (during CREATE TABLE AS)? Maybe we could have a GUC which tells
the system whether to generate statistics or not.

test=# select * from test;
id
----
4
4
(2 rows)

test=# VACUUM test ;
VACUUM

test=# explain select * from test ;
QUERY PLAN
----------------------------------------------------
Seq Scan on test (cost=0.00..1.02 rows=2 width=4)
(1 row)

test=# select * into tmp from test;
SELECT
test=# explain select * from tmp;
QUERY PLAN
-------------------------------------------------------
Seq Scan on tmp (cost=0.00..20.00 rows=1000 width=4)
(1 row)

Best regards,

Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hans-Jürgen Schönig (#1)
Re: statistics about tamp tables ...

=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:

Recently I have come across a simple issue which made me think about it.
When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner
won't know anything about its content after creating it.

Run ANALYZE on the temp table, if you intend to use it enough to justify
gathering stats about it. VACUUM is more work than needed.

regards, tom lane

In reply to: Tom Lane (#2)
Re: statistics about tamp tables ...

Tom Lane wrote:

=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:

Recently I have come across a simple issue which made me think about it.
When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner
won't know anything about its content after creating it.

Run ANALYZE on the temp table, if you intend to use it enough to justify
gathering stats about it. VACUUM is more work than needed.

regards, tom lane

Of course, VACUUM is on overkill (there is no use to shrink something
minimal ;) ).
The reason why I came up with this posting is slightly different: Assume
a JDBC application which works with PostgreSQL + some other database. If
you want to use both databases without PostgreSQL being unnecessarily
slow an implicit mechanism would be better. Because otherwise you will
have an SQL command in there which is off standard - putting a switch
into the application seems to be a fairly ugly solution.

regards,

Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

#4Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Hans-Jürgen Schönig (#3)
Re: statistics about tamp tables ...

On Wed, Nov 26, 2003 at 05:34:28PM +0100, Hans-J�rgen Sch�nig wrote:

The reason why I came up with this posting is slightly different: Assume
a JDBC application which works with PostgreSQL + some other database. If
you want to use both databases without PostgreSQL being unnecessarily
slow an implicit mechanism would be better. Because otherwise you will
have an SQL command in there which is off standard - putting a switch
into the application seems to be a fairly ugly solution.

That's why you delegate the job to something else, like pg_autovacuum or
cron ...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo l�gico y coherente. Pero el universo real se halla siempre
un paso m�s all� de la l�gica" (Irulan)

In reply to: Alvaro Herrera (#4)
Re: statistics about tamp tables ...

Alvaro Herrera wrote:

On Wed, Nov 26, 2003 at 05:34:28PM +0100, Hans-J�rgen Sch�nig wrote:

The reason why I came up with this posting is slightly different: Assume
a JDBC application which works with PostgreSQL + some other database. If
you want to use both databases without PostgreSQL being unnecessarily
slow an implicit mechanism would be better. Because otherwise you will
have an SQL command in there which is off standard - putting a switch
into the application seems to be a fairly ugly solution.

That's why you delegate the job to something else, like pg_autovacuum or
cron ...

If you are in the middle of a data mining application using a tmp table
you don't want to wait for cron ;). You might want the statistics to be
correct as soon as the table has been created.

Regards,
Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at