Cluster table and order information

Started by Dario Beraldiabout 15 years ago4 messagesgeneral

dario.beraldi@ed.ac.uk

about 15 years ago

Hello,

From the documentation of CLUSTER table
(http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I
understand that clustering can be achieved by re-creating the table
like this:

CREATE TABLE newtable AS
SELECT * FROM table ORDER BY columnlist;

My question is: If I upload with COPY a datafile which is already
correctly sorted, can I inform postgres of such order, so that no
clustering is necessary after the import? In other words, how can I
tell postgres that my file is order by this and that column?

Many thanks!

Dario

Dr. Dario Beraldi
Institute of Evolutionary Biology
University of Edinburgh
West Mains Road
Edinburgh EH9 3JT
Scotland, UK

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Andy Colson

andy@squeakycode.net

about 15 years ago

In reply to: Dario Beraldi (#1)

Re: Cluster table and order information

On 2/1/2011 9:08 AM, Dario Beraldi wrote:

Hello,

From the documentation of CLUSTER table
(http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I
understand that clustering can be achieved by re-creating the table like
this:

CREATE TABLE newtable AS
SELECT * FROM table ORDER BY columnlist;

My question is: If I upload with COPY a datafile which is already
correctly sorted, can I inform postgres of such order, so that no
clustering is necessary after the import? In other words, how can I tell
postgres that my file is order by this and that column?

Many thanks!

Dario

The planner has no knowledge of cluster. Meaning PG will query a
clustered and unclustered table exactly the same way. A table is not
marked or anything as clustered. And in fact, during usage of a table
it'll become unclustered.

Clustering is only useful when you are going to read multiple records in
the same order as an index. It turns "more random seeks" into "more
sequential reads".

If your COPY loads data in indexed order, then just dont run the cluster.

-Andy

Dario Beraldi

dario.beraldi@ed.ac.uk

about 15 years ago

In reply to: Andy Colson (#2)

Re: Cluster table and order information

Quoting Andy Colson <andy@squeakycode.net>:

On 2/1/2011 9:08 AM, Dario Beraldi wrote:

Hello,

From the documentation of CLUSTER table
(http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I
understand that clustering can be achieved by re-creating the table like
this:

CREATE TABLE newtable AS
SELECT * FROM table ORDER BY columnlist;

My question is: If I upload with COPY a datafile which is already
correctly sorted, can I inform postgres of such order, so that no
clustering is necessary after the import? In other words, how can I tell
postgres that my file is order by this and that column?

Many thanks!

Dario

The planner has no knowledge of cluster. Meaning PG will query a
clustered and unclustered table exactly the same way. A table is
not marked or anything as clustered. And in fact, during usage of a
table it'll become unclustered.

Clustering is only useful when you are going to read multiple
records in the same order as an index. It turns "more random
seeks" into "more sequential reads".

If your COPY loads data in indexed order, then just dont run the cluster.

-Andy

Thanks very much Andy, this clarifies my doubts.

I was misled by the docs saying "When a table is clustered, PostgreSQL
remembers which index it was clustered by" which made me think that
the order information is stored somewhere.

All the best
Dario

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Andy Colson

andy@squeakycode.net

about 15 years ago

In reply to: Dario Beraldi (#3)

Re: Cluster table and order information

On 2/1/2011 10:17 AM, Dario Beraldi wrote:

Quoting Andy Colson <andy@squeakycode.net>:

On 2/1/2011 9:08 AM, Dario Beraldi wrote:

Hello,

From the documentation of CLUSTER table
(http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I
understand that clustering can be achieved by re-creating the table like
this:

CREATE TABLE newtable AS
SELECT * FROM table ORDER BY columnlist;

My question is: If I upload with COPY a datafile which is already
correctly sorted, can I inform postgres of such order, so that no
clustering is necessary after the import? In other words, how can I tell
postgres that my file is order by this and that column?

Many thanks!

Dario

The planner has no knowledge of cluster. Meaning PG will query a
clustered and unclustered table exactly the same way. A table is not
marked or anything as clustered. And in fact, during usage of a table
it'll become unclustered.

Clustering is only useful when you are going to read multiple records
in the same order as an index. It turns "more random seeks" into "more
sequential reads".

If your COPY loads data in indexed order, then just dont run the cluster.

-Andy

Thanks very much Andy, this clarifies my doubts.

I was misled by the docs saying "When a table is clustered, PostgreSQL
remembers which index it was clustered by" which made me think that the
order information is stored somewhere.

All the best
Dario

The next sentience clears it up:

The form "CLUSTER table_name" reclusters the table using the same index
as before.

-Andy