Time to move table to new tablespace
I'm considering the migration of an existing large (2.3TB) table to a new
tablespace. The table size, according to the '\dt+' command:
public | city_summary | table | altosresearch | 2345 GB |
Are there any considerations - besides the usual disk and network IO
constraints - that I need to take into account when estimating the amount
of time that would be required for the following commands to complete?
psql> CREATE TABLESPACE newstorage LOCATION '/some/new/path';
psql> ALTER TABLE city_summary SET TABLESPACE newstorage;
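For the time estimate, what matters is the number of bytes that actually have to be rewritten; the heap (plus TOAST) can be compared against the total including indexes, which stay behind unless moved separately:

```sql
-- Bytes the ALTER will rewrite (heap + TOAST) vs. everything
-- including indexes, which SET TABLESPACE on the table does not move.
SELECT pg_size_pretty(pg_table_size('city_summary'))          AS heap_and_toast,
       pg_size_pretty(pg_total_relation_size('city_summary')) AS with_indexes;
```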
Any other recommendations are welcome, such as "You are an idiot for not
using partitioning in the first place" :)
--
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907
On 7/01/2012 10:52 PM, Jason Buberel wrote:
I'm considering the migration of an existing large (2.3TB) table to a
new tablespace. The table size, according to the '\dt+' command:
public | city_summary | table | altosresearch | 2345 GB |
Are there any considerations - besides the usual disk and network IO
constraints - that I need to take into account when estimating the
amount of time that would be required for the following commands to
complete?
psql> CREATE TABLESPACE newstorage LOCATION '/some/new/path';
psql> ALTER TABLE city_summary SET TABLESPACE newstorage;
Any other recommendations are welcome, such as "You are an idiot for
not using partitioning in the first place" :)
Maybe you should create a set of partitioned tables in the new
tablespace, copy your data over, then drop the old table and rename the
new parent to the old table's name instead? If the big table keeps on
accumulating data (via INSERTs) you can add a trigger that mirrors new
rows into the partitioned table while the copy runs.
This won't work so well if the big table gets UPDATEs and DELETEs too,
since you can't delete or update records from a trigger before they've
been copied to the new table and committed.
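A minimal sketch of that mirroring trigger, assuming the new partitioned parent is called city_summary_new (the name is made up):

```sql
-- Hypothetical: city_summary_new is the new partitioned parent table.
-- Mirrors every INSERT on the old table into the new structure
-- while the bulk copy runs.
CREATE FUNCTION mirror_to_new() RETURNS trigger AS $$
BEGIN
    INSERT INTO city_summary_new VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER city_summary_mirror
    AFTER INSERT ON city_summary
    FOR EACH ROW EXECUTE PROCEDURE mirror_to_new();
```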
--
Craig Ringer
Hi,
On 8 January 2012 01:52, Jason Buberel <jason@altosresearch.com> wrote:
psql> CREATE TABLESPACE newstorage LOCATION '/some/new/path';
psql> ALTER TABLE city_summary SET TABLESPACE newstorage;
Be aware that this will not move the indexes (see ALTER INDEX name
SET TABLESPACE tablespace_name). Maybe you don't have to move the data
at all; it might be worth moving only the indexes. You would then end
up with separate data and index tablespaces, and could reduce
random_page_cost for queries against the table.
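For example (the index name here is invented for illustration):

```sql
-- Move just an index to the new storage; the heap stays put.
ALTER INDEX city_summary_pkey SET TABLESPACE newstorage;
-- Since 9.0 the planner's cost estimate can be tuned per tablespace:
ALTER TABLESPACE newstorage SET (random_page_cost = 2.0);
```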
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
Craig,
Good suggestion - I have experimented with table partitioning in the past,
but the impact on performance was considerable.
In our case, we have one large table with real estate statistics arranged
in time (weekly for the last five years) and geography (every zip in the
US). You could imagine a date-range partitioning strategy (each year as
a separate partition, for example) or a spatial one (say, 10 partitions
based on ZIP code modulo 10).
However, our query patterns span both data dimensions:
"Select median price for every zip code as of 2012-01-06" (customer exports)
"Select median price for 94086 from 2005-01-01 through 2012-01-06"
(charting apps)
So by partitioning in one dimension we impact queries in the other.
If you have a brilliant solution to problems like this, we'd be happy to
hire you (not a joke).
Cheers,
Jason
On Sun, Jan 8, 2012 at 5:12 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
[snip]
--
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907
Hi,
On 10 January 2012 06:10, Jason Buberel <jason@altosresearch.com> wrote:
"Select median price for every zip code as of 2012-01-06" (customer exports)
"Select median price for 94086 from 2005-01-01 through 2012-01-06"
(charting apps)
So by partitioning in one dimension we impact queries in the other.
I do not see any issue here: if all your queries contain a date range,
then you should partition by date.
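A sketch of what that looks like with inheritance-style partitioning (the column names are invented):

```sql
-- One child table per year; 'week' is the hypothetical date column.
CREATE TABLE city_summary_2011 (
    CHECK (week >= DATE '2011-01-01' AND week < DATE '2012-01-01')
) INHERITS (city_summary);

-- With constraint_exclusion = partition, both query shapes prune:
-- the one-date export hits a single child, and the per-zip history
-- scans one child per year instead of the whole table.
SELECT * FROM city_summary WHERE week = DATE '2012-01-06';
```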
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
On 10/01/2012 3:10 AM, Jason Buberel wrote:
"Select median price for every zip code as of 2012-01-06" (customer
exports)
"Select median price for 94086 from 2005-01-01 through 2012-01-06"
(charting apps)
So by partitioning in one dimension we impact queries in the other.
Well, I know of *theoretical* solutions that use n-dimensional
partitioning of data, but Pg doesn't have automatic,
easy-to-administer 1-dimensional partitioning, so I think going for 2D
is a bit of a stretch ;-) .
I wonder how hard it'd be to extend constraint exclusion to let it use
different constraints for different queries, or even to combine
constraints. Even if the answer is "not very", the hard part would be
maintaining a 2D (or more) partitioned table structure, and making
constraint exclusion *efficient* over that many tables.
If the partitioning columns are not just ordinal but interval-scaled
too, then it'd be possible to define regular intervals for partitioning
and build a bitmap for matching tables. That'd be a fun honours /
masters project, but hardly something suitable for hacking into a
production setup.
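For what it's worth, a hand-rolled 2-D version is expressible today, just painful to administer (schema invented for illustration; 5 years x 10 buckets means 50 children plus insert routing):

```sql
-- One child per (year, zip bucket) cell; zip_bucket would be
-- maintained at insert time as (zip::int % 10).
CREATE TABLE city_summary_2011_b4 (
    CHECK (week >= DATE '2011-01-01' AND week < DATE '2012-01-01'),
    CHECK (zip_bucket = 4)
) INHERITS (city_summary);
```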
--
Craig Ringer