Alter the column data type of the large data volume table.

Started by charles meng · over 5 years ago · 16 messages · general
#1 charles meng
xlyybz@gmail.com

Hi all,

I have a table with 1.6 billion records. The data type of the primary key
column was incorrectly chosen as integer, and I need to change the type of
the column to bigint. Are there any ideas for this?

Solutions that have been tried:
Adding temporary columns was too time-consuming, so I gave up.
Using a temporary table: there is no good way to migrate the original table
data to the temporary table.

Thanks in advance.

#2 Michael Lewis
mlewis@entrata.com
In reply to: charles meng (#1)
Re: Alter the column data type of the large data volume table.

On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

Hi all,

I have a table with 1.6 billion records. The data type of the primary key
column is incorrectly used as integer. I need to replace the type of the
column with bigint. Is there any ideas for this?

Solutions that have been tried:
Adding temporary columns was too time-consuming, so I gave up.
Using a temporary table, there is no good way to migrate the original
table data to the temporary table

Thanks in advance.

You can add a new column with NO default value (so existing rows get null)
and have it be very fast. Then you can gradually update rows in batches (if
on PG11+, perhaps use a DO script with a loop that commits after every X
rows) to set the new column the same as the primary key. Lastly, in a
transaction, update any new rows where the bigint column is still null,
change which column is the primary key, and drop the old one. This should
keep each transaction reasonably sized so it doesn't hold up other
processes.
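For illustration, a rough sketch of that sequence (table and column names
are invented here, and it assumes PG11+ for COMMIT inside a DO block):

```sql
-- Assumed schema: table "big" with integer primary key "id".
ALTER TABLE big ADD COLUMN id_big bigint;   -- no DEFAULT: fast, catalog-only change

-- Backfill in batches, committing as we go (PG11+ allows COMMIT in a DO block)
DO $$
DECLARE
  n bigint;
BEGIN
  LOOP
    UPDATE big
       SET id_big = id
     WHERE id IN (SELECT id FROM big WHERE id_big IS NULL LIMIT 10000);
    GET DIAGNOSTICS n = ROW_COUNT;
    EXIT WHEN n = 0;
    COMMIT;
  END LOOP;
END $$;

-- Final swap in one short transaction
BEGIN;
UPDATE big SET id_big = id WHERE id_big IS NULL;   -- rows inserted since the backfill
ALTER TABLE big DROP CONSTRAINT big_pkey;
ALTER TABLE big ALTER COLUMN id_big SET NOT NULL;
ALTER TABLE big ADD PRIMARY KEY (id_big);
ALTER TABLE big DROP COLUMN id;
ALTER TABLE big RENAME COLUMN id_big TO id;
COMMIT;
```

In practice you would likely build a unique index on id_big with CREATE
UNIQUE INDEX CONCURRENTLY beforehand and attach it via ADD CONSTRAINT ...
PRIMARY KEY USING INDEX, so the final transaction does not spend hours
building the index under an exclusive lock.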

#3 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Michael Lewis (#2)
Re: Alter the column data type of the large data volume table.

On Thu, 3 Dec 2020, Michael Lewis wrote:

On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

I have a table with 1.6 billion records. The data type of the primary key
column is incorrectly used as integer. I need to replace the type of the
column with bigint. Is there any ideas for this?

You can add a new column with NO default value and null as default and have
it be very fast. Then you can gradually update rows in batches (if on
PG11+, perhaps use do script with a loop to commit after X rows) to set the
new column the same as the primary key. Lastly, in a transaction, update
any new rows where the bigint column is null, and change which column is
the primary key & drop the old one. This should keep each transaction
reasonably sized to not hold up other processes.

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Regards,

Rich

#4 Michael Lewis
mlewis@entrata.com
In reply to: Rich Shepard (#3)
Re: Alter the column data type of the large data volume table.

On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com>
wrote:

On Thu, 3 Dec 2020, Michael Lewis wrote:

On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

I have a table with 1.6 billion records. The data type of the primary key
column is incorrectly used as integer. I need to replace the type of the
column with bigint. Is there any ideas for this?

You can add a new column with NO default value and null as default and have
it be very fast. Then you can gradually update rows in batches (if on
PG11+, perhaps use do script with a loop to commit after X rows) to set the
new column the same as the primary key. Lastly, in a transaction, update
any new rows where the bigint column is null, and change which column is
the primary key & drop the old one. This should keep each transaction
reasonably sized to not hold up other processes.

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Regards,

Rich

Afaik, it will require an access exclusive lock for the entire time it
takes to re-write the 1.6 billion rows and update all indexes. That sort of
lock out time doesn't seem workable in many production systems.

#5 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Michael Lewis (#4)
Re: Alter the column data type of the large data volume table.

On Thu, 3 Dec 2020, Michael Lewis wrote:

Afaik, it will require an access exclusive lock for the entire time it
takes to re-write the 1.6 billion rows and update all indexes. That sort
of lock out time doesn't seem workable in many production systems.

Michael,

Okay. I hadn't thought of that.

Stay well,

Rich

#6 Ron
ronljohnsonjr@gmail.com
In reply to: Michael Lewis (#4)
Re: Alter the column data type of the large data volume table.

On 12/3/20 11:26 AM, Michael Lewis wrote:

On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:

On Thu, 3 Dec 2020, Michael Lewis wrote:

On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

I have a table with 1.6 billion records. The data type of the primary key
column is incorrectly used as integer. I need to replace the type of the
column with bigint. Is there any ideas for this?

You can add a new column with NO default value and null as default and have
it be very fast. Then you can gradually update rows in batches (if on
PG11+, perhaps use do script with a loop to commit after X rows) to set the
new column the same as the primary key. Lastly, in a transaction, update
any new rows where the bigint column is null, and change which column is
the primary key & drop the old one. This should keep each transaction
reasonably sized to not hold up other processes.

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Regards,

Rich

Afaik, it will require an access exclusive lock for the entire time it
takes to re-write the 1.6 billion rows and update all indexes. That sort
of lock out time doesn't seem workable in many production systems.

Yet another argument for partitioning!

1. Split all the partitions off the main table,
2. drop the PK,
3. do all the ALTER statements in parallel,
4. recreate the PK indices, then
5. join them back to the main table.

Not instant, but faster than updating 1.6Bn rows in a single giant statement.

(Of course, that doesn't help OP with his current problem.)
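On a table that had been partitioned from the start, that recipe might look
roughly like this (purely illustrative names; declarative partitioning
details vary by version, and primary keys on partitioned tables need PG11+):

```sql
-- Hypothetical range partitions big_p1, big_p2 of table big, partitioned by id
ALTER TABLE big DETACH PARTITION big_p1;           -- 1. split partitions off
ALTER TABLE big DETACH PARTITION big_p2;
ALTER TABLE big_p1 DROP CONSTRAINT big_p1_pkey;    -- 2. drop the PK indexes
ALTER TABLE big_p2 DROP CONSTRAINT big_p2_pkey;

-- 3. the ALTERs can now run in parallel, one detached partition per session
ALTER TABLE big_p1 ALTER COLUMN id SET DATA TYPE bigint;
ALTER TABLE big_p2 ALTER COLUMN id SET DATA TYPE bigint;
ALTER TABLE big    ALTER COLUMN id SET DATA TYPE bigint;  -- parent holds no rows: instant

-- 4. recreate the PK indexes, then 5. re-attach
ALTER TABLE big_p1 ADD PRIMARY KEY (id);
ALTER TABLE big_p2 ADD PRIMARY KEY (id);
ALTER TABLE big ATTACH PARTITION big_p1 FOR VALUES FROM (1) TO (1000000001);
ALTER TABLE big ATTACH PARTITION big_p2 FOR VALUES FROM (1000000001) TO (2000000001);
```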

--
Angular momentum makes the world go 'round.

#7 Michael Lewis
mlewis@entrata.com
In reply to: Rich Shepard (#3)
Re: Alter the column data type of the large data volume table.

On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com>
wrote:

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Something else noteworthy is that with varchar, there is no rewrite of the
table. You are just removing or loosening the length restriction on a
variable-width column type. I believe you could change all columns from
VARCHAR(n) to TEXT or VARCHAR(n+X) and have it take almost no time at all,
since you are only touching the catalog tables (pretty sure about that, at
least). With a fixed-width column type like int4 to int8, all the rows
actually need to be rewritten.
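A quick illustration of the difference, on a hypothetical table t:

```sql
-- Widening a varchar (or going to text) only touches the catalogs: near-instant
ALTER TABLE t ALTER COLUMN name SET DATA TYPE varchar(500);  -- was varchar(50)
ALTER TABLE t ALTER COLUMN name SET DATA TYPE text;

-- Changing a fixed-width type forces a full rewrite of every row (and index)
ALTER TABLE t ALTER COLUMN id SET DATA TYPE bigint;          -- was integer
```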

#8 Ron
ronljohnsonjr@gmail.com
In reply to: Michael Lewis (#7)
Re: Alter the column data type of the large data volume table.

On 12/3/20 11:53 AM, Michael Lewis wrote:

On Thu, Dec 3, 2020 at 10:18 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:

Tell me, please, why

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET DATA TYPE BIGINT

will not do the job?

I've found some varchar columns in a couple of tables too small and used the
above to increase their size. Worked perfectly.

Something else noteworthy is that with varchar, there is no rewrite of the
table. You are just removing or loosening the length restriction on a
variable width column type. I believe you could change all columns from
VARCHAR(n) to TEXT or VARCHAR(n+X) and have it take almost no time at all
since you are only impacting the catalog tables (pretty sure about that at
least). With a fixed width column like int4 to int8, all the rows need to
be actually re-written.

And in this case it's the PK, so it's indexed and thus even slower. Lots slower.

--
Angular momentum makes the world go 'round.

#9 charles meng
xlyybz@gmail.com
In reply to: charles meng (#1)
Re: Alter the column data type of the large data volume table.

What I mean is that it will take a lot of time to complete all the data
processing. I have to say that it is a good solution to adjust the column
type without affecting online users.

I found a tool on github, see the link below, unfortunately, this is for
MySQL...

https://github.com/github/gh-ost

regards.

Michael Lewis <mlewis@entrata.com> wrote on Friday, Dec 4, 2020 at 1:04 PM:


Please do reply all.

Not sure what you mean about it taking too much time. It's rewriting a
bunch of data. It's going to take a while. The best you can do is break the
work up into small pieces and commit each piece.

On Thu, Dec 3, 2020, 7:11 PM charles meng <xlyybz@gmail.com> wrote:

Thanks for your help, I think the first method I tried (adding temporary
column) is similar to what you said, but it takes too much time for me.

Thanks again.

Michael Lewis <mlewis@entrata.com> wrote on Friday, Dec 4, 2020 at 1:11 AM:

On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

Hi all,

I have a table with 1.6 billion records. The data type of the primary
key column is incorrectly used as integer. I need to replace the type of
the column with bigint. Is there any ideas for this?

Solutions that have been tried:
Adding temporary columns was too time-consuming, so I gave up.
Using a temporary table, there is no good way to migrate the original
table data to the temporary table

Thanks in advance.

You can add a new column with NO default value and null as default and
have it be very fast. Then you can gradually update rows in batches (if on
PG11+, perhaps use do script with a loop to commit after X rows) to set the
new column the same as the primary key. Lastly, in a transaction, update
any new rows where the bigint column is null, and change which column is
the primary key & drop the old one. This should keep each transaction
reasonably sized to not hold up other processes.

#10 Olivier Gautherot
ogautherot@gautherot.net
In reply to: charles meng (#9)
Re: Alter the column data type of the large data volume table.

Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz@gmail.com> wrote:

What I mean is that it will take a lot of time to complete all the data
processing. I have to say that it is a good solution to adjust the column
type without affecting online users.

I found a tool on github, see the link below, unfortunately, this is for
MySQL...

https://github.com/github/gh-ost

MySQL has its own strategy with regard to column handling so what works
there does not necessarily fit here.

There are some good ideas in this thread, but I would start with a few
checks:

1) What version of PG are you using?

2) If you can, try on your current setup (not necessarily in Production) to
add a column typed bigint - if your version is recent enough, it will be a
simple catalog update. Michael's suggestion is viable.

3) Given the massive number of rows, I would update as suggested,
progressively in batches of a few tens of thousands. Make sure you commit
and vacuum after each batch to reclaim the space (or you may end up with a
storage space issue on top of everything else). In the meantime, add a
trigger to set the new column to the primary key value. Once the new column
is fully populated, drop the old column and set the new one as primary key
(it may take a while to rebuild the index).

4) If your table is still growing, I would definitely look into
partitioning, as it can hardly make things worse.

5) If you're brave enough, convert your current table into a partition
(rename it to something like table_hist), duplicate the table model under
the same name as now (adjusting the primary key type) and set the INHERITS
on the primary key range. The inheritance should take care of the type
conversion (I haven't tried it, but it's worth a try). If it works, you
will reach your goal without downtime or significant overhead.
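The trigger in point 3 could be sketched like this (hypothetical names; on
PG10 the syntax is EXECUTE PROCEDURE, with EXECUTE FUNCTION arriving in
PG11):

```sql
CREATE FUNCTION big_sync_id() RETURNS trigger AS $$
BEGIN
  NEW.id_big := NEW.id;   -- keep the bigint copy in step with the old integer PK
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER big_sync_id_trg
  BEFORE INSERT OR UPDATE ON big
  FOR EACH ROW EXECUTE PROCEDURE big_sync_id();
```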

regards.

Michael Lewis <mlewis@entrata.com> wrote on Friday, Dec 4, 2020 at 1:04 PM:

Please do reply all.

Not sure what you mean about it taking too much time. It's rewriting a
bunch of data. It's going to take a while. The best you can do is break the
work up into small pieces and commit each piece.

On Thu, Dec 3, 2020, 7:11 PM charles meng <xlyybz@gmail.com> wrote:

Thanks for your help, I think the first method I tried (adding temporary
column) is similar to what you said, but it takes too much time for me.

Thanks again.

Michael Lewis <mlewis@entrata.com> wrote on Friday, Dec 4, 2020 at 1:11 AM:

On Wed, Dec 2, 2020 at 11:53 PM charles meng <xlyybz@gmail.com> wrote:

Hi all,

I have a table with 1.6 billion records. The data type of the primary
key column is incorrectly used as integer. I need to replace the type of
the column with bigint. Is there any ideas for this?

Solutions that have been tried:
Adding temporary columns was too time-consuming, so I gave up.
Using a temporary table, there is no good way to migrate the original
table data to the temporary table

Thanks in advance.

You can add a new column with NO default value and null as default and
have it be very fast. Then you can gradually update rows in batches (if on
PG11+, perhaps use do script with a loop to commit after X rows) to set the
new column the same as the primary key. Lastly, in a transaction, update
any new rows where the bigint column is null, and change which column is
the primary key & drop the old one. This should keep each transaction
reasonably sized to not hold up other processes.

Cheers
--
Olivier Gautherot


#11 Olivier Gautherot
ogautherot@gautherot.net
In reply to: Olivier Gautherot (#10)
Re: Alter the column data type of the large data volume table.

Hi Charles,

On 04-12-2020 9:44, Olivier Gautherot wrote:

Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz@gmail.com
<mailto:xlyybz@gmail.com>> wrote:

What I mean is that it will take a lot of time to complete all the
data processing. I have to say that it is a good solution to adjust
the column type without affecting online users.

I found a tool on github, see the link below, unfortunately, this
is for MySQL...

https://github.com/github/gh-ost

MySQL has its own strategy with regard to column handling so what
works there does not necessarily fit here.

There are some good ideas in this thread but I would start with a few
checks:

1) What version of PG are you using ?

2) If you can try on your current setup (not necessarily in
Production) to add a column typed bigint - if it is recent enough it
will be a simple catalog update. Michael's suggestion is viable

3) Given the massive number of rows, I would update as suggested,
progressively in batches of a few tens of thousands. Make sure you
commit and vacuum after each to retrieve the space (or you may end up
with a storage space issue in addition to all the rest). In the
meantime, add a trigger to set the new column to the index value. Once
the new column is complete, drop the old column and set the new one as
primary key (it may take a while to recalculate the index).

4) If your table is still growing, I would definitely look into
partitioning as it can hardly make things worse.

5) If you're brave enough, convert your current table as a partition
(rename it to something like table_hist), duplicate the table model
under the same name as now (adjusting the primary key type) and set
the INHERITS on the primary key range. The inheritance should take
care of the type conversion (haven't tried it but it's worth a try).
If it works, you will reach your goal without downtime or significant
overhead.

Sorry, just tried this one and it failed: type mismatch.

Cheers
--
Olivier Gautherot


#12 charles meng
xlyybz@gmail.com
In reply to: Olivier Gautherot (#11)
Re: Alter the column data type of the large data volume table.

Hi Olivier,

My PG version is 10.
Anyway, thanks a lot for your help.

Best regards.

Olivier Gautherot <ogautherot@gautherot.net> wrote on Friday, Dec 4, 2020 at 6:14 PM:


Hi Charles,
On 04-12-2020 9:44, Olivier Gautherot wrote:

Hi Charles,

On Fri, Dec 4, 2020 at 9:12 AM charles meng <xlyybz@gmail.com> wrote:

What I mean is that it will take a lot of time to complete all the data
processing. I have to say that it is a good solution to adjust the column
type without affecting online users.

I found a tool on github, see the link below, unfortunately, this is for
MySQL...

https://github.com/github/gh-ost

MySQL has its own strategy with regard to column handling so what works
there does not necessarily fit here.

There are some good ideas in this thread but I would start with a few
checks:

1) What version of PG are you using ?

2) If you can try on your current setup (not necessarily in Production) to
add a column typed bigint - if it is recent enough it will be a simple
catalog update. Michael's suggestion is viable

3) Given the massive number of rows, I would update as suggested,
progressively in batches of a few tens of thousands. Make sure you commit
and vacuum after each to retrieve the space (or you may end up with a
storage space issue in addition to all the rest). In the meantime, add a
trigger to set the new column to the index value. Once the new column is
complete, drop the old column and set the new one as primary key (it may
take a while to recalculate the index).

4) If your table is still growing, I would definitely look into
partitioning as it can hardly make things worse.

5) If you're brave enough, convert your current table as a partition
(rename it to something like table_hist), duplicate the table model under
the same name as now (adjusting the primary key type) and set the INHERITS
on the primary key range. The inheritance should take care of the type
conversion (haven't tried it but it's worth a try). If it works, you will
reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.

Cheers
--
Olivier Gautherot

#13 Kevin Brannen
KBrannen@efji.com
In reply to: Olivier Gautherot (#11)
RE: Alter the column data type of the large data volume table.

From: Olivier Gautherot <ogautherot@gautherot.net>

5) If you're brave enough, convert your current table as a partition (rename it to something like table_hist), duplicate the table model under the same name as now (adjusting the primary key type) and set the INHERITS on the primary key range. The inheritance should take care of the type conversion (haven't tried it but it's worth a try). If it works, you will reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.

Seems like a sound idea in general. I’d probably rename the tables; let’s call them “big_hist” for the old big table and “big_split” for the new partitioned table that will be used going forward, assuming the original table was called “big”. Then create a view that looks at both of those but is named the same as the old table, and let the view do a type cast on the old key, like big_hist.id::bigint, so it matches the new type, because the view will probably be a union and the types need to match. That way your application only has to pause long enough to run a few meta-commands, then it all can resume, and as Olivier pointed out, you can fix the data by moving it from big_hist to big_split in the background as you have time.

I’d probably put it all in a transaction too:

Create table … -- all the commands to create your partitioned table big_split here
Begin;
Alter table big rename to big_hist;
Create view big as select * from big_split union select id::bigint, /* other cols */ from big_hist;
Commit;

Try it on a dev system and if it works you’re off and running. I’d expect the view to slow things down a little, but probably not too much if you have good indexes. But at least you could transition without major downtime and then rename “big_split” back to “big” and drop “big_hist” when you’ve finished the transition. I might even be tempted to add a trigger so that all new inserts into “big” really go into “big_split” so “big_hist” doesn’t grow any more. Your imagination is probably the limit. 😊

HTH,
Kevin

This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential information. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, distribution, review, copy or use of any of the information contained in or attached to this message is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify us by reply e-mail, and destroy the original transmission and its attachments without reading them or saving them to disk. Thank you.

#14 Michael Lewis
mlewis@entrata.com
In reply to: Kevin Brannen (#13)
Re: Alter the column data type of the large data volume table.

On Fri, Dec 4, 2020 at 9:04 AM Kevin Brannen <KBrannen@efji.com> wrote:

From: Olivier Gautherot <ogautherot@gautherot.net>

5) If you're brave enough, convert your current table as a partition
(rename it to something like table_hist), duplicate the table model under
the same name as now (adjusting the primary key type) and set the INHERITS
on the primary key range. The inheritance should take care of the type
conversion (haven't tried it but it's worth a try). If it works, you will
reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.

Seems like a sound idea in general. I’d probably rename the tables, let’s
call them “big_hist” for the old big table and “big_split” for the new
partitioned table that being used go forward – assuming the original table
was called “big”. Then create a View that will look at both of those but
call it the same as the old table, and let the view do a type cast on the
old key like big_hist.id::bigint so it matches the new type, because the
view will probably be a union and the type need to match. That way your
application only has to pause long enough to do a few meta-commands then it
all can resume, and like Olivier pointed you, you can fix the data by
moving it from big_hist to big_split in the background as you have time.

I’d probably put it all in a transaction too:

Create table … -- all the commands to create your patitioned table
big_split here

Begin;

Alter table big rename to big_hist;

Create view big select * from big_split union select id::bigint, /* other
cols */ from big_hist;

Commit;

Try it on a dev system and if it works you’re off and running. I’d expect
the view to slow things down a little, but probably not too much if you
have good indexes. But at least you could transition without major downtime
and then rename “big_split” back to “big” and drop “big_hist” when you’ve
finished the transition. I might even be tempted to add a trigger so that
all new inserts into “big” really go into “big_split” so “big_hist” doesn’t
grow any more. Your imagination is probably the limit. 😊

Yes, "instead of" triggers for insert/update/delete will make the change
transparent to the application side, other than the potential for slowness
while in the process of moving the data and still using the view.

Also, I'd advocate for UNION ALL to avoid the "distinct" work that is
required for plain UNION. I wish ALL were the default behavior and "UNION
DISTINCT" were how to ask for the deduplicating behavior.

If partitioning is going to happen anyway in the near future, now is a
great time to make that happen. Unfortunately, PG13 is required to use
logical replication from a non-partitioned table to a partitioned table so
moving the data still requires some creative work.
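Continuing with Kevin's hypothetical big / big_split / big_hist names, the
INSTEAD OF trigger and the UNION ALL variant of the view might look like:

```sql
-- Route inserts through the view into the new partitioned table
CREATE FUNCTION big_view_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO big_split VALUES (NEW.*);
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER big_view_insert_trg
  INSTEAD OF INSERT ON big
  FOR EACH ROW EXECUTE PROCEDURE big_view_insert();

-- UNION ALL skips the duplicate-elimination step that plain UNION pays for
CREATE OR REPLACE VIEW big AS
  SELECT id, payload FROM big_split        -- "payload" stands in for the real column list
  UNION ALL
  SELECT id::bigint, payload FROM big_hist;
```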

#15 Olivier Gautherot
ogautherot@gautherot.net
In reply to: Michael Lewis (#14)
Re: Alter the column data type of the large data volume table.

On Fri, Dec 4, 2020 at 5:22 PM Michael Lewis <mlewis@entrata.com> wrote:

On Fri, Dec 4, 2020 at 9:04 AM Kevin Brannen <KBrannen@efji.com> wrote:

From: Olivier Gautherot <ogautherot@gautherot.net>

5) If you're brave enough, convert your current table as a partition
(rename it to something like table_hist), duplicate the table model under
the same name as now (adjusting the primary key type) and set the INHERITS
on the primary key range. The inheritance should take care of the type
conversion (haven't tried it but it's worth a try). If it works, you will
reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.

Seems like a sound idea in general. I’d probably rename the tables, let’s
call them “big_hist” for the old big table and “big_split” for the new
partitioned table that being used go forward – assuming the original table
was called “big”. Then create a View that will look at both of those but
call it the same as the old table, and let the view do a type cast on the
old key like big_hist.id::bigint so it matches the new type, because the
view will probably be a union and the type need to match. That way your
application only has to pause long enough to do a few meta-commands then it
all can resume, and like Olivier pointed you, you can fix the data by
moving it from big_hist to big_split in the background as you have time.

I’d probably put it all in a transaction too:

Create table … -- all the commands to create your patitioned table
big_split here

Begin;

Alter table big rename to big_hist;

Create view big select * from big_split union select id::bigint, /* other
cols */ from big_hist;

Commit;

Try it on a dev system and if it works you’re off and running. I’d expect
the view to slow things down a little, but probably not too much if you
have good indexes. But at least you could transition without major downtime
and then rename “big_split” back to “big” and drop “big_hist” when you’ve
finished the transition. I might even be tempted to add a trigger so that
all new inserts into “big” really go into “big_split” so “big_hist” doesn’t
grow any more. Your imagination is probably the limit. 😊

Yes, "instead of" triggers for insert/update/delete will make the change
transparent to the application side, other than the potential for slowness
while in the process of moving the data and still using the view.

Also, I'd advocate for UNION ALL to avoid the "distinct" work that is
required for plain UNION. I wish ALL were default behavior and "UNION
DISTINCT" was how get that behavior.

If partitioning is going to happen anyway in the near future, now is a
great time to make that happen. Unfortunately, PG13 is required to use
logical replication from a non-partitioned table to a partitioned table so
moving the data still requires some creative work.

Rather than union, use inheritance with a constraint on the primary key:
it will take care of the union transparently for you.
--
Olivier Gautherot


#16 charles meng
xlyybz@gmail.com
In reply to: Kevin Brannen (#13)
Re: Alter the column data type of the large data volume table.

Hi Kevin,

This sounds like a good idea, I will work hard on this idea and let you
know the result.

Most appreciated.

Kevin Brannen <KBrannen@efji.com> wrote on Saturday, Dec 5, 2020 at 12:04 AM:


From: Olivier Gautherot <ogautherot@gautherot.net>

5) If you're brave enough, convert your current table as a partition
(rename it to something like table_hist), duplicate the table model under
the same name as now (adjusting the primary key type) and set the INHERITS
on the primary key range. The inheritance should take care of the type
conversion (haven't tried it but it's worth a try). If it works, you will
reach your goal without downtime or significant overhead.

Sorry, just tried this one and it failed: type mismatch.

Seems like a sound idea in general. I’d probably rename the tables, let’s
call them “big_hist” for the old big table and “big_split” for the new
partitioned table that being used go forward – assuming the original table
was called “big”. Then create a View that will look at both of those but
call it the same as the old table, and let the view do a type cast on the
old key like big_hist.id::bigint so it matches the new type, because the
view will probably be a union and the type need to match. That way your
application only has to pause long enough to do a few meta-commands then it
all can resume, and like Olivier pointed you, you can fix the data by
moving it from big_hist to big_split in the background as you have time.

I’d probably put it all in a transaction too:

Create table … -- all the commands to create your patitioned table
big_split here

Begin;

Alter table big rename to big_hist;

Create view big select * from big_split union select id::bigint, /* other
cols */ from big_hist;

Commit;

Try it on a dev system and if it works you’re off and running. I’d expect
the view to slow things down a little, but probably not too much if you
have good indexes. But at least you could transition without major downtime
and then rename “big_split” back to “big” and drop “big_hist” when you’ve
finished the transition. I might even be tempted to add a trigger so that
all new inserts into “big” really go into “big_split” so “big_hist” doesn’t
grow any more. Your imagination is probably the limit. 😊

HTH,

Kevin
