Updating pg_attribute to change field's data type from integer to bigint on very large table

Started by Jeff Adamsalmost 14 years ago10 messagesgeneral
Jump to latest
#1Jeff Adams
Jeff.Adams@noaa.gov

Greetings,

I have a very large table (approximately 1 billion records). I need to
change a field's data type from integer to bigint. I started up an ALTER
TABLE approach yesterday and it is still running (trying to rewrite the
whole table?). I remember seeing mention of being able to do this in the
pg_attribute table. Has anybody ever used the latter approach. If so,
instructions would be greatly appreciated. Thanks...

Jeff

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Adams (#1)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

Jeff Adams <jeff.adams@noaa.gov> writes:

I have a very large table (approximately 1 billion records). I need to
change a field's data type from integer to bigint. I started up an ALTER
TABLE approach yesterday and it is still running (trying to rewrite the
whole table?). I remember seeing mention of being able to do this in the
pg_attribute table. Has anybody ever used the latter approach. If so,
instructions would be greatly appreciated. Thanks...

No, that is an actual on-disk change (making the field physically
wider), so it's going to cost ya. There are some cases where the
on-disk representation doesn't change and so a catalog update isn't
needed, but int4->int8 isn't one of them.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

I wrote:

No, that is an actual on-disk change (making the field physically
wider), so it's going to cost ya. There are some cases where the
on-disk representation doesn't change and so a catalog update isn't
needed, but int4->int8 isn't one of them.

Sheesh, hit send too quickly on that. Of course I meant "and so a
catalog update is all that's needed".

regards, tom lane

#4Jeff Adams
Jeff.Adams@noaa.gov
In reply to: Tom Lane (#3)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

so i can? if so, how do i go about? i should mention that, while i dabble
in postgres dba activity, it is not my day job...

On Fri, Apr 13, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

No, that is an actual on-disk change (making the field physically
wider), so it's going to cost ya. There are some cases where the
on-disk representation doesn't change and so a catalog update isn't
needed, but int4->int8 isn't one of them.

Sheesh, hit send too quickly on that. Of course I meant "and so a
catalog update is all that's needed".

regards, tom lane

--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376

#5Bèrto ëd Sèra
berto.d.sera@gmail.com
In reply to: Jeff Adams (#4)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

Hi Jeff,

read carefully, he said it does stuff ON DISK, which means that all of your
fields must be physically remade (so it a lot more than telling it "it's an
int4, Sheila", which is what just modifying the catalog would do). Since
you are not a dba be aware that you are producing a new record for each of
the existing ones, and thus bloating things quite a bit. You may want to
vacuum it asap :)

Bèrto

On 13 April 2012 16:30, Jeff Adams <jeff.adams@noaa.gov> wrote:

so i can? if so, how do i go about? i should mention that, while i dabble
in postgres dba activity, it is not my day job...

On Fri, Apr 13, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

No, that is an actual on-disk change (making the field physically
wider), so it's going to cost ya. There are some cases where the
on-disk representation doesn't change and so a catalog update isn't
needed, but int4->int8 isn't one of them.

Sheesh, hit send too quickly on that. Of course I meant "and so a
catalog update is all that's needed".

regards, tom lane

--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.

#6Steve Crawford
scrawford@pinpointresearch.com
In reply to: Jeff Adams (#4)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

On 04/13/2012 08:30 AM, Jeff Adams wrote:

so i can? if so, how do i go about? i should mention that, while i
dabble in postgres dba activity, it is not my day job...

That really depends on details and your concerns. Is the database used
for constant insert/update/select activity or is it a big table used for
analysis and can be taken offline for some period? Is the column you
want to update a primary or foreign key? How much available disk space
do you have? Is a large portion of the data static (historical logs)?

Some possible approaches:

1. Just let it run to completion if you can afford the maintenance time.

2. Add a new column of the appropriate type, copy the data into that
column then drop the old one and rename the new one. If you do the
update all at once you will have severe table bloat but you may be able
to do the updates of the new column in batches so that vacuum can
reclaim space between update batches. This approach may be useful if you
do not have enough maintenance time to do the change all at once.

3. Dump the table data. Truncate the table and modify the column
definition. Restore the data. This requires downtime but will probably
be faster than in-place modification. However it's not something that
you can easily cancel part-way through and not a friendly method if
there are foreign-keys involved.

4. Rename the table and create a new table with the structure you want.
Copy the old data back into the new table - perhaps in batches. This
might be useful if you need to constantly keep collecting data but can
afford a delay in analysis of the data.

If partitioning the table would be beneficial, this might be a good time
to consider that as well.

Cheers,
Steve

#7Jeff Adams
Jeff.Adams@noaa.gov
In reply to: Steve Crawford (#6)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

Thanks for the ideas Steve. I am actually working with a partitioned table
and the field I am modifying is the id field (I have reached the cap on the
integer data type and need to modify it to bigint - very poor planning on
my part!), but no related tables exist. The id field in the partitioned
tables is inherited, so I figured I needed to alter the column in the
parent table. Does this information point towards an optimal solution?
Thanks again...

On Fri, Apr 13, 2012 at 12:31 PM, Steve Crawford <
scrawford@pinpointresearch.com> wrote:

On 04/13/2012 08:30 AM, Jeff Adams wrote:

so i can? if so, how do i go about? i should mention that, while i dabble
in postgres dba activity, it is not my day job...

That really depends on details and your concerns. Is the database used

for constant insert/update/select activity or is it a big table used for
analysis and can be taken offline for some period? Is the column you want
to update a primary or foreign key? How much available disk space do you
have? Is a large portion of the data static (historical logs)?

Some possible approaches:

1. Just let it run to completion if you can afford the maintenance time.

2. Add a new column of the appropriate type, copy the data into that
column then drop the old one and rename the new one. If you do the update
all at once you will have severe table bloat but you may be able to do the
updates of the new column in batches so that vacuum can reclaim space
between update batches. This approach may be useful if you do not have
enough maintenance time to do the change all at once.

3. Dump the table data. Truncate the table and modify the column
definition. Restore the data. This requires downtime but will probably be
faster than in-place modification. However it's not something that you can
easily cancel part-way through and not a friendly method if there are
foreign-keys involved.

4. Rename the table and create a new table with the structure you want.
Copy the old data back into the new table - perhaps in batches. This might
be useful if you need to constantly keep collecting data but can afford a
delay in analysis of the data.

If partitioning the table would be beneficial, this might be a good time
to consider that as well.

Cheers,
Steve

--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376

#8Alban Hertroys
haramrae@gmail.com
In reply to: Jeff Adams (#7)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

On 13 Apr 2012, at 18:43, Jeff Adams wrote:

Thanks for the ideas Steve. I am actually working with a partitioned table and the field I am modifying is the id field (I have reached the cap on the integer data type and need to modify it to bigint - very poor planning on my part!), but no related tables exist. The id field in the partitioned tables is inherited, so I figured I needed to alter the column in the parent table. Does this information point towards an optimal solution? Thanks again...

The optimal solution most likely involves the table rewrite that you're trying to prevent.

Do you really need that id column though? It seems like it's a surrogate key - perhaps the real primary key for that data is a combination of some other columns? You say you partitioned that table, so the column you partitioned on must be part of that "new" PK in that case. That won't work if you depend on that id-column for other things, of course, or if it's really the only thing that guarantees that your records are unique (then again, are duplicates useful in any way?).

...If that doesn't work either, I just got a really dangerous idea for an alternate solution: I was just wondering whether you might have a field with 4 bytes of spare length _before_ the id column?
If you alter the pg_attributes, reducing that field with 4 bytes, and increase the id-column to bigint (+4 bytes), you might end up with a working solution.

The id's in the "resized" field would probably be really weird though and quite likely invalid for the data-type (those last 4 bytes you'd borrow of the previous column better be empty!). Plus, I'm assuming there's nothing between those fields on disk - in reality there's padding going on and such. As I said, it's a dangerous idea!
It will probably get shot down by people with more knowledge about the on-disk data structure right away. For example, if there's any kind of pointer to the start of your id-column, well... Also, it would probably kill any kind of replication on this table and will likely cause mayhem in the WAL logs as well, so you'd at least have to make sure there are no transactions going on (including autovacuum).
Still, if it works in your case then it's a possible solution :P

If you're going to experiment with this, you should really try it on a separate database (preferably on a different server) with some test data first. There's definitely a risk of losing everything in the database. It's probably a _really_ bad idea :P

Alban Hertroys

--
The scale of a problem often equals the size of an ego.

#9Jasen Betts
jasen@xnet.co.nz
In reply to: Jeff Adams (#1)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

On 2012-04-13, Jeff Adams <jeff.adams@noaa.gov> wrote:

--e89a8ff1c1e8d37deb04bd922a84
Content-Type: text/plain; charset=ISO-8859-1

Thanks for the ideas Steve. I am actually working with a partitioned table
and the field I am modifying is the id field (I have reached the cap on the
integer data type and need to modify it to bigint - very poor planning on
my part!), but no related tables exist. The id field in the partitioned
tables is inherited, so I figured I needed to alter the column in the
parent table. Does this information point towards an optimal solution?
Thanks again...

you can drop the inheritance and alter the tables one at a time, I did
this on a partitioned table with only millions of rows when I saw
what direction it was going in.

As my data was partitioned by time the system had to run without some
of the older data for a while while my script altered and re-inherited
each partition.

--
⚂⚃ 100% natural

#10Jeff Adams
Jeff.Adams@noaa.gov
In reply to: Alban Hertroys (#8)
Re: Updating pg_attribute to change field's data type from integer to bigint on very large table

Thanks Alban. Perhaps I will go with the rewrite. I use the table for
storage and analysis purposes only and am the only user, so I can take the
table offline, I was wondering whether running the ALTER TABLE on the
parent table makes the change in the inherited child tables? The fact that
my initial stab at the ALTER TABLE lasted 14 hours or so before I cancelled
suggests that it was working on the child tables?

On Sat, Apr 14, 2012 at 6:29 AM, Alban Hertroys <haramrae@gmail.com> wrote:

On 13 Apr 2012, at 18:43, Jeff Adams wrote:

Thanks for the ideas Steve. I am actually working with a partitioned

table and the field I am modifying is the id field (I have reached the cap
on the integer data type and need to modify it to bigint - very poor
planning on my part!), but no related tables exist. The id field in the
partitioned tables is inherited, so I figured I needed to alter the column
in the parent table. Does this information point towards an optimal
solution? Thanks again...

The optimal solution most likely involves the table rewrite that you're
trying to prevent.

Do you really need that id column though? It seems like it's a surrogate
key - perhaps the real primary key for that data is a combination of some
other columns? You say you partitioned that table, so the column you
partitioned on must be part of that "new" PK in that case. That won't work
if you depend on that id-column for other things, of course, or if it's
really the only thing that guarantees that your records are unique (then
again, are duplicates useful in any way?).

...If that doesn't work either, I just got a really dangerous idea for an
alternate solution: I was just wondering whether you might have a field
with 4 bytes of spare length _before_ the id column?
If you alter the pg_attributes, reducing that field with 4 bytes, and
increase the id-column to bigint (+4 bytes), you might end up with a
working solution.

The id's in the "resized" field would probably be really weird though and
quite likely invalid for the data-type (those last 4 bytes you'd borrow of
the previous column better be empty!). Plus, I'm assuming there's nothing
between those fields on disk - in reality there's padding going on and
such. As I said, it's a dangerous idea!
It will probably get shot down by people with more knowledge about the
on-disk data structure right away. For example, if there's any kind of
pointer to the start of your id-column, well... Also, it would probably
kill any kind of replication on this table and will likely cause mayhem in
the WAL logs as well, so you'd at least have to make sure there are no
transactions going on (including autovacuum).
Still, if it works in your case then it's a possible solution :P

If you're going to experiment with this, you should really try it on a
separate database (preferably on a different server) with some test data
first. There's definitely a risk of losing everything in the database. It's
probably a _really_ bad idea :P

Alban Hertroys

--
The scale of a problem often equals the size of an ego.

--
Jeffrey D. Adams
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376