How to add columns faster

Started by yudhi sabout 2 years ago6 messagesgeneral
Jump to latest
#1yudhi s
learnerdatabase99@gmail.com

Hello,
We have an application in which the tables will have approx ~200 columns in
some of the big transaction tables when we will be serving all the business
use cases. Currently it has ~100 columns to serve current business use
cases to start with. As the team is following an agile approach , the
attributes will be added gradually with each future release. But the
concern is, the transaction tables will be holding billions of rows (
because of historical data migration from start) and will be range
partitioned and each of those partitions can be as big as ~100GB and full
table size can be in multiple Terabytes.

So our concern was , as the column addition using the traditional "Alter
table" command in postgres looks to be a full table rewrite , it’s going to
take a lot of time and resources with each of such releases for these big
transaction tables. So what is the way to handle such scenarios and do
these column additions in quick time with no/minimal downtime?

Regards
Yudhi

#2Christophe Pettus
xof@thebuild.com
In reply to: yudhi s (#1)
Re: How to add columns faster

On Mar 3, 2024, at 11:06, yudhi s <learnerdatabase99@gmail.com> wrote:
as the column addition using the traditional "Alter table" command in postgres looks to be a full table rewrite

That's not always (or, really, usually) true. Adding a new column in any recent version of PostgreSQL just alters the system catalogs; it does not rewrite the table. Make sure the new column is either NULL-able, or has a simple DEFAULT expression (specifically, not using a VOLATILE function). Per the documentation:

When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.

Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.

https://www.postgresql.org/docs/current/sql-altertable.html

#3Ron
ronljohnsonjr@gmail.com
In reply to: yudhi s (#1)
Re: How to add columns faster

On Sun, Mar 3, 2024 at 2:06 PM yudhi s <learnerdatabase99@gmail.com> wrote:

Hello,
We have an application in which the tables will have approx ~200 columns
in some of the big transaction tables when we will be serving all the
business use cases. Currently it has ~100 columns to serve current business
use cases to start with. As the team is following an agile approach , the
attributes will be added gradually with each future release. But the
concern is, the transaction tables will be holding billions of rows (
because of historical data migration from start) and will be range
partitioned and each of those partitions can be as big as ~100GB and full
table size can be in multiple Terabytes.

So our concern was , as the column addition using the traditional "Alter
table" command in postgres looks to be a full table rewrite , it’s going to
take a lot of time and resources with each of such releases for these big
transaction tables. So what is the way to handle such scenarios and do
these column additions in quick time with no/minimal downtime?

Normalizing the database design completely eliminates the problem of any
need to rewrite a table, since new attributes are added as new rows in a
different table.

#4yudhi s
learnerdatabase99@gmail.com
In reply to: Christophe Pettus (#2)
Re: How to add columns faster

On Mon, Mar 4, 2024 at 12:43 AM Christophe Pettus <xof@thebuild.com> wrote:

On Mar 3, 2024, at 11:06, yudhi s <learnerdatabase99@gmail.com> wrote:
as the column addition using the traditional "Alter table" command in

postgres looks to be a full table rewrite

That's not always (or, really, usually) true. Adding a new column in any
recent version of PostgreSQL just alters the system catalogs; it does not
rewrite the table. Make sure the new column is either NULL-able, or has a
simple DEFAULT expression (specifically, not using a VOLATILE function).
Per the documentation:

When a column is added with ADD COLUMN and a non-volatile DEFAULT is

specified, the default is evaluated at the time of the statement and the
result stored in the table's metadata. That value will be used for the
column for all existing rows. If no DEFAULT is specified, NULL is used. In
neither case is a rewrite of the table required.

Adding a column with a volatile DEFAULT or changing the type of an

existing column will require the entire table and its indexes to be
rewritten.

https://www.postgresql.org/docs/current/sql-altertable.html

Thanks for the clarification. In case of adding the column as volatile
default (like current_timestamp function as default) or say adding NOT NULL
column with some conditional population of existing values will be a full
table rewrite. In such scenarios, the full table rewrite operation is going
to take a long time , so what will be the fastest way to achieve that
with minimal to no downtime?

Apology if this is dumb one, but considering the partitions in postgres are
as good as different tables, can we add the new column someway at the table
level and add the columns to each of the partitions individually and then
attach or it has to happen at one shot only?

#5Christophe Pettus
xof@thebuild.com
In reply to: yudhi s (#4)
Re: How to add columns faster

On Mar 3, 2024, at 11:40, yudhi s <learnerdatabase99@gmail.com> wrote:
Thanks for the clarification. In case of adding the column as volatile default (like current_timestamp function as default) or say adding NOT NULL column with some conditional population of existing values will be a full table rewrite. In such scenarios, the full table rewrite operation is going to take a long time , so what will be the fastest way to achieve that with minimal to no downtime?

For other situations, you probably have to create the column as NULL-able, and then backfill it in groups of records until the whole column is populated. Remember that dropping the NULL constraint afterwards will require a full table read (although not a rewrite).

#6Christophe Pettus
xof@thebuild.com
In reply to: Christophe Pettus (#5)
Re: How to add columns faster

On Mar 3, 2024, at 12:00, Christophe Pettus <xof@thebuild.com> wrote:
Remember that dropping the NULL constraint afterwards will require a full table read (although not a rewrite).

Sorry, badly put: Adding a NOT NULl constraint afterwards will require a full table read (although not a rewrite).