Update big table

Started by Haiming Zhangover 12 years ago6 messagesgeneral
Jump to latest
#1Haiming Zhang
Haiming.Zhang@redflex.com.au

Hi All,

I am using postgres 9.1, I have a question about updating big table. Here is the basic information of this table.
1. This table has one primary key and other 11 columns.
2. It also has a trigger that before update records, another table got updated first.
3. The has millions of records now.
4. I am not allowed to delete records in this table when UPDATE
The problem is when I do a "Update" query it takes a long time to execute. Eg. when I run query like this " update TABLE set column1 = true where EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update the whole table. In order to optimize the update speed. I tried the following strategies:
1. create index based on primary key, column1 and combination of primary key and column1.
2. Alter FILLFACTOR = 70, vacuum all and then reindex
3. drop trigger before update
Then I use "EXPLAIN" to estimate query plan, all of the above strategies do not improve the UPDATE speed dramatically.

Please comments on my three strategies (eg, does I index too many columns in 1?) and please advise me how to improve the update speed. Any advice is welcomed. I appreciate all you help.

Thanks,

Regards,
Haiming

________________________________
If you are not an authorised recipient of this e-mail, please contact me at Redflex immediately by return phone call or by email. In this case, you should not read, print, retransmit, store or act in reliance on this e-mail or any attachments, and should destroy all copies of them. This e-mail and any attachments are confidential and may contain privileged information and/or copyright material of Redflex or third parties. You should only retransmit, distribute or commercialise the material if you are authorised to do so. This notice should not be removed.

#2Vincenzo Romano
vincenzo.romano@notorand.it
In reply to: Haiming Zhang (#1)
Re: Update big table

2013/7/14 Haiming Zhang <Haiming.Zhang@redflex.com.au>:

Hi All,

I am using postgres 9.1, I have a question about updating big table. Here is
the basic information of this table.

1. This table has one primary key and other 11 columns.

2. It also has a trigger that before update records,
another table got updated first.

3. The has millions of records now.
4. I am not allowed to delete records in this table when
UPDATE

The problem is when I do a "Update" query it takes a long time to execute.
Eg. when I run query like this " update TABLE set column1 = true where
EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update the
whole table. In order to optimize the update speed. I tried the following
strategies:

1. create index based on primary key, column1 and
combination of primary key and column1.

2. Alter FILLFACTOR = 70, vacuum all and then reindex

3. drop trigger before update

Then I use "EXPLAIN" to estimate query plan, all of the above strategies do
not improve the UPDATE speed dramatically.

Please comments on my three strategies (eg, does I index too many columns in
1?) and please advise me how to improve the update speed. Any advice is
welcomed. I appreciate all you help.

Thanks,

Regards,

Haiming

A JOIN would solve your speed problem.
The IN() predicate is the cause.
AFAIK.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3BladeOfLight16
bladeoflight16@gmail.com
In reply to: Vincenzo Romano (#2)
Re: Update big table

I don't believe you can use JOIN explicitly in this situation because it's
an UPDATE, but I believe you can accomplish the same effect with FROM and
WHERE.

UPDATE table SET column1 = TRUE
FROM table2
WHERE table1.event_id = table2.event_id;

I would make sure there's an index on table2.event_id if it's not the PK.
If it's part of a composite key, either make sure that table2.event_id is
the first column in the index (PostgreSQL will use an index like that;
right?) or create an index where it is.

(Sorry. Forgot to hit Reply All.)

On Sun, Jul 14, 2013 at 7:03 AM, Vincenzo Romano <
vincenzo.romano@notorand.it> wrote:

Show quoted text

2013/7/14 Haiming Zhang <Haiming.Zhang@redflex.com.au>:

Hi All,

I am using postgres 9.1, I have a question about updating big table.

Here is

the basic information of this table.

1. This table has one primary key and other 11 columns.

2. It also has a trigger that before update records,
another table got updated first.

3. The has millions of records now.
4. I am not allowed to delete records in this table when
UPDATE

The problem is when I do a "Update" query it takes a long time to

execute.

Eg. when I run query like this " update TABLE set column1 = true where
EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update

the

whole table. In order to optimize the update speed. I tried the following
strategies:

1. create index based on primary key, column1 and
combination of primary key and column1.

2. Alter FILLFACTOR = 70, vacuum all and then reindex

3. drop trigger before update

Then I use "EXPLAIN" to estimate query plan, all of the above strategies

do

not improve the UPDATE speed dramatically.

Please comments on my three strategies (eg, does I index too many

columns in

1?) and please advise me how to improve the update speed. Any advice is
welcomed. I appreciate all you help.

Thanks,

Regards,

Haiming

A JOIN would solve your speed problem.
The IN() predicate is the cause.
AFAIK.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Haiming Zhang
Haiming.Zhang@redflex.com.au
In reply to: Vincenzo Romano (#2)
Re: Update big table

Hi,

Thanks for your reply. Yes you are right, IN predicate is the cause but the JOIN does not help much. I run my query using JOIN for two hours, and did not get it done. Here is my query:

update table1 set col1 = true from table2 where table1.event_id = table2.event_id

Regards,
Haiming

-----Original Message-----
From: Vincenzo Romano [mailto:vincenzo.romano@notorand.it]
Sent: Sunday, 14 July 2013 9:03 PM
To: Haiming Zhang
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Update big table

2013/7/14 Haiming Zhang <Haiming.Zhang@redflex.com.au>:

Hi All,

I am using postgres 9.1, I have a question about updating big table.
Here is the basic information of this table.

1. This table has one primary key and other 11 columns.

2. It also has a trigger that before update records,
another table got updated first.

3. The has millions of records now.
4. I am not allowed to delete records in this table
when UPDATE

The problem is when I do a "Update" query it takes a long time to execute.
Eg. when I run query like this " update TABLE set column1 = true where
EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update
the whole table. In order to optimize the update speed. I tried the
following
strategies:

1. create index based on primary key, column1 and
combination of primary key and column1.

2. Alter FILLFACTOR = 70, vacuum all and then reindex

3. drop trigger before update

Then I use "EXPLAIN" to estimate query plan, all of the above
strategies do not improve the UPDATE speed dramatically.

Please comments on my three strategies (eg, does I index too many
columns in
1?) and please advise me how to improve the update speed. Any advice
is welcomed. I appreciate all you help.

Thanks,

Regards,

Haiming

A JOIN would solve your speed problem.
The IN() predicate is the cause.
AFAIK.

If you are not an authorised recipient of this e-mail, please contact me at Redflex immediately by return phone call or by email. In this case, you should not read, print, retransmit, store or act in reliance on this e-mail or any attachments, and should destroy all copies of them. This e-mail and any attachments are confidential and may contain privileged information and/or copyright material of Redflex or third parties. You should only retransmit, distribute or commercialise the material if you are authorised to do so. This notice should not be removed.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5bricklen
bricklen@gmail.com
In reply to: Haiming Zhang (#4)
Re: Update big table

On Mon, Jul 15, 2013 at 6:08 AM, Haiming Zhang <Haiming.Zhang@redflex.com.au

wrote:

I run my query using JOIN for two hours, and did not get it done. Here is
my query:

update table1 set col1 = true from table2 where table1.event_id =
table2.event_id

Did you already post the query plan from "EXPLAIN update table1 set col1 =
true from table2 where table1.event_id = table2.event_id" ?

#6Haiming Zhang
Haiming.Zhang@redflex.com.au
In reply to: BladeOfLight16 (#3)
Re: Update big table

Thanks.

I have fixed the problem. And now the query can run in 4 mins for 1.5 million records in TABLE1 and 0.5 million records in TABLE2.

The solution is I created a function that gets all event_id from TABLE2 then travel through each event_id and do the update for TABLE1. This removes the side effects of using IN predicate. Also I have
1. set FILLFACTOR = 70
2. remove trigger and index before the function, create index based on event_id
3. add trigger and index back after run function.

Regards,
Haiming

From: BladeOfLight16 [mailto:bladeoflight16@gmail.com]
Sent: Monday, 15 July 2013 5:26 AM
To: pgsql-general@postgresql.org; Haiming Zhang
Subject: Re: [GENERAL] Update big table

I don't believe you can use JOIN explicitly in this situation because it's an UPDATE, but I believe you can accomplish the same effect with FROM and WHERE.

UPDATE table SET column1 = TRUE
FROM table2
WHERE table1.event_id = table2.event_id;

I would make sure there's an index on table2.event_id if it's not the PK. If it's part of a composite key, either make sure that table2.event_id is the first column in the index (PostgreSQL will use an index like that; right?) or create an index where it is.

(Sorry. Forgot to hit Reply All.)
On Sun, Jul 14, 2013 at 7:03 AM, Vincenzo Romano <vincenzo.romano@notorand.it<mailto:vincenzo.romano@notorand.it>> wrote:
2013/7/14 Haiming Zhang <Haiming.Zhang@redflex.com.au<mailto:Haiming.Zhang@redflex.com.au>>:

Hi All,

I am using postgres 9.1, I have a question about updating big table. Here is
the basic information of this table.

1. This table has one primary key and other 11 columns.

2. It also has a trigger that before update records,
another table got updated first.

3. The has millions of records now.
4. I am not allowed to delete records in this table when
UPDATE

The problem is when I do a "Update" query it takes a long time to execute.
Eg. when I run query like this " update TABLE set column1 = true where
EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update the
whole table. In order to optimize the update speed. I tried the following
strategies:

1. create index based on primary key, column1 and
combination of primary key and column1.

2. Alter FILLFACTOR = 70, vacuum all and then reindex

3. drop trigger before update

Then I use "EXPLAIN" to estimate query plan, all of the above strategies do
not improve the UPDATE speed dramatically.

Please comments on my three strategies (eg, does I index too many columns in
1?) and please advise me how to improve the update speed. Any advice is
welcomed. I appreciate all you help.

Thanks,

Regards,

Haiming

A JOIN would solve your speed problem.
The IN() predicate is the cause.
AFAIK.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

________________________________
If you are not an authorised recipient of this e-mail, please contact me at Redflex immediately by return phone call or by email. In this case, you should not read, print, retransmit, store or act in reliance on this e-mail or any attachments, and should destroy all copies of them. This e-mail and any attachments are confidential and may contain privileged information and/or copyright material of Redflex or third parties. You should only retransmit, distribute or commercialise the material if you are authorised to do so. This notice should not be removed.