The fastest way to update thousands of rows in a moderately sized table

Started by twoflower · over 10 years ago · 12 messages · general
#1twoflower
standa.kurik@gmail.com

Hello, I have a table with 30 million records in which I need to update a
single column for a couple of thousand rows, let's say 10 000. The new
column value is identical for all matching rows. Doing

update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

takes 45 seconds. I understand that an UPDATE is basically an INSERT followed
by a DELETE, but I was hoping I could do better than that. I found a suggestion
to use a temporary table to speed things up, so now I have this:

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "temp_table" set fk_assignment = null;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;

drop table "temp_table";

This got me to about 37 seconds. Still pretty slow. The TRANSLATION table has
an index and a foreign key constraint on fk_assignment. Removing the constraint
brought very little benefit. Removing the index is probably out of the question,
as these kinds of operations are very frequent and the table itself is used
heavily, including the index. Execution plan:

Update on "TRANSLATION" _target (cost=0.56..116987.76 rows=13983 width=405)
(actual time=43262.266..43262.266 rows=0 loops=1)
 -> Nested Loop (cost=0.56..116987.76 rows=13983 width=405) (actual
time=0.566..146.084 rows=8920 loops=1)
   -> Seq Scan on temp_segs _source (cost=0.00..218.83
rows=13983 width=22) (actual time=0.457..13.994 rows=8920 loops=1)
   -> Index Scan using "TRANSLATION_pkey" on "TRANSLATION"
_target (cost=0.56..8.34 rows=1 width=391) (actual time=0.009..0.011 rows=1
loops=8920)
      Index Cond: (id = _source.id)

Planning time: 1.167 ms
Execution time: 43262.577 ms

Is there anything else worth trying? Are these numbers to be expected, in
your experience?

I have Postgres 9.4, the database is on SSD.

Thank you very much for any suggestions.

Standa

--
View this message in context: http://postgresql.nabble.com/The-fastest-way-to-update-thousands-of-rows-in-moderately-sized-table-tp5859144.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: twoflower (#1)
Re: The fastest way to update thousands of rows in moderately sized table

On 07/23/2015 01:17 PM, twoflower wrote:

> I have a table with 30 million records in which I need to update a single
> column for a couple of thousand rows. [...] I found a suggestion to use a
> temporary table to speed things up, so now I have this:
>
> create unlogged table "temp_table" as
> select id, fk_assignment
> from "TRANSLATION"
> where fk_job = 1000;
>
> update "temp_table" set fk_assignment = null;
>
> update "TRANSLATION" _target
> set fk_assignment = _source.fk_assignment
> from "temp_table" _source
> where _target.id = _source.id;
>
> drop table "temp_table";

Have you tried wrapping the above in a BEGIN/COMMIT block?

So:

BEGIN;

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "temp_table" set fk_assignment = null;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;

drop table "temp_table";

COMMIT;



--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3twoflower
standa.kurik@gmail.com
In reply to: Adrian Klaver (#2)
Re: The fastest way to update thousands of rows in moderately sized table

Adrian Klaver-4 wrote

Have you tried wrapping the above in a BEGIN/COMMIT block?

Yes, I am running the tests inside a BEGIN TRANSACTION / ROLLBACK block.


#4林士博
lin@repica.co.jp
In reply to: twoflower (#3)
Re: Re: The fastest way to update thousands of rows in moderately sized table

Try creating an index on "TRANSLATION" (fk_job).

From the execution plan you posted,
-> (actual time=43262.266..43262.266 rows=0 loops=1)
there are actually no rows reported as updated.

So if you have an index on "TRANSLATION" (fk_job), the update below should
finish within a few seconds:
-> update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

#5twoflower
standa.kurik@gmail.com
In reply to: 林士博 (#4)
Re: The fastest way to update thousands of rows in moderately sized table

林士博 wrote

Try creating an index on TRANSLATION fk_job.

The index is already there.


#6林士博
lin@repica.co.jp
In reply to: twoflower (#5)
Re: Re: The fastest way to update thousands of rows in moderately sized table

Sorry, my mistake.
The execution plan of an UPDATE statement always seems to report rows=0.

Can you post the execution plan of the original update SQL?
EXPLAIN (ANALYZE ON, BUFFERS ON) update "TRANSLATION" set
fk_assignment = null where fk_job = 1000;

#7twoflower
standa.kurik@gmail.com
In reply to: 林士博 (#6)
Re: The fastest way to update thousands of rows in moderately sized table

林士博 wrote

Can you post the execution plan of the original update SQL? EXPLAIN (ANALYZE
ON, BUFFERS ON) update "TRANSLATION" set fk_assignment = null where fk_job = 1000;

Here it is:

Update on "TRANSLATION" (cost=0.56..9645.13 rows=3113 width=391) (actual
time=35091.036..35091.036 rows=0 loops=1)
  Buffers: shared hit=74842343 read=7242 dirtied=7513
  -> Index Scan using "TRANSLATION_idx_composite_job_last_revision" on
     "TRANSLATION" (cost=0.56..9645.13 rows=3113 width=391) (actual
     time=0.042..24.147 rows=8920 loops=1)
       Index Cond: (fk_job = 59004)
       Buffers: shared hit=626

Planning time: 0.362 ms
Execution time: 35091.192 ms


#8林士博
lin@repica.co.jp
In reply to: twoflower (#7)
Re: Re: The fastest way to update thousands of rows in moderately sized table

OK.
In your case I cannot see any reason why using a temp table to do a
join update would be faster than updating directly.

And from the execution plan, the index scan takes very little time;
most of the time is spent inserting and deleting row versions.

As you mentioned, fk_assignment is updated frequently, and disabling
the indexes before updating is not an option you can choose, so
try setting the fillfactor of this table to a smaller value.
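
For reference, a fillfactor change would be a sketch along these lines (the
value 70 is an arbitrary example, not a recommendation from this thread):

```sql
-- Leave ~30% of each heap page free so new row versions written by
-- UPDATEs have a chance to stay on the same page.
ALTER TABLE "TRANSLATION" SET (fillfactor = 70);

-- The setting only affects newly written pages; a full table rewrite
-- (which takes an exclusive lock) is needed to apply it to existing data.
VACUUM FULL "TRANSLATION";
```

Note that heap-only-tuple (HOT) updates, the main beneficiary of a lower
fillfactor, require that no indexed column changes, so the gain may be
limited here as long as fk_assignment itself remains indexed.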

And maybe you can also check how long it takes to insert 8920 rows into this
table. If it is far less than 35 s, you can consider doing the update this way:
1. Create a temp table with all columns of the matching rows, substituting
the new value, like:
create unlogged table "temp_table" as
select id, ..... , null as fk_assignment
from "TRANSLATION"
where fk_job = 1000;
2. Delete the matching rows from the original table.
3. Insert the rows from the temp table back into the original table.

Finally, you might check whether PostgreSQL can insert starting from the
end of the table (the equivalent of Oracle's high-water mark).
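
The three steps could be combined in one transaction, sketched here under the
assumptions that the temp table's column order matches "TRANSLATION" and that
the foreign keys tolerate the delete/insert cycle:

```sql
BEGIN;

-- 1. Copy the matching rows with the new value substituted
--    (column list abbreviated; it must match "TRANSLATION").
create unlogged table "temp_table" as
select id, /* ... other columns ... */ null as fk_assignment
from "TRANSLATION"
where fk_job = 1000;

-- 2. Delete the originals.
delete from "TRANSLATION" where fk_job = 1000;

-- 3. Re-insert the modified copies.
insert into "TRANSLATION" select * from "temp_table";

drop table "temp_table";

COMMIT;
```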



#9twoflower
standa.kurik@gmail.com
In reply to: 林士博 (#8)
Re: The fastest way to update thousands of rows in moderately sized table

Thank you, I will look into those suggestions.

Meanwhile, I started experimenting with partitioning the table into smaller
tables, each holding rows with IDs spanning 1 million values. Using this
approach, the UPDATE takes 300 ms. I have to check whether all the SELECTs I
issue against the original table keep their performance, but so far it seems
they do, provided the appropriate indexes are present on the child tables. I
was worried about the overhead of each query having to go through all
(currently) 58 partition tables, but it seems it's not that big of a deal.
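
For context, Postgres 9.4 has no declarative partitioning, so a setup like
this relies on table inheritance plus CHECK constraints. A minimal sketch
with hypothetical child-table names:

```sql
-- Children inherit the parent's columns; each holds one id range.
create table "TRANSLATION_part_0" (
    check (id >= 0 and id < 1000000)
) inherits ("TRANSLATION");

create table "TRANSLATION_part_1" (
    check (id >= 1000000 and id < 2000000)
) inherits ("TRANSLATION");

-- Indexes are not inherited, so each child needs its own.
create index on "TRANSLATION_part_0" (fk_job);
create index on "TRANSLATION_part_1" (fk_job);

-- With constraint exclusion (the default is already 'partition'),
-- queries whose WHERE clause references id skip non-matching children.
set constraint_exclusion = partition;
```

Pruning only happens when the query filters on id directly, which matches
the observation that other SELECTs need the right indexes on every child.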


#10林士博
lin@repica.co.jp
In reply to: twoflower (#9)
Re: Re: The fastest way to update thousands of rows in moderately sized table

Yes, partitioning by fk_job could significantly improve the performance of
this update, and all the SELECTs filtering on a specific fk_job could be
faster as well.

What you should check carefully are the SELECTs that do not filter on fk_job.


#11twoflower
standa.kurik@gmail.com
In reply to: 林士博 (#10)
Re: The fastest way to update thousands of rows in moderately sized table

In fact, I did not partition by *fk_job* but by *id* (the primary key),
though partitioning by *fk_job* was my first idea.

I use various columns when querying the table; *fk_job* is not always there.


#12林士博
lin@repica.co.jp
In reply to: 林士博 (#10)
Re: Re: The fastest way to update thousands of rows in moderately sized table

Thanks for sharing the results of your experiment!