GIN index creation performance problems
I'm looking for a bit more guidance on gin index creation.
The process:
- vacuum analyze.
- start a transaction that:
- drop the triggers that update the tsvector
- drop the index on the tsvector
- fill several tables
- update the tsvector in a table with ~800K records
- recreate the gin index
- commit
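The steps above boil down to something like this (a sketch only; the table `items`, column `tsv`, field names `f1`..`f6`, and the trigger/index names are hypothetical, not from the thread):

```sql
BEGIN;

-- Suspend per-row tsvector maintenance during the bulk load
DROP TRIGGER IF EXISTS items_tsv_update ON items;
DROP INDEX IF EXISTS items_tsv_idx;

-- ... bulk-fill the tables here ...

-- Rebuild the tsvector in one pass over the ~800K rows
UPDATE items
SET tsv = to_tsvector('english',
          coalesce(f1,'') || ' ' || coalesce(f2,'') || ' ' ||
          coalesce(f3,'') || ' ' || coalesce(f4,'') || ' ' ||
          coalesce(f5,'') || ' ' || coalesce(f6,''));

-- Recreate the GIN index, then restore the trigger
CREATE INDEX items_tsv_idx ON items USING gin (tsv);

COMMIT;
```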
To have a rough idea of the data involved:
- 800K records
- tsvector formed from the concatenation of 6 fields
- total length of concatenated fields ~200 chars *
- average number of lexemes per tsvector ~10 *
[*] guessed
2xXeon HT 3.2GHz, 4Gb RAM, SCSI RAID5
Index creation takes more than 1h.
maintenance_work_mem is still untouched. What would be a good value
to start from?
Anything else I can do to improve performance?
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 3 Nov 2008 16:45:35 +0100
Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
Forgot to add that top says postgresql is using 100% CPU and 15% of
memory.
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 3 Nov 2008, Ivan Sergio Borgonovo wrote:
maintenance_work_mem is still untouched. What would be a good value
to start from?
Anything else I can do to improve performance?
Why didn't you change maintenance_work_mem? You can change it online, just
before CREATE INDEX. Bulk GIN index creation uses it as a buffer, so you can
save a lot of I/O.
All this is written in the documentation, and there are other parameters you
should be concerned about.
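Concretely, Oleg's suggestion looks something like this (a sketch; the 256MB value and the index/table names are illustrative assumptions, not from the thread):

```sql
-- maintenance_work_mem can be raised for the current session only,
-- immediately before the bulk index build; no server restart needed.
SET maintenance_work_mem = '256MB';

CREATE INDEX items_tsv_idx ON items USING gin (tsv);

-- Optionally return to the server default for the rest of the session.
RESET maintenance_work_mem;
```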
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
maintenance_work_mem is still untouched. What would be a good value
to start from?
GIN index build time is *very* sensitive to maintenance_work_mem.
Try cranking it up to a couple hundred megabytes and see if that
helps.
Also, if you're on 8.2, I think 8.3 might be faster.
regards, tom lane
On Mon, 03 Nov 2008 11:04:45 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
maintenance_work_mem is still untouched. What would be a good
value to start from?
GIN index build time is *very* sensitive to maintenance_work_mem.
Try cranking it up to a couple hundred megabytes and see if that
helps.
200MB did improve the situation. I was hoping for a clue about a
reasonable value to start from.
But the performance is far from stable.
It can take from a few minutes to more than 20min even with this setting.
I can't understand whether it is CPU bound or RAM bound.
CPU load is always near 100% while postgresql is using 15% of RAM.
Still, I've got 52K of swap used...
But anyway, the performance is very erratic.
BTW Is maintenance_work_mem set per connection?
While the GIN index looks appreciably faster (actually it is
lightning fast) for searches, I'm considering reverting to GiST, since
even with 200MB maintenance_work_mem it still looks like a pain to
build the index, especially considering the unpredictable time required
to build it.
What puzzles me is that before increasing maintenance_work_mem
it was always terribly slow, while now there is a huge variation in
rebuild time with 200MB.
Even
vacuum full;
can be pretty slow (>5min), still at 100% CPU use.
The index is dropped at the beginning and rebuilt at the end inside a
quite busy transaction.
Could it be related?
Also, if you're on 8.2, I think 8.3 might be faster.
8.3 etch backport.
thanks
--
Ivan Sergio Borgonovo
http://www.webthatworks.it