GIN index creation performance problems
I'm looking for a bit more guidance on gin index creation.
The process:
- vacuum analyze.
- start a transaction that:
- drop the triggers that update the tsvector
- drop the index on the tsvector
- fill several tables
- update the tsvector in a table with ~800K records
- recreate the gin index
- commit
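The steps above boil down to something like this (a sketch only; the table `items`, column `tsv`, field names `f1`..`f6`, and the trigger/index names are hypothetical, not from the thread):

```sql
BEGIN;

-- Suspend per-row tsvector maintenance during the bulk load
DROP TRIGGER IF EXISTS items_tsv_update ON items;
DROP INDEX IF EXISTS items_tsv_idx;

-- ... bulk-fill the tables here ...

-- Rebuild the tsvector in one pass over the ~800K rows
UPDATE items
SET tsv = to_tsvector('english',
          coalesce(f1,'') || ' ' || coalesce(f2,'') || ' ' ||
          coalesce(f3,'') || ' ' || coalesce(f4,'') || ' ' ||
          coalesce(f5,'') || ' ' || coalesce(f6,''));

-- Recreate the GIN index, then restore the trigger
CREATE INDEX items_tsv_idx ON items USING gin (tsv);

COMMIT;
```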
To have a rough idea of the data involved:
- 800K records
- tsvector formed from the concatenation of 6 fields
- total length of concatenated fields ~200 chars *
- average number of lexemes per tsvector ~10 *
[*] guessed
2xXeon HT 3.2GHz, 4Gb RAM, SCSI RAID5
Index creation takes more than 1h.
maintenance_work_mem is still untouched. What would be a good value
to start from?
Anything else I can do to improve performance?
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 3 Nov 2008 16:45:35 +0100
Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
Forgot to add that top says postgresql is using 100% CPU and 15% of
memory.
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 3 Nov 2008, Ivan Sergio Borgonovo wrote:
maintenance_work_mem is still untouched. What would be a good value
to start from?
Anything else I can do to improve performance?
Why didn't you change maintenance_work_mem? You can change it online, just
before CREATE INDEX. Bulk GIN index creation uses it as a buffer, so you can
save a lot of I/O.
All this is written in the documentation, and there are other parameters you
should be concerned about.
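Concretely, Oleg's suggestion looks something like this (a sketch; the 256MB value and the index/table names are illustrative assumptions, not from the thread):

```sql
-- maintenance_work_mem can be raised for the current session only,
-- immediately before the bulk index build; no server restart needed.
SET maintenance_work_mem = '256MB';

CREATE INDEX items_tsv_idx ON items USING gin (tsv);

-- Optionally return to the server default for the rest of the session.
RESET maintenance_work_mem;
```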
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
maintenance_work_mem is still untouched. What would be a good value
to start from?
GIN index build time is *very* sensitive to maintenance_work_mem.
Try cranking it up to a couple hundred megabytes and see if that
helps.
Also, if you're on 8.2, I think 8.3 might be faster.
regards, tom lane
On Mon, 03 Nov 2008 11:04:45 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
maintenance_work_mem is still untouched. What would be a good
value to start from?
GIN index build time is *very* sensitive to maintenance_work_mem.
Try cranking it up to a couple hundred megabytes and see if that
helps.
200MB did improve the situation. I was hoping for a clue about a
reasonable value to start from.
But the performance is far from stable.
It can take from a few minutes to more than 20min even with this setting.
I can't understand whether it is CPU bound or RAM bound.
CPU load is always near 100% while postgresql is using 15% of RAM.
Still, I've got 52K of swap used...
But anyway, the performance is very erratic.
BTW Is maintenance_work_mem set per connection?
While the GIN index looks appreciably faster (actually it is
lightning fast) for searches, I'm considering reverting to GiST, since
even with 200MB maintenance_work_mem it still looks like a pain to
build the index, especially considering the unpredictable time required
to build it.
What puzzles me is that before increasing maintenance_work_mem
it was always terribly slow, while now there is a huge variation in
rebuild time with 200MB.
Even
vacuum full;
can be pretty slow (>5min), still at 100% CPU use.
The index is dropped at the beginning and rebuilt at the end inside a
quite busy transaction.
Could it be related?
Also, if you're on 8.2, I think 8.3 might be faster.
8.3 etch backport.
thanks
--
Ivan Sergio Borgonovo
http://www.webthatworks.it