Slow GIN indexes after bulk insert

Started by Chris Spencer · about 10 years ago · 5 messages · general
#1 Chris Spencer
chrisspen@gmail.com

I have 5 GIN indexes on a PG 9.3 table containing about 50 million records.
Each index covers a "group" of common records like billing address,
shipping address, contact names, etc.

When first created, the indexes work miracles in speeding up the full text
search of these fields. However, I'm running into some serious maintenance
headaches.
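[Editor's note: indexes of the kind described above would look something like the following sketch; the table and column names are purely illustrative, not from the original post.]

```sql
-- One GIN index per "group" of related fields, built on a tsvector expression:
CREATE INDEX contacts_billing_fts ON contacts
    USING gin (to_tsvector('english',
        coalesce(billing_street, '') || ' ' || coalesce(billing_city, '')));

-- Queries must use the same expression for the index to apply:
SELECT * FROM contacts
WHERE to_tsvector('english',
        coalesce(billing_street, '') || ' ' || coalesce(billing_city, ''))
      @@ to_tsquery('english', 'main & st');
```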

After I insert a few thousand new records, the indexes seem to have no
effect. A query that might normally take 1 second now takes 5 minutes.

If I drop and recreate the indexes, then performance returns to normal, but
this causes a couple hours of downtime, even if I recreate indexes
concurrently.
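[Editor's note: the drop-and-recreate workaround can be sketched as below; index and table names are illustrative. DROP INDEX CONCURRENTLY requires 9.2+, so it is available on the 9.3 server discussed here.]

```sql
-- Build a replacement index without blocking writes (still slow on 50M rows):
CREATE INDEX CONCURRENTLY contacts_billing_fts_new ON contacts
    USING gin (to_tsvector('english', billing_address));
-- Swap it in:
DROP INDEX CONCURRENTLY contacts_billing_fts;
ALTER INDEX contacts_billing_fts_new RENAME TO contacts_billing_fts;
```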

Why are these GIN indexes becoming unusable after a large number of
inserts? Is this a known limitation or is there anything I can do to fix
this?

Chris

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Spencer (#1)
Re: Slow GIN indexes after bulk insert

Chris Spencer <chrisspen@gmail.com> writes:

I have 5 GIN indexes on a PG 9.3 table containing about 50 million records.
Each index covers a "group" of common records like billing address,
shipping address, contact names, etc.

When first created, the indexes work miracles in speeding up the full text
search of these fields. However, I'm running into some serious maintenance
headaches.

After I insert a few thousand new records, the indexes seem to have no
effect. A query that might normally take 1 second now takes 5 minutes.

Probably the problem is that most of the new index entries are still
sitting in the index's pending list and haven't been merged to the main
search tree. Try VACUUM'ing the table after a bulk insert. (I think
there are other ways of triggering a GIN pending-list cleanup, but
I don't recall which ones were available in 9.3.)

Worthy of note here is that the max pending list size is governed by
work_mem, so a large work_mem can make this effect worse. (We got
smart and made it an independent setting, but not till 9.5.)

regards, tom lane
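[Editor's note: Tom's suggestion can be sketched as follows; table and index names are illustrative, and the 9.5+/9.6+ alternatives postdate the 9.3 release under discussion.]

```sql
-- On 9.3: VACUUM merges the GIN pending list into the main index structure.
VACUUM contacts;

-- On 9.5+, the pending-list cap is its own storage parameter, not work_mem:
-- ALTER INDEX contacts_billing_fts SET (gin_pending_list_limit = 4096);  -- kB
-- On 9.6+, a single index's pending list can be flushed directly:
-- SELECT gin_clean_pending_list('contacts_billing_fts');
```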

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Chris Spencer
chrisspen@gmail.com
In reply to: Tom Lane (#2)
Re: Slow GIN indexes after bulk insert

Thanks for the tips.

What constitutes a "large" work_mem? My server has 61GB of memory and my
work_mem is currently set to include all of that. What percent of my total
memory should I lower that to so it won't impact GIN updates?

On Mon, Mar 21, 2016 at 2:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Probably the problem is that most of the new index entries are still
sitting in the index's pending list and haven't been merged to the main
search tree. Try VACUUM'ing the table after a bulk insert. (I think
there are other ways of triggering a GIN pending-list cleanup, but
I don't recall which ones were available in 9.3.)

Worthy of note here is that the max pending list size is governed by
work_mem, so a large work_mem can make this effect worse. (We got
smart and made it an independent setting, but not till 9.5.)

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Spencer (#3)
Re: Slow GIN indexes after bulk insert

Chris Spencer <chrisspen@gmail.com> writes:

What constitutes a "large" work_mem? My server has 61GB of memory and my
work_mem is currently set to include all of that.

Ouch. That's a mistake independently of GIN. The primary usage of
work_mem is to define how much memory an individual sorting or hashing
query step is allowed to use. A complex query might have several sort or
hash steps, and then you need to worry about concurrent queries in
different sessions; not to mention that this is not the only demand on
your server's RAM. I'd be hesitant to set work_mem much above 1GB, maybe
even quite a bit less than that depending on what your workload is like.

Cutting work_mem to ~100MB might alone be enough to fix your GIN issue;
if not you could experiment with forced flushes of the GIN pending lists
via VACUUM (or ANALYZE might do it too, and be more directly useful).
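[Editor's note: a minimal sketch of the settings Tom describes; values and table name are illustrative, and since 9.3 predates ALTER SYSTEM, the global value goes in postgresql.conf.]

```sql
-- postgresql.conf: a modest server-wide default, e.g.
--   work_mem = '100MB'
-- Raise it per session only for a query that genuinely needs a big sort/hash:
SET work_mem = '1GB';
-- After a bulk insert, force a pending-list flush (and refresh statistics):
VACUUM ANALYZE contacts;
```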

regards, tom lane


#5 Jeff Janes
jeff.janes@gmail.com
In reply to: Chris Spencer (#1)
Re: Slow GIN indexes after bulk insert

On Mon, Mar 21, 2016 at 10:55 AM, Chris Spencer <chrisspen@gmail.com> wrote:

I have 5 GIN indexes on a PG 9.3 table containing about 50 million records.
Each index covers a "group" of common records like billing address, shipping
address, contact names, etc.

When first created, the indexes work miracles in speeding up the full text
search of these fields. However, I'm running into some serious maintenance
headaches.

After I insert a few thousand new records, the indexes seem to have no
effect. A query that might normally take 1 second now takes 5 minutes.

I'd try turning fastupdate off on the indexes. It will probably slow
down your bulk inserts (perhaps by a lot, but you won't know until
you try). But if you can handle that, it will make your slow select
problems go away like magic.
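[Editor's note: Jeff's suggestion would look something like this; index and table names are illustrative. Disabling fastupdate does not flush entries already queued, so one VACUUM is still needed afterwards.]

```sql
-- New entries bypass the pending list and go straight into the main GIN tree:
ALTER INDEX contacts_billing_fts SET (fastupdate = off);
-- Clear out anything already sitting in the pending list:
VACUUM contacts;
```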

Cheers,

Jeff
