issue with gininsert under very high load

Started by Andrew Dunstan · almost 12 years ago · 17 messages
#1 Andrew Dunstan
andrew@dunslane.net

A client of ours encountered this problem when upgrading (via slony)
from 8.4 to 9.2, and then from 8.4 to 9.3. The application is a
telephony app that inserts call records at pretty high volume in the
busiest part of the day, which usually starts around 10.00 am to 11.00
am US Eastern time. What happened was that the system load average shot
up from, say, 6 to way over 100, and the database became largely
unresponsive. This problem did not occur on 8.4. The platform is CentOS
6.4 / x86_64.

On investigation I found that a number of processes were locked waiting
for one wedged process to end its transaction, which never happened
(this transaction should normally take milliseconds). oprofile revealed
that postgres was spending 87% of its time in s_lock(), and strace on
the wedged process revealed that it was in a tight loop constantly
calling select(). It did not respond to a SIGTERM. I attached a debugger
to the wedged process and got this backtrace:

#0 0x000000386a2eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006021a7 in PGSemaphoreLock ()
#2 0x00000000006463a1 in LWLockAcquire ()
#3 0x0000000000631997 in ReadBuffer_common ()
#4 0x00000000006322ee in ReadBufferExtended ()
#5 0x0000000000467475 in ginPrepareFindLeafPage ()
#6 0x000000000046663f in ginInsertItemPointers ()
#7 0x0000000000462adc in ginEntryInsert ()
#8 0x000000000046cad0 in ginInsertCleanup ()
#9 0x000000000046d7c6 in ginHeapTupleFastInsert ()
#10 0x0000000000462cfa in gininsert ()
#11 0x0000000000722b33 in FunctionCall6Coll ()
#12 0x000000000048bdff in index_insert ()
#13 0x0000000000587595 in ExecInsertIndexTuples ()
#14 0x00000000005948e1 in ExecModifyTable ()
#15 0x000000000057e718 in ExecProcNode ()
#16 0x000000000057d512 in standard_ExecutorRun ()
[...]

Armed with this, the client identified a single gin index (on an array of
text) that could have been involved. On removing the index the problem
simply went away, and they have now been live on 9.3 for 36 hours without
mishap.

The client is attempting to create a self-contained reproducible test,
but understandably is not going to re-add the index to their production
system.
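For reference, the sort of self-contained reproduction being attempted would
presumably look something like the sketch below: a table with a text[] column,
a GIN index on it, and many concurrent inserters (e.g. a custom pgbench script
run with a few dozen connections). All table, column, and index names here are
illustrative, not the client's actual schema.

-- Illustrative reproduction sketch only; names are hypothetical.
CREATE TABLE call_records (
    id     bigserial PRIMARY KEY,
    placed timestamptz NOT NULL DEFAULT now(),
    tags   text[] NOT NULL
);

CREATE INDEX call_records_tags_gin ON call_records USING gin (tags);

-- Statement for a custom pgbench script, run from many concurrent connections:
INSERT INTO call_records (tags)
VALUES (ARRAY['trunk-' || (random() * 100)::int, 'npa-' || (random() * 999)::int]);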

I'm not terribly familiar with the gin code, so I'm reporting this in
the hope that other people who are more familiar with it than I am might
know where to look for a potential race condition or other bug that
might have caused this.

cheers

andrew


#2 Andres Freund
andres@2ndquadrant.com
In reply to: Andrew Dunstan (#1)
Re: issue with gininsert under very high load

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM. I attached a debugger to the wedged process
and got this backtrace:

#0 0x000000386a2eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006021a7 in PGSemaphoreLock ()
#2 0x00000000006463a1 in LWLockAcquire ()
#3 0x0000000000631997 in ReadBuffer_common ()
#4 0x00000000006322ee in ReadBufferExtended ()
#5 0x0000000000467475 in ginPrepareFindLeafPage ()
#6 0x000000000046663f in ginInsertItemPointers ()
#7 0x0000000000462adc in ginEntryInsert ()
#8 0x000000000046cad0 in ginInsertCleanup ()
#9 0x000000000046d7c6 in ginHeapTupleFastInsert ()
#10 0x0000000000462cfa in gininsert ()
#11 0x0000000000722b33 in FunctionCall6Coll ()
#12 0x000000000048bdff in index_insert ()
#13 0x0000000000587595 in ExecInsertIndexTuples ()
#14 0x00000000005948e1 in ExecModifyTable ()
#15 0x000000000057e718 in ExecProcNode ()
#16 0x000000000057d512 in standard_ExecutorRun ()
[...]

Armed with this, the client identified a single gin index (on an array of
text) that could have been involved. On removing the index the problem
simply went away, and they have now been live on 9.3 for 36 hours without
mishap.

The client is attempting to create a self-contained reproducible test, but
understandably is not going to re-add the index to their production system.

I'm not terribly familiar with the gin code, so I'm reporting this in the
hope that other people who are more familiar with it than I am might know
where to look for a potential race condition or other bug that might have
caused this.

That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.

The fix is usually to just turn FASTUPDATE off for all indexes. If it's
done after the initial creation, the table should be vacuumed afterwards
to flush it.
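Concretely, for an existing index that would be something like the following
(the index and table names are placeholders):

ALTER INDEX call_records_tags_gin SET (fastupdate = off);

-- Turning fastupdate off does not drain what is already queued,
-- so vacuum the table afterwards to empty the pending list.
VACUUM call_records;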

There's previously been talk about changing the limits to something more
reasonable but it got stalled in bikeshedding IIRC.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#3 Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#2)
Re: issue with gininsert under very high load

* Andres Freund (andres@2ndquadrant.com) wrote:

There's previously been talk about changing the limits to something more
reasonable but it got stalled in bikeshedding IIRC.

As I recall, the argument was that we didn't really need a new GUC for
this (which was the proposal), but rather just needed to pick a reasonable
(small) value and hard-code it. Are there objections to doing so, or
are there cases where that would be a serious problem? How do people
feel about 4MB?
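(As an interim mitigation on 9.3/9.4, where the pending-list cleanup threshold
follows work_mem, one rough way to approximate a small fixed cap is to lower
work_mem for the insert-heavy role or sessions; note that this also limits
sort/hash memory there. The role name below is hypothetical.)

ALTER ROLE telephony_app SET work_mem = '4MB';
-- or, per session:
SET work_mem = '4MB';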

Thanks,

Stephen

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: issue with gininsert under very high load

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM.

That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem. It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?

regards, tom lane


#5 Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#2)
Re: issue with gininsert under very high load

On 02/12/2014 02:50 PM, Andres Freund wrote:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM. I attached a debugger to the wedged process
and got this backtrace:

#0 0x000000386a2eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006021a7 in PGSemaphoreLock ()
#2 0x00000000006463a1 in LWLockAcquire ()
#3 0x0000000000631997 in ReadBuffer_common ()
#4 0x00000000006322ee in ReadBufferExtended ()
#5 0x0000000000467475 in ginPrepareFindLeafPage ()
#6 0x000000000046663f in ginInsertItemPointers ()
#7 0x0000000000462adc in ginEntryInsert ()
#8 0x000000000046cad0 in ginInsertCleanup ()
#9 0x000000000046d7c6 in ginHeapTupleFastInsert ()
#10 0x0000000000462cfa in gininsert ()
#11 0x0000000000722b33 in FunctionCall6Coll ()
#12 0x000000000048bdff in index_insert ()
#13 0x0000000000587595 in ExecInsertIndexTuples ()
#14 0x00000000005948e1 in ExecModifyTable ()
#15 0x000000000057e718 in ExecProcNode ()
#16 0x000000000057d512 in standard_ExecutorRun ()
[...]

Armed with this, the client identified a single gin index (on an array of
text) that could have been involved. On removing the index the problem
simply went away, and they have now been live on 9.3 for 36 hours without
mishap.

The client is attempting to create a self-contained reproducible test, but
understandably is not going to re-add the index to their production system.

I'm not terribly familiar with the gin code, so I'm reporting this in the
hope that other people who are more familiar with it than I am might know
where to look for a potential race condition or other bug that might have
caused this.

That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.

The fix is usually to just turn FASTUPDATE off for all indexes. If it's
done after the initial creation, the table should be vacuumed afterwards
to flush it.

There's previously been talk about changing the limits to something more
reasonable but it got stalled in bikeshedding IIRC.

So this doesn't work just when it might be most useful?

cheers

andrew


#6 Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: issue with gininsert under very high load

On February 12, 2014 9:33:38 PM CET, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM.

That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big; b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly?

No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks, which shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, each cleaning up a large number of items one by one, it's not surprising if it's dramatically slow.

Andres
--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#6)
Re: issue with gininsert under very high load

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM.

That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big; b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly?

No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks, which shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, each cleaning up a large number of items one by one, it's not surprising if it's dramatically slow.

Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.

- Heikki


#8 Andrew Dunstan
andrew@dunslane.net
In reply to: Heikki Linnakangas (#7)
Re: issue with gininsert under very high load

On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM.

That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big; b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly?

No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks, which shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, each cleaning up a large number of items one by one, it's not surprising if it's dramatically slow.

Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.

Is that going to serialize all these inserts?

cheers

andrew


#9 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andrew Dunstan (#8)
Re: issue with gininsert under very high load

On 02/13/2014 05:40 PM, Andrew Dunstan wrote:

On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting for one wedged process to end its transaction, which never happened (this transaction should normally take milliseconds). oprofile revealed that postgres was spending 87% of its time in s_lock(), and strace on the wedged process revealed that it was in a tight loop constantly calling select(). It did not respond to a SIGTERM.

That's a deficiency of the gin fastupdate cache: a) it bases its size on work_mem, which usually makes it *far* too big; b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurrency.

I'm not sure that what Andrew is describing can fairly be called a concurrent-performance problem. It sounds closer to a stuck lock. Are you sure you've diagnosed it correctly?

No. But I've several times seen similar backtraces where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks, which shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, each cleaning up a large number of items one by one, it's not surprising if it's dramatically slow.

Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.

Is that going to serialize all these inserts?

It will serialize the cleanup process, which moves entries from the
pending list to the tree proper. But that's better than the current
situation. Currently, when two processes attempt it, they will both try
to insert into the GIN tree, but one of them will notice that the other
one already did the cleanup, and bail out. So only one process
contributes to progress, while the others just waste their effort.

The processes should try to get the lock, and just give up if it's
already held rather than wait. If someone else is already doing the
cleanup, there's no need for the current process to do it.
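To illustrate the pattern at the SQL level: this is the same try-then-skip
behaviour that pg_try_advisory_lock() gives (the key below is arbitrary); the
actual change would live in ginInsertCleanup() and take a lock on the GIN
metapage, as the patch later in this thread does.

SELECT pg_try_advisory_lock(424242) AS got_it;  -- returns false instead of blocking
-- ... run the pending-list cleanup only when got_it is true ...
SELECT pg_advisory_unlock(424242);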

- Heikki


#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Heikki Linnakangas (#9)
Re: issue with gininsert under very high load

On 02/13/2014 10:58 AM, Heikki Linnakangas wrote:

Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.

Is that going to serialize all these inserts?

It will serialize the cleanup process, which moves entries from the
pending list to the tree proper. But that's better than the current
situation. Currently, when two processes attempt it, they will both
try to insert into the GIN tree, but one of them will notice that the
other one already did the cleanup, and bail out. So only one process
contributes to progress, while the others just waste their effort.

The processes should try to get the lock, and just give up if it's
already held rather than wait. If someone else is already doing the
cleanup, there's no need for the current process to do it.

Sounds good.

cheers

andrew


#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#9)
1 attachment(s)
Re: issue with gininsert under very high load

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.

Something like the attached? Can somebody who's seen this problem confirm
this improves matters?

(ginInsertCleanup's header comment also needs to be rewritten, but for
testing purposes, this is fine.)

regards, tom lane

Attachments:

no-parallel-gin-cleanup.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 4a65046..38cda14 100644
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 21,26 ****
--- 21,27 ----
  #include "access/gin_private.h"
  #include "commands/vacuum.h"
  #include "miscadmin.h"
+ #include "storage/lmgr.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
*************** ginInsertCleanup(GinState *ginstate,
*** 739,744 ****
--- 740,755 ----
  	KeyArray	datums;
  	BlockNumber blkno;
  
+ 	/*
+ 	 * We use a heavyweight lock on the metapage to ensure that only one
+ 	 * backend at a time tries to clean up the pending list.  While it does
+ 	 * actually work for multiple backends to run this code concurrently, that
+ 	 * turns out to be a bad idea because there's lots of locking conflicts.
+ 	 * So if someone else is already running cleanup, we just do nothing.
+ 	 */
+ 	if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
+ 		return;
+ 
  	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
  	LockBuffer(metabuffer, GIN_SHARE);
  	metapage = BufferGetPage(metabuffer);
*************** ginInsertCleanup(GinState *ginstate,
*** 748,753 ****
--- 759,765 ----
  	{
  		/* Nothing to do */
  		UnlockReleaseBuffer(metabuffer);
+ 		UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
  		return;
  	}
  
*************** ginInsertCleanup(GinState *ginstate,
*** 925,930 ****
--- 937,944 ----
  
  	ReleaseBuffer(metabuffer);
  
+ 	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+ 
  	/* Clean up temporary space */
  	MemoryContextSwitchTo(oldCtx);
  	MemoryContextDelete(opCtx);
#12 Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: issue with gininsert under very high load

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.

Something like the attached? Can somebody who's seen this problem confirm
this improves matters?

(ginInsertCleanup's header comment also needs to be rewritten, but for
testing purposes, this is fine.)

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple mechanism
during gininsert() if it's already too big, and do plain inserts.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: issue with gininsert under very high load

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Something like the attached? Can somebody who's seen this problem confirm
this improves matters?

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple mechanism
during gininsert() if it's already too big, and do plain inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.

regards, tom lane


#14 Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#13)
Re: issue with gininsert under very high load

On 2014-02-13 18:49:21 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Something like the attached? Can somebody who's seen this problem confirm
this improves matters?

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple mechanism
during gininsert() if it's already too big, and do plain inserts.

No, because we've already got a process working on cleaning it out.

Well, appending to the fast tuple list will normally be fully cached I/O;
cleaning it up, not so much.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#15 Jesper Krogh
jesper@krogh.cc
In reply to: Tom Lane (#13)
Re: issue with gininsert under very high load

On 14/02/14 00:49, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Something like the attached? Can somebody who's seen this problem confirm
this improves matters?

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple mechanism
during gininsert() if it's already too big, and do plain inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.

Speaking from some real-world experience with the fastupdate mechanism: under concurrent load it behaves really badly. Random processes waiting for cleanup (or competing with cleanup) are going to see latency spikes because they magically hit that corner, so reverting to plain inserts when the pending list cannot be added to will not remove the problem, but will make it hit only the process actually doing the cleanup.

The built-in mechanism, where cleanup is a cost paid by the process that happened to fill the pending list, is really hard to deal with in production. More control would be appreciated, perhaps even an explicit flush mechanism. I'd like to batch up inserts during one transaction only and flush on commit.
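For what it's worth, that per-index choice can also be made at creation time
rather than with ALTER INDEX afterwards; a sketch with illustrative names:

CREATE INDEX call_records_tags_gin ON call_records
    USING gin (tags) WITH (fastupdate = off);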

--
Jesper - with fastupdate turned off due to above issues.

--
Jesper


#16 Andres Freund
andres@2ndquadrant.com
In reply to: Jesper Krogh (#15)
Re: issue with gininsert under very high load

On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:

On 14/02/14 00:49, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Something like the attached? Can somebody who's seen this problem confirm
this improves matters?

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple mechanism
during gininsert() if it's already too big, and do plain inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.

Speaking from some real-world experience with the fastupdate mechanism: under concurrent load it behaves really badly. Random processes waiting for cleanup (or competing with cleanup) are going to see latency spikes because they magically hit that corner, so reverting to plain inserts when the pending list cannot be added to will not remove the problem, but will make it hit only the process actually doing the cleanup.

Yea, this is only a part of fixing fastupdate. Limiting the size of the
fastupdate list to something more reasonable is pretty important as
well. Not competing around cleanup will make cleanup much faster though,
so I am not that concerned about the latency spikes it causes once it's
limited in size and done non-concurrently.

The built-in mechanism, where cleanup is a cost paid by the process that
happened to fill the pending list, is really hard to deal with in
production. More control would be appreciated, perhaps even an explicit
flush mechanism. I'd like to batch up inserts during one transaction
only and flush on commit.

That doesn't seem likely to work with a reasonable amount of effort. The
fastupdate list is shared across all processes, so one backend will
always pay the price for several others.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#17 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#16)
Re: issue with gininsert under very high load

Andres Freund wrote:

On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:

The built-in mechanism, where cleanup is a cost paid by the process that
happened to fill the pending list, is really hard to deal with in
production. More control would be appreciated, perhaps even an explicit
flush mechanism. I'd like to batch up inserts during one transaction
only and flush on commit.

That doesn't seem likely to work with a reasonable amount of effort. The
fastupdate list is shared across all processes, so one backend will
always pay the price for several others.

Unless some other process does it, such as autovacuum.
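(A sketch of that approach with illustrative names and numbers: vacuuming the
table drains the GIN pending list, so making autovacuum visit the table more
often, or scheduling an explicit VACUUM, shifts the cleanup into a background
process. Note that autovacuum's vacuum trigger is based on dead tuples, so an
insert-mostly table may still need the explicit VACUUM.)

ALTER TABLE call_records SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- or run periodically from a maintenance job:
VACUUM call_records;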

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
