[sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

Started by Andreas Seltenreichalmost 9 years ago14 messages
#1Andreas Seltenreich
seltenreich@gmx.de

Hi,

testing with master as of cf366e97ff, sqlsmith occasionally triggers the
following assertion:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)

Backtraces always look like the one below. It is reproducible on a
cluster once it happens. I could provide a tarball if needed.

regards,
Andreas

#2 0x00000000008324b1 in ExceptionalCondition (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", errorType=errorType@entry=0x87b03d "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", lineNumber=lineNumber@entry=3397) at assert.c:54
#3 0x0000000000706971 in MarkBufferDirtyHint (buffer=2844, buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
#4 0x00000000004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at hashutil.c:514
#5 0x00000000004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
#6 0x00000000004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
#7 0x000000000061fa51 in ExecEndIndexScan (node=0x3093f30) at nodeIndexscan.c:852
#8 0x0000000000608e59 in ExecEndNode (node=<optimized out>) at execProcnode.c:715
#9 0x00000000006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized out>) at execMain.c:1540
#10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
#11 0x00000000005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
#12 0x000000000085cbb3 in PortalDrop (portal=0x1a60060, isTopCommit=<optimized out>) at portalmem.c:489
#13 0x0000000000736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at postgres.c:1111
#14 0x0000000000738b51 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4071
#15 0x0000000000475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
#16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
#17 ServerLoop () at postmaster.c:1729
#18 0x00000000006c8662 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x1a3f540) at postmaster.c:1337
#19 0x000000000047729d in main (argc=4, argv=0x1a3f540) at main.c:228

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Andreas Seltenreich (#1)
1 attachment(s)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

Hi,

testing with master as of cf366e97ff, sqlsmith occasionally triggers the
following assertion:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)

Backtraces always look like the one below. It is reproducible on a
cluster once it happens. I could provide a tarball if needed.

regards,
Andreas

#2 0x00000000008324b1 in ExceptionalCondition (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", errorType=errorType@entry=0x87b03d "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", lineNumber=lineNumber@entry=3397) at assert.c:54
#3 0x0000000000706971 in MarkBufferDirtyHint (buffer=2844, buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
#4 0x00000000004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at hashutil.c:514
#5 0x00000000004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
#6 0x00000000004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
#7 0x000000000061fa51 in ExecEndIndexScan (node=0x3093f30) at nodeIndexscan.c:852
#8 0x0000000000608e59 in ExecEndNode (node=<optimized out>) at execProcnode.c:715
#9 0x00000000006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized out>) at execMain.c:1540
#10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
#11 0x00000000005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
#12 0x000000000085cbb3 in PortalDrop (portal=0x1a60060, isTopCommit=<optimized out>) at portalmem.c:489
#13 0x0000000000736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at postgres.c:1111
#14 0x0000000000738b51 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4071
#15 0x0000000000475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
#16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
#17 ServerLoop () at postmaster.c:1729
#18 0x00000000006c8662 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x1a3f540) at postmaster.c:1337
#19 0x000000000047729d in main (argc=4, argv=0x1a3f540) at main.c:228

Hi,

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash. However, I have found
the reason for this crash. This is basically happening when trying to
retrieve the tuples using cursor. Basically the current hash index
scan work tuple-at-a-time which means once it finds tuple on page, it
releases lock from the page but keeps pin on it and finally returns
the tuple. When the requested number of tuples are processed there is
no lock on the page that was being scanned but yes there is a pin on
it. Finally, when trying to close a cursor at the end of scan, if any
killed tuples has been identified we try to first mark these items as
dead with the help of _hash_kill_items(). But, since we only have pin
on this page, the assert check 'LWLockHeldByMe()' fails.

When scanning tuples using normal SELECT * statement, before moving to
next page in a bucket we first deal with all the killed items but we
do this without releasing lock and pin on the current page. Hence,
with SELECT queries this crash is not visible.

The attached patch fixes this. But, please note that all these changes
will get removed with the patch for page scan mode - [1]/messages/by-id/CA+TgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA+g@mail.gmail.com.

[1]: /messages/by-id/CA+TgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA+g@mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachments:

reacquire_lock_hashkillitems_if_required.patchapplication/x-patch; name=reacquire_lock_hashkillitems_if_required.patchDownload
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..0bacef8 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -478,7 +478,7 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
@@ -509,7 +509,7 @@ hashendscan(IndexScanDesc scan)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 2d92049..414cc6a 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -467,7 +467,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 					/* Before leaving current page, deal with any killed items */
 					if (so->numKilled > 0)
-						_hash_kill_items(scan);
+						_hash_kill_items(scan, true);
 
 					/*
 					 * ran off the end of this page, try the next
@@ -524,7 +524,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 					/* Before leaving current page, deal with any killed items */
 					if (so->numKilled > 0)
-						_hash_kill_items(scan);
+						_hash_kill_items(scan, true);
 
 					/*
 					 * ran off the end of this page, try the next
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 2e99719..f24bc4c 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -456,11 +456,15 @@ _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket,
  * current page and killed tuples thereon (generally, this should only be
  * called if so->numKilled > 0).
  *
+ * The caller must have pin on so->hashso_curbuf, but may or may not have
+ * read-lock, as indicated by haveLock.  Note that we assume read-lock
+ * is sufficient for setting LP_DEAD hint bits.
+ *
  * We match items by heap TID before assuming they are the right ones to
  * delete.
  */
 void
-_hash_kill_items(IndexScanDesc scan)
+_hash_kill_items(IndexScanDesc scan, bool haveLock)
 {
 	HashScanOpaque	so = (HashScanOpaque) scan->opaque;
 	Page	page;
@@ -472,6 +476,10 @@ _hash_kill_items(IndexScanDesc scan)
 
 	Assert(so->numKilled > 0);
 	Assert(so->killedItems != NULL);
+	Assert(BufferIsValid(so->hashso_curbuf));
+
+	if (!haveLock)
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 
 	/*
 	 * Always reset the scan state, so we don't look for same
@@ -513,4 +521,7 @@ _hash_kill_items(IndexScanDesc scan)
 		opaque->hasho_flag |= LH_PAGE_HAS_DEAD_TUPLES;
 		MarkBufferDirtyHint(so->hashso_curbuf, true);
 	}
+
+	if (!haveLock)
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
 }
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index eb1df57..89fc319 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -393,7 +393,7 @@ extern BlockNumber _hash_get_oldblock_from_newbucket(Relation rel, Bucket new_bu
 extern BlockNumber _hash_get_newblock_from_oldbucket(Relation rel, Bucket old_bucket);
 extern Bucket _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket,
 								   uint32 lowmask, uint32 maxbucket);
-extern void _hash_kill_items(IndexScanDesc scan);
+extern void _hash_kill_items(IndexScanDesc scan, bool haveLock);
 
 /* hash.c */
 extern void hashbucketcleanup(Relation rel, Bucket cur_bucket,
#3Andreas Seltenreich
seltenreich@gmx.de
In reply to: Ashutosh Sharma (#2)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

Ashutosh Sharma writes:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each. Nodes are tiny consumer machines with
low-power 4-core sandy bridges.

[2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.

regards,
Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Andreas Seltenreich (#3)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Mar 28, 2017 00:00, "Andreas Seltenreich" <seltenreich@gmx.de> wrote:

Ashutosh Sharma writes:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)

(&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each. Nodes are tiny consumer machines with
low-power 4-core sandy bridges.

Okay. Thanks for sharing this information. I tried running running sqlsmith
on a single node. I ran it for an hour but didn't find any crash. Let me
also try running multiple sqlsmith at a time...May be 5 like you are doing
on a single node.

[2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.

Sure, Thanks a lot.

With Regards,
Ashutosh Sharma

#5Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#2)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash. However, I have found
the reason for this crash. This is basically happening when trying to
retrieve the tuples using cursor. Basically the current hash index
scan work tuple-at-a-time which means once it finds tuple on page, it
releases lock from the page but keeps pin on it and finally returns
the tuple. When the requested number of tuples are processed there is
no lock on the page that was being scanned but yes there is a pin on
it. Finally, when trying to close a cursor at the end of scan, if any
killed tuples has been identified we try to first mark these items as
dead with the help of _hash_kill_items(). But, since we only have pin
on this page, the assert check 'LWLockHeldByMe()' fails.

Instead of adding bool haveLock to _hash_kill_items, how about just
having the caller acquire the lock before calling the function and
release it afterwards? Since the acquire is at the beginning of the
function and the release at the end, there seems to be no point in
putting the acquire/release inside the function rather than in the
caller.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#5)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Sat, Apr 1, 2017 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash. However, I have found
the reason for this crash. This is basically happening when trying to
retrieve the tuples using cursor. Basically the current hash index
scan work tuple-at-a-time which means once it finds tuple on page, it
releases lock from the page but keeps pin on it and finally returns
the tuple. When the requested number of tuples are processed there is
no lock on the page that was being scanned but yes there is a pin on
it. Finally, when trying to close a cursor at the end of scan, if any
killed tuples has been identified we try to first mark these items as
dead with the help of _hash_kill_items(). But, since we only have pin
on this page, the assert check 'LWLockHeldByMe()' fails.

Instead of adding bool haveLock to _hash_kill_items, how about just
having the caller acquire the lock before calling the function and
release it afterwards? Since the acquire is at the beginning of the
function and the release at the end, there seems to be no point in
putting the acquire/release inside the function rather than in the
caller.

Well, That is another option but the main idea was to be inline with
the btree code. Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#6)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Well, That is another option but the main idea was to be inline with
the btree code.

That's not a bad goal in principal, but _bt_killitems() doesn't have
any similar argument.

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better. I think it's fine, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#7)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Well, That is another option but the main idea was to be inline with
the btree code.

That's not a bad goal in principal, but _bt_killitems() doesn't have
any similar argument.

It was there but later got removed with some patch (may be the patch
for reducing pinning and buffer content lock for btree scans). The
following commit has this changes.

commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun May 7 01:21:30 2006 +0000

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

That is just to follow the naming convention in hash.h Please check
the review comments for this at - [1]/messages/by-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522@redhat.com.

Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better.

okay, agreed. I will submit the patch very shortly.

[1]: /messages/by-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522@redhat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Sharma (#8)
1 attachment(s)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

Hi,

On Sat, Apr 1, 2017 at 7:06 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Well, That is another option but the main idea was to be inline with
the btree code.

That's not a bad goal in principal, but _bt_killitems() doesn't have
any similar argument.

It was there but later got removed with some patch (may be the patch
for reducing pinning and buffer content lock for btree scans). The
following commit has this changes.

commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun May 7 01:21:30 2006 +0000

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

That is just to follow the naming convention in hash.h Please check
the review comments for this at - [1].

Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better.

okay, agreed. I will submit the patch very shortly.

As suggested, I am now acquiring lock inside the caller function.
Attached is the patch having this changes. Thanks.

[1] - /messages/by-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522@redhat.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachments:

acquire_lock_when_calling_hash_kill_items.patchtext/x-patch; charset=US-ASCII; name=acquire_lock_when_calling_hash_kill_items.patchDownload
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..b835f77 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -476,9 +476,17 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 
@@ -507,9 +515,17 @@ hashendscan(IndexScanDesc scan)
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 
#10Andreas Seltenreich
seltenreich@gmx.de
In reply to: Andreas Seltenreich (#3)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

Andreas Seltenreich writes:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.

Ashutosh Sharma writes:

[2. reacquire_lock_hashkillitems_if_required.patch]

I've done about 12 hours of testing since applying this patch and no
failed assertions so far.

regards,
Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Andreas Seltenreich (#10)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

Hi Andreas,

On Apr 1, 2017 16:15, "Andreas Seltenreich" <seltenreich@gmx.de> wrote:

Andreas Seltenreich writes:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)

(&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.

Ashutosh Sharma writes:

[2. reacquire_lock_hashkillitems_if_required.patch]

I've done about 12 hours of testing since applying this patch and no
failed assertions so far.

Thanks for testing my patch. Just FYI, I have slightly changed my patch
this morning as per Robert's suggestions. But, still more or less the logic
remains the same. Thank you once again.

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#7)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Well, That is another option but the main idea was to be inline with
the btree code.

That's not a bad goal in principal, but _bt_killitems() doesn't have
any similar argument.

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better. I think it's fine, though.

One thing to note here is that this fix won't be required if we get
the page-scan-mode patch [1]https://commitfest.postgresql.org/13/999/ in this CF. I think if we fix this with
the patch proposed by Ashutosh, then we anyway needs to again change
the related code (kind of revert the fix) after page-scan-mode patch.
Now, if you think we have negligible chance of getting that patch,
then it is good to proceed with this fix.

[1]: https://commitfest.postgresql.org/13/999/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#12)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Apr 1, 2017 18:11, "Amit Kapila" <amit.kapila16@gmail.com> wrote:

On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coek88@gmail.com>

wrote:

Well, That is another option but the main idea was to be inline with
the btree code.

That's not a bad goal in principal, but _bt_killitems() doesn't have
any similar argument.

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better. I think it's fine, though.

One thing to note here is that this fix won't be required if we get
the page-scan-mode patch [1]https://commitfest.postgresql.org/13/999/ in this CF. I think if we fix this with
the patch proposed by Ashutosh, then we anyway needs to again change
the related code (kind of revert the fix) after page-scan-mode patch.
Now, if you think we have negligible chance of getting that patch,
then it is good to proceed with this fix.

Yes, I had already mentioned this point in my very first mail. Thanks for
highlighting this once again.

[1]: https://commitfest.postgresql.org/13/999/

#14Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#9)
Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

On Fri, Mar 31, 2017 at 10:02 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

As suggested, I am now acquiring lock inside the caller function.
Attached is the patch having this changes. Thanks.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers