Possibly too stringent Assert() in b-tree code
I've recently seen this when using 9.6:
#0 0x00007f147892f0c7 in raise () from /lib64/libc.so.6
#1 0x00007f1478930478 in abort () from /lib64/libc.so.6
#2 0x00000000009683a1 in ExceptionalCondition (conditionName=0x9f2ef8 "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9f2e8a "FailedAssertion",
fileName=0x9f2e60 "../../../../src/include/storage/bufpage.h", lineNumber=314) at assert.c:54
#3 0x00000000004ee3d2 in PageValidateSpecialPointer (page=0x7f14700f3480 "") at ../../../../src/include/storage/bufpage.h:314
#4 0x00000000004ef913 in _bt_getbuf (rel=0x7f14795a1a50, blkno=11, access=2) at nbtpage.c:629
#5 0x00000000004eae8c in _bt_split (rel=0x7f14795a1a50, buf=136, cbuf=0, firstright=367, newitemoff=408, newitemsz=16, newitem=0x2608330, newitemonleft=0 '\000') at nbtinsert.c:986
#6 0x00000000004ea563 in _bt_insertonpg (rel=0x7f14795a1a50, buf=136, cbuf=0, stack=0x26090c8, itup=0x2608330, newitemoff=408, split_only_page=0 '\000') at nbtinsert.c:781
#7 0x00000000004e954c in _bt_doinsert (rel=0x7f14795a1a50, itup=0x2608330, checkUnique=UNIQUE_CHECK_YES, heapRel=0x25aa0a0) at nbtinsert.c:204
#8 0x00000000004f3b73 in btinsert (rel=0x7f14795a1a50, values=0x7ffe46c6f7f0, isnull=0x7ffe46c6f7d0 "", ht_ctid=0x260927c, heapRel=0x25aa0a0, checkUnique=UNIQUE_CHECK_YES) at nbtree.c:279
#9 0x00000000004e7964 in index_insert (indexRelation=0x7f14795a1a50, values=0x7ffe46c6f7f0, isnull=0x7ffe46c6f7d0 "", heap_t_ctid=0x260927c, heapRelation=0x25aa0a0, checkUnique=UNIQUE_CHECK_YES)
at indexam.c:204
Of course, the database could have been corrupted after having encountered
many crashes during my experiments. Neverthelesss, even without in-depth
knowledge of the b-tree code I suspect that this stack trace might reflect a
legal situation. In partcular, if _bt_page_recyclable() returned on this
condition:
if (PageIsNew(page))
return true;
Unfortunately I no longer have the cluster nor the core dump, but I remember
all the fields of PageHeader were indeed zeroed.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 19, 2016 at 1:32 PM, Antonin Houska <ah@cybertec.at> wrote:
I've recently seen this when using 9.6:
#0 0x00007f147892f0c7 in raise () from /lib64/libc.so.6
#1 0x00007f1478930478 in abort () from /lib64/libc.so.6
#2 0x00000000009683a1 in ExceptionalCondition (conditionName=0x9f2ef8 "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9f2e8a "FailedAssertion",
fileName=0x9f2e60 "../../../../src/include/storage/bufpage.h", lineNumber=314) at assert.c:54
#3 0x00000000004ee3d2 in PageValidateSpecialPointer (page=0x7f14700f3480 "") at ../../../../src/include/storage/bufpage.h:314
#4 0x00000000004ef913 in _bt_getbuf (rel=0x7f14795a1a50, blkno=11, access=2) at nbtpage.c:629
#5 0x00000000004eae8c in _bt_split (rel=0x7f14795a1a50, buf=136, cbuf=0, firstright=367, newitemoff=408, newitemsz=16, newitem=0x2608330, newitemonleft=0 '\000') at nbtinsert.c:986
#6 0x00000000004ea563 in _bt_insertonpg (rel=0x7f14795a1a50, buf=136, cbuf=0, stack=0x26090c8, itup=0x2608330, newitemoff=408, split_only_page=0 '\000') at nbtinsert.c:781
#7 0x00000000004e954c in _bt_doinsert (rel=0x7f14795a1a50, itup=0x2608330, checkUnique=UNIQUE_CHECK_YES, heapRel=0x25aa0a0) at nbtinsert.c:204
#8 0x00000000004f3b73 in btinsert (rel=0x7f14795a1a50, values=0x7ffe46c6f7f0, isnull=0x7ffe46c6f7d0 "", ht_ctid=0x260927c, heapRel=0x25aa0a0, checkUnique=UNIQUE_CHECK_YES) at nbtree.c:279
#9 0x00000000004e7964 in index_insert (indexRelation=0x7f14795a1a50, values=0x7ffe46c6f7f0, isnull=0x7ffe46c6f7d0 "", heap_t_ctid=0x260927c, heapRelation=0x25aa0a0, checkUnique=UNIQUE_CHECK_YES)
at indexam.c:204Of course, the database could have been corrupted after having encountered
many crashes during my experiments. Neverthelesss, even without in-depth
knowledge of the b-tree code I suspect that this stack trace might reflect a
legal situation. In partcular, if _bt_page_recyclable() returned on this
condition:if (PageIsNew(page))
return true;
I think you have a valid point. It seems we don't need to write WAL
for reuse page (aka don't call _bt_log_reuse_page()), if the page is
new, as the only purpose of that log is to handle conflict based on
transaction id stored in special area which will be anyway zero.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Of course, the database could have been corrupted after having encountered
many crashes during my experiments. Neverthelesss, even without in-depth
knowledge of the b-tree code I suspect that this stack trace might reflect a
legal situation. In partcular, if _bt_page_recyclable() returned on this
condition:if (PageIsNew(page))
return true;I think you have a valid point. It seems we don't need to write WAL
for reuse page (aka don't call _bt_log_reuse_page()), if the page is
new, as the only purpose of that log is to handle conflict based on
transaction id stored in special area which will be anyway zero.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think you have a valid point. It seems we don't need to write WAL
for reuse page (aka don't call _bt_log_reuse_page()), if the page is
new, as the only purpose of that log is to handle conflict based on
transaction id stored in special area which will be anyway zero.
+1.
This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area. We could prevent the crash by
doing nothing if PageIsNew, a la
if (_bt_page_recyclable(page))
{
/*
* If we are generating WAL for Hot Standby then create a
* WAL record that will allow us to conflict with queries
* running on standby.
*/
- if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+ if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) &&
+ !PageIsNew(page))
{
BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
_bt_log_reuse_page(rel, blkno, opaque->btpo.xact);
}
/* Okay to use page. Re-initialize and return it */
but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think you have a valid point. It seems we don't need to write WAL
for reuse page (aka don't call _bt_log_reuse_page()), if the page is
new, as the only purpose of that log is to handle conflict based on
transaction id stored in special area which will be anyway zero.+1.
This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area. We could prevent the crash by
doing nothing if PageIsNew, a laif (_bt_page_recyclable(page)) { /* * If we are generating WAL for Hot Standby then create a * WAL record that will allow us to conflict with queries * running on standby. */ - if (XLogStandbyInfoActive() && RelationNeedsWAL(rel)) + if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) && + !PageIsNew(page)) { BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);_bt_log_reuse_page(rel, blkno, opaque->btpo.xact);
}/* Okay to use page. Re-initialize and return it */
but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?
I think it is basically to ensure that the queries on standby should
not be considering the page being recycled as valid and if there is
any pending query to which this page is visible, cancel it. If this
understanding is correct, then I think the fix you are proposing is
correct.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area. ...
but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?
I think it is basically to ensure that the queries on standby should
not be considering the page being recycled as valid and if there is
any pending query to which this page is visible, cancel it. If this
understanding is correct, then I think the fix you are proposing is
correct.
After thinking about it some more, I do not understand why we are emitting
WAL here at all in *any* case, either for a new page or for a deleted one
that we're about to recycle. Why wouldn't the appropriate actions have
been taken when the page was deleted, or even before that when its last
tuple was deleted?
In other words, if I'm left to fix it, I will do so by reverting
fafa374f2 lock stock and barrel. I think it was ill-considered and
unnecessary.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area. ...
but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?I think it is basically to ensure that the queries on standby should
not be considering the page being recycled as valid and if there is
any pending query to which this page is visible, cancel it. If this
understanding is correct, then I think the fix you are proposing is
correct.After thinking about it some more, I do not understand why we are emitting
WAL here at all in *any* case, either for a new page or for a deleted one
that we're about to recycle. Why wouldn't the appropriate actions have
been taken when the page was deleted, or even before that when its last
tuple was deleted?
It seems to me that we do take actions for conflict resolution during
the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
which we emit in vacuum), but not sure if that is sufficient.
Consider a case where the new transaction is started on standby after
page deletion and the same still precedes the value of xact on page,
such transactions must be cancelled before we can reuse the page. I
think the fact that before recycling the page we do ensure that no
transaction is interested in that page in master indicates something
similar is required in standby as well.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 26, 2016 at 8:54 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area. ...
but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?I think it is basically to ensure that the queries on standby should
not be considering the page being recycled as valid and if there is
any pending query to which this page is visible, cancel it. If this
understanding is correct, then I think the fix you are proposing is
correct.After thinking about it some more, I do not understand why we are emitting
WAL here at all in *any* case, either for a new page or for a deleted one
that we're about to recycle. Why wouldn't the appropriate actions have
been taken when the page was deleted, or even before that when its last
tuple was deleted?It seems to me that we do take actions for conflict resolution during
the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
which we emit in vacuum), but not sure if that is sufficient.
Consider a case where the new transaction is started on standby after
Here by new transaction, I intend to say some newer snapshot with
valid MyPgXact->xmin.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello. I came from the following mail.
/messages/by-id/0F97FA9ABBDBE54F91744A9B37151A5116E4DD@g01jpexmbkw24
At Mon, 26 Sep 2016 09:12:04 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1K5YyDmndko0zzW6WNCN_DGFVHa6DCYcyuvkBWTH5+nUQ@mail.gmail.com>
On Mon, Sep 26, 2016 at 8:54 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area. ...
but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?I think it is basically to ensure that the queries on standby should
not be considering the page being recycled as valid and if there is
any pending query to which this page is visible, cancel it. If this
understanding is correct, then I think the fix you are proposing is
correct.After thinking about it some more, I do not understand why we are emitting
WAL here at all in *any* case, either for a new page or for a deleted one
that we're about to recycle. Why wouldn't the appropriate actions have
been taken when the page was deleted, or even before that when its last
tuple was deleted?It seems to me that we do take actions for conflict resolution during
the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
which we emit in vacuum), but not sure if that is sufficient.
Consider a case where the new transaction is started on standby afterHere by new transaction, I intend to say some newer snapshot with
valid MyPgXact->xmin.
I agree to the diagnosis. So the WAL record is not necessary if
it is a new page since no one cannot be grabbing it.
_bt_page_recyclable() has two callers. _bt_getbuf and
btvacuumpage. A comment in btvacuumpage is saying that.
* _bt_checkpage(), which will barf on an all-zero page. We want to
* recycle all-zero pages, not fail. Also, we want to use a nondefault
So it is right thing for _bt_page_recyclable() to return true for
zeroed pages and it is consistent with the comment in
_bt_page_recyclable().
At the problematic point, a new page is safe to recycle but
there's no need to issue WAL record since it is not recycled with
the meaning that _bt_log_resuse_page() thinking.
_bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
{
xl_btree_reuse_page xlrec_reuse;/*
* Note that we don't register the buffer with the record, because this
* operation doesn't modify the page. This record only exists to provide a
* conflict point for Hot Standby.
*/
I think we have all requred testimony from exiting code and
comments. FWIW my conclustion is that Tom's solution upthread is
right thing to do and needs addition in comment.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
fix_bt_getbuf_newpage_crash.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index a24e64156a..b39634cafd 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -782,9 +782,12 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
/*
* If we are generating WAL for Hot Standby then create a
* WAL record that will allow us to conflict with queries
- * running on standby.
+ * running on standby. No need to do that if the page is
+ * all-zoeroed since no one cannot be interested in the
+ * page on standbys. See _bt_page_recyclable().
*/
- if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+ if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) &&
+ !PageIsNew())
{
BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
At Mon, 26 Sep 2016 09:12:04 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1K5YyDmndko0zzW6WNCN_DGFVHa6DCYcyuvkBWTH5+nUQ@mail.gmail.com>
It seems to me that we do take actions for conflict resolution during
the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
which we emit in vacuum), but not sure if that is sufficient.
Consider a case where the new transaction is started on standby afterHere by new transaction, I intend to say some newer snapshot with
valid MyPgXact->xmin.
I agree to the diagnosis. So the WAL record is not necessary if
it is a new page since no one cannot be grabbing it.
Thanks for reviving this thread and reviewing the problem.
I pushed the patch now with some more work on the comments.
regards, tom lane