hash index on unlogged tables doesn't behave as expected
While discussing the behavior of hash indexes with Bruce in the nearby
thread [1]/messages/by-id/20170630005634.GA4448@momjian.us, it has been noticed that hash index on unlogged tables
doesn't behave as expected. Prior to 10, it has the different set of
problems (mainly because hash indexes are not WAL-logged) which were
discussed on that thread [1]/messages/by-id/20170630005634.GA4448@momjian.us, however when I checked, it doesn't work
even for 10. Below are steps to reproduce the problem.
1. Setup master and standby
2. On the master, create unlogged table and hash index.
2A. Create unlogged table t1(c1 int);
2B. Create hash index idx_t1_hash on t1 using hash(c1);
3. On Standby, try selecting data,
select * from t1;
ERROR: cannot access temporary or unlogged relations during recovery
---Till here everything works as expected
4. Promote standby to master (I have just stopped the standby and
master and removed recovery.conf file from the standby database
location) and try starting the new master, it
gives below error and doesn't get started.
FATAL: could not create file "base/12700/16387": File exists
The basic issue was that the WAL logging for Create Index operation
was oblivion of the fact that for unlogged tables only INIT forks need
to be logged. Another point which we need to consider is that while
replaying the WAL for the create index operation, we need to flush the
buffer if it is for init fork. This needs to be done only for pages
that can be part of init fork file (like metapage, bitmappage).
Attached patch fixes the issue.
Another approach to fix the issue could be that always log complete
pages for the create index operation on unlogged tables (in
hashbuildempty). However, the logic for initial hash index pages
needs us to perform certain other actions (like update metapage after
the creation of bitmappage) which make it difficult to follow that
approach.
I think this should be considered a PostgreSQL 10 open item.
[1]: /messages/by-id/20170630005634.GA4448@momjian.us
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
fix_unlogged_hash_index_issue_v1.patchapplication/octet-stream; name=fix_unlogged_hash_index_issue_v1.patchDownload+40-7
Hi,
On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
While discussing the behavior of hash indexes with Bruce in the nearby
thread [1], it has been noticed that hash index on unlogged tables
doesn't behave as expected. Prior to 10, it has the different set of
problems (mainly because hash indexes are not WAL-logged) which were
discussed on that thread [1], however when I checked, it doesn't work
even for 10. Below are steps to reproduce the problem.1. Setup master and standby
2. On the master, create unlogged table and hash index.
2A. Create unlogged table t1(c1 int);
2B. Create hash index idx_t1_hash on t1 using hash(c1);
3. On Standby, try selecting data,
select * from t1;
ERROR: cannot access temporary or unlogged relations during recovery
---Till here everything works as expected
4. Promote standby to master (I have just stopped the standby and
master and removed recovery.conf file from the standby database
location) and try starting the new master, it
gives below error and doesn't get started.
FATAL: could not create file "base/12700/16387": File existsThe basic issue was that the WAL logging for Create Index operation
was oblivion of the fact that for unlogged tables only INIT forks need
to be logged. Another point which we need to consider is that while
replaying the WAL for the create index operation, we need to flush the
buffer if it is for init fork. This needs to be done only for pages
that can be part of init fork file (like metapage, bitmappage).
Attached patch fixes the issue.Another approach to fix the issue could be that always log complete
pages for the create index operation on unlogged tables (in
hashbuildempty). However, the logic for initial hash index pages
needs us to perform certain other actions (like update metapage after
the creation of bitmappage) which make it difficult to follow that
approach.
Thanks for sharing the steps to reproduce the issue in detail. I could
easily reproduce this issue. I also had a quick look into the patch and the
fix looks fine to me. However, it would be good if we can add at least one
test for unlogged table with hash index in the regression test suite.
Apart from that i have tested the patch and the patch seems to be working
fine. Here are the steps that i have followed to verify the fix,
With Patch:
========
postgres=# SELECT pg_relation_filepath('t1');
pg_relation_filepath
----------------------
base/14915/16384
(1 row)
postgres=# SELECT pg_relation_filepath('idx_t1_hash');
pg_relation_filepath
----------------------
base/14915/16387
(1 row)
Master:
=====
[ashu@localhost bin]$ ls -l ../master/base/14915/1638*
-rw-------. 1 ashu ashu 18128896 Jul 3 14:29 ../master/base/14915/16384
-rw-------. 1 ashu ashu 24576 Jul 3 14:29 ../master/base/14915/16384_fsm
-rw-------. 1 ashu ashu 0 Jul 3 14:28
../master/base/14915/16384_init
-rw-------. 1 ashu ashu 22339584 Jul 3 14:29 ../master/base/14915/16387
-rw-------. 1 ashu ashu 32768 Jul 3 14:28
../master/base/14915/16387_init
Standby:
======
[ashu@localhost bin]$ ls -l ../standby/base/14915/1638*
-rw-------. 1 ashu ashu 0 Jul 3 14:28 ../standby/base/14915/16384_init
-rw-------. 1 ashu ashu 32768 Jul 3 14:28 ../standby/base/14915/16387_init
Without patch:
==========
postgres=# SELECT pg_relation_filepath('t1');
pg_relation_filepath
----------------------
base/14915/16384
(1 row)
postgres=# SELECT pg_relation_filepath('idx_t1_hash');
pg_relation_filepath
----------------------
base/14915/16387
(1 row)
Master:
=====
[ashu@localhost bin]$ ls -l ../master/base/14915/1638*
-rw-------. 1 ashu ashu 18128896 Jul 3 14:36 ../master/base/14915/16384
-rw-------. 1 ashu ashu 24576 Jul 3 14:36 ../master/base/14915/16384_fsm
-rw-------. 1 ashu ashu 0 Jul 3 14:35
../master/base/14915/16384_init
-rw-------. 1 ashu ashu 22339584 Jul 3 14:36 ../master/base/14915/16387
-rw-------. 1 ashu ashu 32768 Jul 3 14:35
../master/base/14915/16387_init
Standby:
======
[ashu@localhost bin]$ ls -l ../standby/base/14915/1638*
-rw-------. 1 ashu ashu 0 Jul 3 14:35 ../standby/base/14915/16384_init
*-rw-------. 1 ashu ashu 73728 Jul 3 14:35 ../standby/base/14915/16387*
-rw-------. 1 ashu ashu 24576 Jul 3 14:35 ../standby/base/14915/16387_init
As seen from the test results above, it is clear that without patch, both
main fork and init fork were getting WAL logged as the create hash index
WAL logging was being done without checking forkNum type.
_hash_init()
{
................
................
log_newpage(&rel->rd_node,
forkNum,
blkno,
BufferGetPage(buf),
true);
_hash_relbuf(rel, buf);
....................
....................
}
I have also ran the WAL consistency check tool to verify the things and
didn't find any issues. Thanks.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
On Mon, Jul 3, 2017 at 3:24 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
The basic issue was that the WAL logging for Create Index operation
was oblivion of the fact that for unlogged tables only INIT forks need
to be logged. Another point which we need to consider is that while
replaying the WAL for the create index operation, we need to flush the
buffer if it is for init fork. This needs to be done only for pages
that can be part of init fork file (like metapage, bitmappage).
Attached patch fixes the issue.Another approach to fix the issue could be that always log complete
pages for the create index operation on unlogged tables (in
hashbuildempty). However, the logic for initial hash index pages
needs us to perform certain other actions (like update metapage after
the creation of bitmappage) which make it difficult to follow that
approach.Thanks for sharing the steps to reproduce the issue in detail. I could
easily reproduce this issue. I also had a quick look into the patch and the
fix looks fine to me. However, it would be good if we can add at least one
test for unlogged table with hash index in the regression test suite.
There is already such a test, see create_index.sql.
CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id
int4_ops);
Do you have something else in mind?
I think the problem mentioned in this thread can be caught only if we
promote the standby and restart it. We might be able to write it
using TAP framework, but I think such a test should exist for other
index types as well or in general for unlogged tables. I am not
opposed to writing such a test if you and or others feel so.
Apart from that i have tested the patch and the patch seems to be working
fine.
Thanks.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On Mon, Jul 3, 2017 at 4:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jul 3, 2017 at 3:24 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
The basic issue was that the WAL logging for Create Index operation
was oblivion of the fact that for unlogged tables only INIT forks need
to be logged. Another point which we need to consider is that while
replaying the WAL for the create index operation, we need to flush the
buffer if it is for init fork. This needs to be done only for pages
that can be part of init fork file (like metapage, bitmappage).
Attached patch fixes the issue.Another approach to fix the issue could be that always log complete
pages for the create index operation on unlogged tables (in
hashbuildempty). However, the logic for initial hash index pages
needs us to perform certain other actions (like update metapage after
the creation of bitmappage) which make it difficult to follow that
approach.Thanks for sharing the steps to reproduce the issue in detail. I could
easily reproduce this issue. I also had a quick look into the patch and the
fix looks fine to me. However, it would be good if we can add at least one
test for unlogged table with hash index in the regression test suite.There is already such a test, see create_index.sql.
CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id
int4_ops);Do you have something else in mind?
Yes, that's what i had in my mind. But, good to know that we already
have a test-case with hash index created on an unlogged table.
I think the problem mentioned in this thread can be caught only if we
promote the standby and restart it. We might be able to write it
using TAP framework, but I think such a test should exist for other
index types as well or in general for unlogged tables. I am not
opposed to writing such a test if you and or others feel so.
I think, it would be a good thing to add such test cases using TAP
framework but as you said, it would be good to take others opinion as
well on this. Thanks.
Apart from that i have tested the patch and the patch seems to be working
fine.Thanks.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
There is already such a test, see create_index.sql.
CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id
int4_ops);Do you have something else in mind?
I think the problem mentioned in this thread can be caught only if we
promote the standby and restart it. We might be able to write it
using TAP framework, but I think such a test should exist for other
index types as well or in general for unlogged tables. I am not
opposed to writing such a test if you and or others feel so.
There have been many requests for something like that. This overlaps a
lot with the existing make check though, so a buildfarm module using
wal_consistency_checking may be a better idea than something in core.
Apart from that i have tested the patch and the patch seems to be working
fine.Thanks.
It seems to me that this qualifies as an open item for 10. WAL-logging
is new for hash tables. Amit, could you add one?
+ use_wal = RelationNeedsWAL(rel) ||
+ (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+ forkNum == INIT_FORKNUM);
In access AMs, empty() routines are always only called for unlogged
relations, so you could relax that to check for INIT_FORKNUM only.
Looking at the patch, I think that you are right to do the logging
directly in _hash_init() and not separately in hashbuildempty().
Compared to other relations the init data is more dynamic.
+ /*
+ * Force the on-disk state of init forks to always be in sync with the
+ * state in shared buffers. See XLogReadBufferForRedoExtended.
+ */
+ XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL);
+ if (forknum == INIT_FORKNUM)
+ FlushOneBuffer(metabuf);
I find those forced syncs a bit surprising. The buffer is already
marked as dirty, so even if there is a concurrent checkpoint they
would be flushed. If recovery comes again here, then they would just
be replayed. I am unclear why XLogReadBufferForRedoExtended is not
enough to replay a FPW of that.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
It seems to me that this qualifies as an open item for 10. WAL-logging
is new for hash tables. Amit, could you add one?
Added.
+ use_wal = RelationNeedsWAL(rel) || + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && + forkNum == INIT_FORKNUM); In access AMs, empty() routines are always only called for unlogged relations, so you could relax that to check for INIT_FORKNUM only.
Yeah, but _hash_init() is an exposed API, so I am not sure it is a
good idea to make such an assumption at that level of code. Now, as
there is no current user which wants to use it any other way, we can
make such an assumption, but does it serve any purpose?
Looking at the patch, I think that you are right to do the logging
directly in _hash_init() and not separately in hashbuildempty().
Compared to other relations the init data is more dynamic.+ /* + * Force the on-disk state of init forks to always be in sync with the + * state in shared buffers. See XLogReadBufferForRedoExtended. + */ + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); + if (forknum == INIT_FORKNUM) + FlushOneBuffer(metabuf); I find those forced syncs a bit surprising. The buffer is already marked as dirty, so even if there is a concurrent checkpoint they would be flushed.
What if there is no concurrent checkpoint activity and the server is
shutdown? Remember this replay happens on standby and we will just
try to perform Restartpoint and there is no guarantee that it will
flush the buffers. If the buffers are not flushed at shutdown, then
the Init fork will be empty on restart and the hash index will be
corrupt.
If recovery comes again here, then they would just
be replayed. I am unclear why XLogReadBufferForRedoExtended is not
enough to replay a FPW of that.
Where does FPW come into the picture for a replay of metapage or bitmappage?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jul 03, 2017 at 09:12:01AM +0530, Amit Kapila wrote:
While discussing the behavior of hash indexes with Bruce in the nearby
thread [1], it has been noticed that hash index on unlogged tables
doesn't behave as expected. Prior to 10, it has the different set of
problems (mainly because hash indexes are not WAL-logged) which were
discussed on that thread [1], however when I checked, it doesn't work
even for 10. Below are steps to reproduce the problem.1. Setup master and standby
2. On the master, create unlogged table and hash index.
2A. Create unlogged table t1(c1 int);
2B. Create hash index idx_t1_hash on t1 using hash(c1);
3. On Standby, try selecting data,
select * from t1;
ERROR: cannot access temporary or unlogged relations during recovery
---Till here everything works as expected
4. Promote standby to master (I have just stopped the standby and
master and removed recovery.conf file from the standby database
location) and try starting the new master, it
gives below error and doesn't get started.
FATAL: could not create file "base/12700/16387": File existsThe basic issue was that the WAL logging for Create Index operation
was oblivion of the fact that for unlogged tables only INIT forks need
to be logged. Another point which we need to consider is that while
replaying the WAL for the create index operation, we need to flush the
buffer if it is for init fork. This needs to be done only for pages
that can be part of init fork file (like metapage, bitmappage).
Attached patch fixes the issue.Another approach to fix the issue could be that always log complete
pages for the create index operation on unlogged tables (in
hashbuildempty). However, the logic for initial hash index pages
needs us to perform certain other actions (like update metapage after
the creation of bitmappage) which make it difficult to follow that
approach.I think this should be considered a PostgreSQL 10 open item.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=N9Gr1-RP3RhwecB5E-A@mail.gmail.com>
On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
It seems to me that this qualifies as an open item for 10. WAL-logging
is new for hash tables. Amit, could you add one?Added.
+ use_wal = RelationNeedsWAL(rel) || + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && + forkNum == INIT_FORKNUM); In access AMs, empty() routines are always only called for unlogged relations, so you could relax that to check for INIT_FORKNUM only.Yeah, but _hash_init() is an exposed API, so I am not sure it is a
good idea to make such an assumption at that level of code. Now, as
there is no current user which wants to use it any other way, we can
make such an assumption, but does it serve any purpose?
Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.
The difference of the two is an init fork of TEMP
relations. However I believe that init fork is by definition a
part of an unlogged relation, it seems to me that it ought not to
be wal-logged if we had it. From this viewpoint, the middle line
makes sense.
Looking at the patch, I think that you are right to do the logging
directly in _hash_init() and not separately in hashbuildempty().
Compared to other relations the init data is more dynamic.+ /* + * Force the on-disk state of init forks to always be in sync with the + * state in shared buffers. See XLogReadBufferForRedoExtended. + */ + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); + if (forknum == INIT_FORKNUM) + FlushOneBuffer(metabuf); I find those forced syncs a bit surprising. The buffer is already marked as dirty, so even if there is a concurrent checkpoint they would be flushed.What if there is no concurrent checkpoint activity and the server is
shutdown? Remember this replay happens on standby and we will just
try to perform Restartpoint and there is no guarantee that it will
flush the buffers. If the buffers are not flushed at shutdown, then
the Init fork will be empty on restart and the hash index will be
corrupt.
XLogReadBufferForRedoExtended() automatically force the flush for
a XLOG record on init fork that having FPW imaeges. And
_hash_init seems to have emitted such a XLOG record using
log_newpage.
If recovery comes again here, then they would just
be replayed. I am unclear why XLogReadBufferForRedoExtended is not
enough to replay a FPW of that.Where does FPW come into the picture for a replay of metapage or bitmappage?
Since required FPW seems to be emitted and the flush should be
done automatically, the issuer side (_hash_init) may attach wrong
FPW images if hash_xlog_init_meta_page requires additional
flushes. But I don't see such a flaw there. I think Michael wants
to say such a kind of thing.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.
| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.
and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);
This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
FWIW..
At Thu, 06 Jul 2017 18:54:47 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170706.185447.256482539.horiguchi.kyotaro@lab.ntt.co.jp>
+ /* + * Force the on-disk state of init forks to always be in sync with the + * state in shared buffers. See XLogReadBufferForRedoExtended. + */ + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); + if (forknum == INIT_FORKNUM) + FlushOneBuffer(metabuf); I find those forced syncs a bit surprising. The buffer is already marked as dirty, so even if there is a concurrent checkpoint they would be flushed.What if there is no concurrent checkpoint activity and the server is
shutdown? Remember this replay happens on standby and we will just
try to perform Restartpoint and there is no guarantee that it will
flush the buffers. If the buffers are not flushed at shutdown, then
the Init fork will be empty on restart and the hash index will be
corrupt.XLogReadBufferForRedoExtended() automatically force the flush for
a XLOG record on init fork that having FPW imaeges. And
_hash_init seems to have emitted such a XLOG record using
log_newpage.If recovery comes again here, then they would just
be replayed. I am unclear why XLogReadBufferForRedoExtended is not
enough to replay a FPW of that.Where does FPW come into the picture for a replay of metapage or bitmappage?
Since required FPW seems to be emitted and the flush should be
done automatically, the issuer side (_hash_init) may attach wrong
"may attach" -> "may have attached"
FPW images if hash_xlog_init_meta_page requires additional
flushes. But I don't see such a flaw there. I think Michael wants
to say such a kind of thing.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jul 6, 2017 at 3:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,
At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=N9Gr1-RP3RhwecB5E-A@mail.gmail.com>
On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
It seems to me that this qualifies as an open item for 10. WAL-logging
is new for hash tables. Amit, could you add one?Added.
+ use_wal = RelationNeedsWAL(rel) || + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && + forkNum == INIT_FORKNUM); In access AMs, empty() routines are always only called for unlogged relations, so you could relax that to check for INIT_FORKNUM only.Yeah, but _hash_init() is an exposed API, so I am not sure it is a
good idea to make such an assumption at that level of code. Now, as
there is no current user which wants to use it any other way, we can
make such an assumption, but does it serve any purpose?Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.The difference of the two is an init fork of TEMP
relations. However I believe that init fork is by definition a
part of an unlogged relation, it seems to me that it ought not to
be wal-logged if we had it. From this viewpoint, the middle line
makes sense.
What is the middle line? Are you okay with a current check or do you
see any problem with it or do you prefer to write it differently?
Looking at the patch, I think that you are right to do the logging
directly in _hash_init() and not separately in hashbuildempty().
Compared to other relations the init data is more dynamic.+ /* + * Force the on-disk state of init forks to always be in sync with the + * state in shared buffers. See XLogReadBufferForRedoExtended. + */ + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); + if (forknum == INIT_FORKNUM) + FlushOneBuffer(metabuf); I find those forced syncs a bit surprising. The buffer is already marked as dirty, so even if there is a concurrent checkpoint they would be flushed.What if there is no concurrent checkpoint activity and the server is
shutdown? Remember this replay happens on standby and we will just
try to perform Restartpoint and there is no guarantee that it will
flush the buffers. If the buffers are not flushed at shutdown, then
the Init fork will be empty on restart and the hash index will be
corrupt.XLogReadBufferForRedoExtended() automatically force the flush for
a XLOG record on init fork that having FPW imaeges. And
_hash_init seems to have emitted such a XLOG record using
log_newpage.
Sure, but log_newpage is not used for metapage and bitmappage.
If recovery comes again here, then they would just
be replayed. I am unclear why XLogReadBufferForRedoExtended is not
enough to replay a FPW of that.Where does FPW come into the picture for a replay of metapage or bitmappage?
Since required FPW seems to be emitted and the flush should be
done automatically, the issuer side (_hash_init) may attach wrong
FPW images if hash_xlog_init_meta_page requires additional
flushes. But I don't see such a flaw there. I think Michael wants
to say such a kind of thing.
I am not able to understand what you want to say, but I think you
don't see any problem with the patch as such.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.
I don't think we need anything with respect to this patch because
ReadBufferWithoutRelcache is used in case of FPW replay of INIT_FORK
pages as well. Do you have something else in mind which I am not able
to see?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jul 5, 2017 at 11:02 PM, Noah Misch <noah@leadboat.com> wrote:
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
I don't intend to rush this in before the beta2 wrap, although some
other committer is welcome to do so if they feel confident in the fix.
I will update again by July 17th.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.
Checking only for INIT_FORKNUM seems logical to me. Also checking for
RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I
guess Amit copied the test from ATExecSetTablespace, which does it as
he did, but it seems unnecessarily long-winded.
The difference of the two is an init fork of TEMP
relations. However I believe that init fork is by definition a
part of an unlogged relation, it seems to me that it ought not to
be wal-logged if we had it. From this viewpoint, the middle line
makes sense.
Actually, the init fork of an unlogged relation *must* be WAL-logged.
All other forks are removed on a crash (and the main fork is copied
anew from the init fork). But the init fork itself must survive -
therefore it, and only it, must be WAL-logged and thus durable.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.
Yeah, it should probably mention that the init fork of an unlogged
relation is also OK.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jul 8, 2017 at 8:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.Checking only for INIT_FORKNUM seems logical to me. Also checking for
RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I
guess Amit copied the test from ATExecSetTablespace, which does it as
he did, but it seems unnecessarily long-winded.
Okay. If you and Michael feel the check that way is better, I will
change and submit the patch.
The difference of the two is an init fork of TEMP
relations. However I believe that init fork is by definition a
part of an unlogged relation, it seems to me that it ought not to
be wal-logged if we had it. From this viewpoint, the middle line
makes sense.Actually, the init fork of an unlogged relation *must* be WAL-logged.
All other forks are removed on a crash (and the main fork is copied
anew from the init fork). But the init fork itself must survive -
therefore it, and only it, must be WAL-logged and thus durable.By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.Yeah, it should probably mention that the init fork of an unlogged
relation is also OK.
I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, but
let me know if you think that we should add such a comment in this
patch itself.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, ...
True, although maybe hash indexes are the only way that happens today?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, ...True, although maybe hash indexes are the only way that happens today?
No, I think it will happen for all other indexes as well. Basically,
we log new page for init forks of other indexes and then while
restoring that full page image, it will fall through the same path.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jul 8, 2017 at 9:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jul 8, 2017 at 8:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.Checking only for INIT_FORKNUM seems logical to me. Also checking for
RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I
guess Amit copied the test from ATExecSetTablespace, which does it as
he did, but it seems unnecessarily long-winded.Okay. If you and Michael feel the check that way is better, I will
change and submit the patch.
Changed as per suggestion.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.Yeah, it should probably mention that the init fork of an unlogged
relation is also OK.I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, but
let me know if you think that we should add such a comment in this
patch itself.
Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+-DUto+MyeNdLE9P9u8G3Fv6n+SOjPSqmPSw6ashhPjw@mail.gmail.com>
On Sat, Jul 8, 2017 at 9:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jul 8, 2017 at 8:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.Checking only for INIT_FORKNUM seems logical to me. Also checking for
RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I
guess Amit copied the test from ATExecSetTablespace, which does it as
he did, but it seems unnecessarily long-winded.Okay. If you and Michael feel the check that way is better, I will
change and submit the patch.Changed as per suggestion.
On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, ...True, although maybe hash indexes are the only way that happens today?
No, I think it will happen for all other indexes as well. Basically,
we log new page for init forks of other indexes and then while
restoring that full page image, it will fall through the same path.
(Sorry, I didn't noticed that hash metapage is not using log_newpgae)
For example, (bt|gin|gist|spgist|brin)buildempty are using
log_newpage to log INIT_FORK metapages. I believe that the xlog
flow from log_newpage to XLogReadBufferForRedoExtended is
developed so that pages in init forks are surely flushed during
recovery, so we should fix it instead adding other flushes if the
path doesn't work. Am I looking wrong place? (I think it is
working.)
If I'm understanding correctly here, hashbild and hashbuildempty
should be refactored using the same structure with other *build
and *buildempty functions. Specifically metapages initialization
subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage
or...) does only on-memory initialization and does not emit WAL,
then *build and *buildempty emits WAL in their required way.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.Yeah, it should probably mention that the init fork of an unlogged
relation is also OK.I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, but
let me know if you think that we should add such a comment in this
patch itself.Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache.
+ * NB: At present, this function may only be used on permanent relations and
+ * init fork of an unlogged relation, which is OK, because we only use it
+ * during XLOG replay. If in the future we want to use it on temporary or
+ * unlogged relations, we could pass additional parameters.
*I* think the target of the funcion is permanent relations and
init forks, not unlogged relations. And I'd like to have an
additional comment about RELPERSISTENCE_PERMANENT. Something like
the attached.
reagards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
fix_comment_atop_ReadBufferWithoutRelcache_v2.patchtext/x-patch; charset=us-asciiDownload+8-4
On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+-DUto+MyeNdLE9P9u8G3Fv6n+SOjPSqmPSw6ashhPjw@mail.gmail.com>
On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, ...True, although maybe hash indexes are the only way that happens today?
No, I think it will happen for all other indexes as well. Basically,
we log new page for init forks of other indexes and then while
restoring that full page image, it will fall through the same path.(Sorry, I didn't noticed that hash metapage is not using log_newpgae)
The bitmappage is also not using log_newpage.
For example, (bt|gin|gist|spgist|brin)buildempty are using
log_newpage to log INIT_FORK metapages. I believe that the xlog
flow from log_newpage to XLogReadBufferForRedoExtended is
developed so that pages in init forks are surely flushed during
recovery, so we should fix it instead adding other flushes if the
path doesn't work. Am I looking wrong place? (I think it is
working.)
You are looking at right place.
If I'm understanding correctly here, hashbild and hashbuildempty
should be refactored using the same structure with other *build
and *buildempty functions. Specifically metapages initialization
subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage
or...) does only on-memory initialization and does not emit WAL,
then *build and *buildempty emits WAL in their required way.
I have considered this way of doing as well, read my first e-mail [1]/messages/by-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com
in this thread "Another approach to fix the issue ....". It is not
that we can't change the code of hashbuildempty to make it log new
pages for all kind of pages (metapage, bitmappage and datapages), but
I am not sure if there is a value in going in that direction. If we
have to go in that direction, we need to hard code some of the values
like hashm_nmaps and hashm_mapp in metapage rather than initializing
them after bitmappage creation. Before going in that direction, I
think we should also take opinion from other people especially some
committer as we might need to maintain two different ways of doing
almost the same thing.
I am also not 100% comfortable with adding flush at two new places,
but that makes the code for fix simpler and fundamentally there is no
problem in doing so.
There is a small downside to always logging new page which is that it
will always log full page image in WAL which has the potential to
increase WAL volume if there are many unlogged tables and indexes.
Now considering the init fork generally has very less pages, it is not
a big concern, but still avoiding full page image has some value.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.Yeah, it should probably mention that the init fork of an unlogged
relation is also OK.I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, but
let me know if you think that we should add such a comment in this
patch itself.Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache.
+ * NB: At present, this function may only be used on permanent relations and + * init fork of an unlogged relation, which is OK, because we only use it + * during XLOG replay. If in the future we want to use it on temporary or + * unlogged relations, we could pass additional parameters.*I* think the target of the funcion is permanent relations and
init forks, not unlogged relations. And I'd like to have an
additional comment about RELPERSISTENCE_PERMANENT. Something like
the attached.
Okay, let's leave this for committer to decide.
[1]: /messages/by-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1JYyO5Hcxx4rP0jb=JmMC4qvY1YvG9UvkwNr5qrojsOPw@mail.gmail.com>
On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+-DUto+MyeNdLE9P9u8G3Fv6n+SOjPSqmPSw6ashhPjw@mail.gmail.com>
On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, ...True, although maybe hash indexes are the only way that happens today?
No, I think it will happen for all other indexes as well. Basically,
we log new page for init forks of other indexes and then while
restoring that full page image, it will fall through the same path.(Sorry, I didn't noticed that hash metapage is not using log_newpgae)
The bitmappage is also not using log_newpage.
For example, (bt|gin|gist|spgist|brin)buildempty are using
log_newpage to log INIT_FORK metapages. I believe that the xlog
flow from log_newpage to XLogReadBufferForRedoExtended is
developed so that pages in init forks are surely flushed during
recovery, so we should fix it instead adding other flushes if the
path doesn't work. Am I looking wrong place? (I think it is
working.)You are looking at right place.
If I'm understanding correctly here, hashbild and hashbuildempty
should be refactored using the same structure with other *build
and *buildempty functions. Specifically metapages initialization
subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage
or...) does only on-memory initialization and does not emit WAL,
then *build and *buildempty emits WAL in their required way.I have considered this way of doing as well, read my first e-mail [1]
in this thread "Another approach to fix the issue ....". It is not
that we can't change the code of hashbuildempty to make it log new
pages for all kind of pages (metapage, bitmappage and datapages), but
I am not sure if there is a value in going in that direction. If we
have to go in that direction, we need to hard code some of the values
like hashm_nmaps and hashm_mapp in metapage rather than initializing
them after bitmappage creation. Before going in that direction, I
think we should also take opinion from other people especially some
committer as we might need to maintain two different ways of doing
almost the same thing.
Thanks for the explanation and the pointer (to the start of this
thread.. sorry).
I am also not 100% comfortable with adding flush at two new places,
but that makes the code for fix simpler and fundamentally there is no
problem in doing so.
I agree that the patch would be simpler. Ok, I am satisfied with
an additional comment for _hash_init and hash_xlog_init_meta_page
that describes the reason that _hash_init doesn't/can't use
log_newpage and thus requires explicit flushes. Something like
the description in [1] would be enough.
There is a small downside to always logging new page which is that it
will always log full page image in WAL which has the potential to
increase WAL volume if there are many unlogged tables and indexes.
Now considering the init fork generally has very less pages, it is not
a big concern, but still avoiding full page image has some value.
Perhaps more effective mechanism will be developed at the time.
By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay. If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.and does
| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
mode, strategy, &hit);This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.Yeah, it should probably mention that the init fork of an unlogged
relation is also OK.I think we should do that as a separate patch (I can write the same as
well) because that is not new behavior introduced by this patch, but
let me know if you think that we should add such a comment in this
patch itself.Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache.
+ * NB: At present, this function may only be used on permanent relations and + * init fork of an unlogged relation, which is OK, because we only use it + * during XLOG replay. If in the future we want to use it on temporary or + * unlogged relations, we could pass additional parameters.*I* think the target of the funcion is permanent relations and
init forks, not unlogged relations. And I'd like to have an
additional comment about RELPERSISTENCE_PERMANENT. Something like
the attached.Okay, let's leave this for committer to decide.
Agreed, thanks.
[1] - /messages/by-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jul 10, 2017 at 3:27 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi,
At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1JYyO5Hcxx4rP0jb=JmMC4qvY1YvG9UvkwNr5qrojsOPw@mail.gmail.com>
I am also not 100% comfortable with adding flush at two new places,
but that makes the code for fix simpler and fundamentally there is no
problem in doing so.I agree that the patch would be simpler. Ok, I am satisfied with
an additional comment for _hash_init and hash_xlog_init_meta_page
that describes the reason that _hash_init doesn't/can't use
log_newpage and thus requires explicit flushes. Something like
the description in [1] would be enough.
I have modified the comment in hash_xlog_init_meta_page and a
corresponding function for bitmap page. However, I think adding
anything about not using log_newpage in _hash_init doesn't sound good
to me. I think you have suggested it so that we don't forget the
reason for not using log_newpage, but I think that is overkill. Let
me know if you have any other concerns?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com