Use log_newpage_range in HASH index build

Started by Kirill Reshke 4 months ago, 6 messages, hackers
#1 Kirill Reshke
reshkekirill@gmail.com

There is an existing optimization in index creation where no WAL is written
while the index is being built. It is currently supported by B-tree, GIN,
GiST, and SP-GiST indexes.
It works because nothing needs to be recovered if index creation fails: the
index has not been used by any query yet. So the index can be built on disk
and then, just before it is made live, all of its pages can simply be logged
to WAL.
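To make the idea concrete, here is a toy model in C. It is not PostgreSQL code and every name in it is hypothetical. During the build, page modifications emit no WAL records; at the end, every page is logged once, with several page images batched into each record, similar in spirit to log_newpage_range(). The batch size of 32 is an assumption of the sketch, and the model counts records, not bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not PostgreSQL code. */
#define TOY_BATCH_PAGES 32 /* assumed number of page images per bulk record */

typedef struct ToyWal
{
    size_t nrecords;      /* WAL records written */
    size_t npages_logged; /* full-page images written at end of build */
} ToyWal;

/* One page modification; counts a WAL record only outside a build. */
static void
toy_modify_page(ToyWal *wal, bool isbuild)
{
    /* Real code would still change the page; the toy only counts WAL. */
    if (!isbuild)
        wal->nrecords++;
}

/* End of build: log every page once, batching images into records. */
static void
toy_log_all_pages(ToyWal *wal, size_t npages)
{
    for (size_t start = 0; start < npages; start += TOY_BATCH_PAGES)
    {
        size_t left = npages - start;

        wal->nrecords++;
        wal->npages_logged += left < TOY_BATCH_PAGES ? left : TOY_BATCH_PAGES;
    }
}

/* Build an index of npages pages that takes nmods page modifications. */
static ToyWal
toy_build(size_t npages, size_t nmods, bool skip_wal)
{
    ToyWal wal = {0, 0};

    assert(npages > 0);
    for (size_t i = 0; i < nmods; i++)
        toy_modify_page(&wal, skip_wal);
    if (skip_wal)
        toy_log_all_pages(&wal, npages);
    return wal;
}
```

With 100 pages and 5000 modifications, the per-change build writes 5000 records, while the skip-WAL build writes 4 records carrying 100 page images in total. Because page images are large, the real byte saving depends on how often each page is modified during the build.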

Hash indexes currently lack this optimization.
Please find the implementation attached.

During testing, I measured the amount of WAL generated by the index
build before and after applying the patch. My psql script was something like:

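-- WAL insert position before the build
select pg_current_wal_insert_lsn() as lsn1 \gset

create index on t using hash (i);

-- WAL insert position after the build
select pg_current_wal_insert_lsn() as lsn2 \gset

-- bytes of WAL written in between
select pg_wal_lsn_diff(:'lsn2', :'lsn1');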

The exact numbers depend on the index size, but I got 2.5-3.5 times less
WAL with this patch, and 8 times less WAL with this patch plus
wal_compression=on.
Index creation time, however, did not change much.
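The arithmetic behind this measurement is simple: an LSN is printed as two hexadecimal halves, "hi/lo", and pg_wal_lsn_diff() returns the byte distance between two of them. The C helpers below are hypothetical (not PostgreSQL functions) and only sketch that computation, plus the ratio used to express the reduction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse an LSN in its text form "hi/lo", two hexadecimal numbers holding
 * the upper and lower 32 bits, into a 64-bit WAL byte position.
 * Returns false if the text is not of that form.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;
    int consumed = 0;

    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2 || text[consumed] != '\0')
        return false;
    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Bytes of WAL between two LSNs, analogous to pg_wal_lsn_diff(after, before). */
static bool
wal_bytes_between(const char *before, const char *after, int64_t *bytes)
{
    uint64_t b, a;

    if (!parse_lsn(before, &b) || !parse_lsn(after, &a))
        return false;
    *bytes = (int64_t) (a - b);
    return true;
}

/* Ratio of WAL volumes, e.g. unpatched build vs. patched build. */
static double
wal_reduction_factor(int64_t unpatched_bytes, int64_t patched_bytes)
{
    assert(patched_bytes > 0);
    return (double) unpatched_bytes / (double) patched_bytes;
}
```

For example, going from 0/FFFFFF00 to 1/100 crosses the boundary of the low 32-bit half and amounts to 512 bytes of WAL.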

About the implementation:
Many types of WAL record can be generated during a hash index build.
I know for sure that these are possible (double-checked using pg_waldump):

SPLIT_COMPLETE
INSERT
SPLIT_ALLOCATE_PAGE
SPLIT_PAGE
ADD_OVFL_PAGE
SQUEEZE_PAGE
INIT_META_PAGE
INIT_BITMAP_PAGE

It looks like SPLIT_COMPLETE and VACUUM_ONE_PAGE are never generated
during an index build. I'm not sure about MOVE_PAGE_CONTENTS.

So the implementation simply passes an isbuild flag to every place where
something is WAL-logged. This looks less invasive than the alternatives.
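The flag-threading approach can be sketched as a toy model in C. This is not the patch: the enum mirrors some of the record names listed above, and toy_hash_xlog() and toy_doinsert() are hypothetical stand-ins for the call sites in the hash AM that would otherwise call XLogInsert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Some record kinds named in the message (illustrative subset only). */
typedef enum ToyHashRecord
{
    TOY_INIT_META_PAGE,
    TOY_INIT_BITMAP_PAGE,
    TOY_INSERT,
    TOY_ADD_OVFL_PAGE,
    TOY_SPLIT_ALLOCATE_PAGE,
    TOY_SPLIT_PAGE,
    TOY_SQUEEZE_PAGE,
    TOY_NUM_RECORDS
} ToyHashRecord;

typedef struct ToyWalStats
{
    size_t emitted[TOY_NUM_RECORDS]; /* records that would reach WAL */
    size_t skipped[TOY_NUM_RECORDS]; /* records suppressed during a build */
} ToyWalStats;

/*
 * Hypothetical wrapper: each WAL-logging call site receives the caller's
 * isbuild flag and skips the record during a build, because all pages
 * are logged in bulk once the build finishes.
 */
static void
toy_hash_xlog(ToyWalStats *stats, bool isbuild, ToyHashRecord rec)
{
    assert((unsigned) rec < TOY_NUM_RECORDS);
    if (isbuild)
        stats->skipped[rec]++;
    else
        stats->emitted[rec]++;
}

/* A hypothetical insert path that threads the flag down to the log call. */
static void
toy_doinsert(ToyWalStats *stats, bool isbuild, bool page_full)
{
    if (page_full)
        toy_hash_xlog(stats, isbuild, TOY_ADD_OVFL_PAGE);
    toy_hash_xlog(stats, isbuild, TOY_INSERT);
}
```

The same entry points serve both build-time and regular inserts; only the flag decides whether a record is written, which is why the change touches every WAL-logging site but nothing else.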

--
Best regards,
Kirill Reshke

Attachments:

v2-0001-Use-log_newpage_range-in-HASH-index-build.patch (application/octet-stream), +58 -45
#2 lakshmi
lakshmigcdac@gmail.com
In reply to: Kirill Reshke (#1)
Re: Use log_newpage_range in HASH index build

Hi Kirill,

I tested your patch on the current master and confirmed the WAL reduction
during HASH index build.

While testing, I noticed a possible small follow-up improvement in HASH
overflow handling. Currently, any free overflow page may be reused, which
can scatter overflow chains and hurt cache locality. Reusing recently freed
overflow pages first could help, without changing WAL behavior or on-disk
format.

I would like to work on this as a follow-up enhancement and would welcome
any suggestions.

Best regards,
Lakshmi

On Tue, Dec 23, 2025 at 2:31 PM Kirill Reshke <reshkekirill@gmail.com>
wrote:

#3 lakshmi
lakshmigcdac@gmail.com
In reply to: lakshmi (#2)
Re: Use log_newpage_range in HASH index build

Hi Kirill,
Following up on my earlier note, I implemented the proposed HASH overflow
page reuse enhancement. Recently freed overflow pages are recorded in
_hash_freeovflpage(), and _hash_addovflpage() now prefers reusing those
pages during allocation before falling back to the bitmap scan.

The change is backend-local and allocation-only, with no WAL or on-disk
format changes. I verified correctness using index build/drop and VACUUM
cycles, and confirmed WAL neutrality by comparing the WAL generated during
HASH index builds with and without this change (no observable difference
beyond normal noise).
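The allocation order described above can be modelled with a small C sketch. It is a toy, not the patch: the bitmap is a bool array, and the cache size (8), the LIFO order, and the recheck of each cached entry against the bitmap are assumptions made for illustration. The recheck is there because a backend-local cache may remember a page that has since been reused elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names and sizes are hypothetical, not PostgreSQL's. */
#define TOY_MAX_OVFL   64 /* overflow pages covered by the toy bitmap */
#define TOY_RECENT_MAX 8  /* size of the backend-local "recently freed" cache */

typedef struct ToyOvflState
{
    bool in_use[TOY_MAX_OVFL];   /* stands in for the shared bitmap pages */
    int  recent[TOY_RECENT_MAX]; /* LIFO stack of recently freed pages */
    int  nrecent;
} ToyOvflState;

/* Free a page: clear its bitmap bit and remember it as recently freed. */
static void
toy_free_ovfl(ToyOvflState *st, int page)
{
    assert(page >= 0 && page < TOY_MAX_OVFL && st->in_use[page]);
    st->in_use[page] = false;
    /* If the cache is full the hint is dropped; the bitmap still has it. */
    if (st->nrecent < TOY_RECENT_MAX)
        st->recent[st->nrecent++] = page;
}

/* Allocate a page: newest recently freed page first, else bitmap scan. */
static int
toy_alloc_ovfl(ToyOvflState *st)
{
    /* The cache is only a hint, so recheck the bitmap for each entry. */
    while (st->nrecent > 0)
    {
        int page = st->recent[--st->nrecent];

        if (!st->in_use[page])
        {
            st->in_use[page] = true;
            return page;
        }
    }

    /* Fallback: take the lowest-numbered free page (first-free scan). */
    for (int page = 0; page < TOY_MAX_OVFL; page++)
    {
        if (!st->in_use[page])
        {
            st->in_use[page] = true;
            return page;
        }
    }
    return -1; /* no free overflow page in the toy bitmap */
}
```

With pages 0-4 allocated, freeing 1 and then 3 makes the next two allocations return 3 and then 1, whereas a plain first-free scan would return 1 first. Once the cache is empty, allocation falls back to the scan and returns 5.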

The patch is attached for review. Feedback is welcome.

Best regards,
Lakshmi

On Tue, Dec 23, 2025 at 5:23 PM lakshmi <lakshmigcdac@gmail.com> wrote:

Attachments:

0001-hash-reuse-recently-freed-overflow-pages.patch (text/x-patch; charset=US-ASCII), +77 -1
#4 Robert Haas
robertmhaas@gmail.com
In reply to: Kirill Reshke (#1)
Re: Use log_newpage_range in HASH index build

On Thu, Oct 23, 2025 at 3:21 PM Kirill Reshke <reshkekirill@gmail.com> wrote:

So, implementation is simply pass isbuild flag everywhere something is
wal-logged. Looks like it is less invasive than alternatives.

I think that in order to be seriously considered, this patch will need
more than zero words of comments and more than a one-line commit
message.

The XXX comment would need addressing, too.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5 Robert Haas
robertmhaas@gmail.com
In reply to: lakshmi (#3)
Re: Use log_newpage_range in HASH index build

On Mon, Jan 5, 2026 at 3:48 AM lakshmi <lakshmigcdac@gmail.com> wrote:

Following up on my earlier note, I implemented the proposed HASH overflow page reuse enhancement. Recently freed overflow pages are recorded in
_hash_freeovflpage(), and _hash_addovflpage() now prefers reusing those pages during allocation before falling back to the bitmap scan.

The change is backend-local and allocation-only, with no WAL or on-disk format changes. I verified correctness using index build/drop and VACUUM cycles, and confirmed WAL neutrality by comparing WAL generated during HASH index builds with and without this change (no observable difference beyond normal noise).

The patch is attached for review. Feedback is welcome.

It is probably best to start a separate email thread specifically
about this patch, because it's doing something quite different from
the original patch. I suggest presenting some specific performance
results, as it's not very clear how much benefit this might have. It
might be good to test a favorable scenario and also an unfavorable
scenario, describing and showing results for each.

--
Robert Haas
EDB: http://www.enterprisedb.com

#6 lakshmi
lakshmigcdac@gmail.com
In reply to: Robert Haas (#5)
Re: Use log_newpage_range in HASH index build

Hi,

Thanks for the suggestion. I’ll start a separate thread for this patch
with performance results.

Regards,
Lakshmi