Separate HEAP WAL replay logic into its own file
Hi PostgreSQL hackers,
For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WAL replay logic are organized in separate source files. However, the HEAP access method is an exception. Both the access method and the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other access methods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. The changes are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the common heap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files.
I hope people find this straightforward refactoring helpful.
Yong
Attachments:
heapam_refactor.patchapplication/octet-stream; name=heapam_refactor.patchDownload+1353-1321
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong <yoli@ebay.com> wrote:
Hi PostgreSQL hackers,
For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WAL replay logic are organized in separate source files. However, the HEAP access method is an exception. Both the access method and the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other access methods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. The changes are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the common heap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files.
I'm not against this change, but I am curious at what inspired this.
Were you looking at Postgres code and simply noticed that there isn't
a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
wanted to change that? Or is there some specific reason this would
help you as a Postgres developer, user, or ecosystem member?
- Melanie
On Jun 17, 2024, at 23:01, Melanie Plageman <melanieplageman@gmail.com> wrote:
External Email
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong <yoli@ebay.com> wrote:
Hi PostgreSQL hackers,
For most access methods in PostgreSQL, the implementation of the access method itself and the implementation of its WAL replay logic are organized in separate source files. However, the HEAP access method is an exception. Both the access method and the WAL replay logic are collocated in the same heapam.c. To follow the pattern established by other access methods and to improve maintainability, I made the enclosed patch to separate HEAP’s replay logic into its own file. The changes are straightforward. Move the replay related functions into the new heapam_xlog.c file, push the common heap_execute_freeze_tuple() helper function into the heapam.h header, and adjust the build files.
I'm not against this change, but I am curious at what inspired this.
Were you looking at Postgres code and simply noticed that there isn't
a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
wanted to change that? Or is there some specific reason this would
help you as a Postgres developer, user, or ecosystem member?- Melanie
As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me to find them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the same for the heap access method. When I finally found them (via text search), it was a small surprise. Having different file organizations for different access methods gives me this urge to make everything consistent. I think it will make it easier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the heapam.c not some “???xlog.c”.
Yong
On Mon, Jun 17, 2024 at 9:12 PM Li, Yong <yoli@ebay.com> wrote:
As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me to find them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the same for the heap access method. When I finally found them (via text search), it was a small surprise. Having different file organizations for different access methods gives me this urge to make everything consistent. I think it will make it easier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the heapam.c not some “???xlog.c”.
That makes sense. The branch for PG18 has not been cut yet, so I
recommend registering this patch for the July commitfest [1]https://commitfest.postgresql.org/48/ so it
doesn't get lost.
- Melanie
On Jun 18, 2024, at 20:42, Melanie Plageman <melanieplageman@gmail.com> wrote:
External Email
On Mon, Jun 17, 2024 at 9:12 PM Li, Yong <yoli@ebay.com> wrote:
As a newcomer, when I was walking through the code looking for WAL replay related code, it was relatively easy for me to find them for the B-Tree access method because of the “xlog” hint in the file names. It took me a while to find the same for the heap access method. When I finally found them (via text search), it was a small surprise. Having different file organizations for different access methods gives me this urge to make everything consistent. I think it will make it easier for newcomers, and it will reduce the mental load for everyone to remember that heap replay is inside the heapam.c not some “???xlog.c”.
That makes sense. The branch for PG18 has not been cut yet, so I
recommend registering this patch for the July commitfest [1] so it
doesn't get lost.- Melanie
Thanks for the positive feedback. I’ve added the patch to the July CF.
Yong
Hi,
I'm reviewing patches in commitfest 2024-07:
https://commitfest.postgresql.org/48/
This is the 5th patch:
https://commitfest.postgresql.org/48/5054/
FYI: https://commitfest.postgresql.org/48/4681/ is my patch.
In <DAB53475-2470-4431-A3D1-A2A855530EAB@ebay.com>
"Re: Separate HEAP WAL replay logic into its own file" on Tue, 18 Jun 2024 01:12:42 +0000,
"Li, Yong" <yoli@ebay.com> wrote:
As a newcomer, when I was walking through the code looking
for WAL replay related code, it was relatively easy for me
to find them for the B-Tree access method because of the
“xlog” hint in the file names. It took me a while to
find the same for the heap access method. When I finally
found them (via text search), it was a small
surprise. Having different file organizations for
different access methods gives me this urge to make
everything consistent. I think it will make it easier for
newcomers, and it will reduce the mental load for everyone
to remember that heap replay is inside the heapam.c not
some “???xlog.c”.
It makes sense.
Here are my comments for your patch:
1. Could you create your patch by "git format-patch -vN master"
or something? If you create your patch by "git format-patch",
we can apply your patch by "git am XXX.patch".
2. I confirmed that all heapam.c -> heapam_xlog.c/heapam.h
moves don't change implementations. I re-moved moved
codes to heapam.c and there is no diff in heapam.c.
3. Could you remove '#include "access/heapam_xlog.h"' from
heapam.c because it's needless now.
BTW, it seems that we can remove more includes from
heapam.c:
----
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bc6d4868975..f1671072576 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -31,42 +31,24 @@
*/
#include "postgres.h"
-#include "access/bufmask.h"
#include "access/heapam.h"
-#include "access/heapam_xlog.h"
#include "access/heaptoast.h"
#include "access/hio.h"
#include "access/multixact.h"
-#include "access/parallel.h"
-#include "access/relscan.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
-#include "access/sysattr.h"
-#include "access/tableam.h"
-#include "access/transam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
-#include "access/xact.h"
-#include "access/xlog.h"
#include "access/xloginsert.h"
-#include "access/xlogutils.h"
-#include "catalog/catalog.h"
#include "commands/vacuum.h"
-#include "miscadmin.h"
#include "pgstat.h"
-#include "port/atomics.h"
#include "port/pg_bitutils.h"
-#include "storage/bufmgr.h"
-#include "storage/freespace.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
-#include "storage/standby.h"
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
-#include "utils/relcache.h"
-#include "utils/snapmgr.h"
#include "utils/spccache.h"
---
We may want to work on removing needless includes as a
separated cleanup task.
4. Could you remove needless includes from heapam_xlog.c? It
seems that we can remove the following includes:
----
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index b372f2b4afc..af4976f382d 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -16,16 +16,11 @@
#include "access/bufmask.h"
#include "access/heapam.h"
-#include "access/heapam_xlog.h"
-#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xlogutils.h"
-#include "port/atomics.h"
-#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/standby.h"
-#include "utils/relcache.h"
----
5. There are still WAL related codes in heapam.c:
4.1. log_heap_update()
4.2. log_heap_new_cid()
4.3. if (RelationNeedsWAL()) {...} in heap_insert()
4.4. if (needwal) {...} in heap_multi_insert()
4.5. if (RelationNeedsWAL()) {...} in heap_delete()
4.6. if (RelationNeedsWAL()) {...}s in heap_update()
4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
4.12. log_heap_visible()
Should we move them to head_xlog.c too?
If we should do it, separated commits will be easy to
review. For example, the 0001 patch moves existing codes
to head_xlog.c as-is. The 0002 patch extracts WAL related
codes in heap_insert() to heap_xlog.c and so on.
Thanks,
--
kou
On Jul 23, 2024, at 09:54, Sutou Kouhei <kou@clear-code.com> wrote:
Here are my comments for your patch:
1. Could you create your patch by "git format-patch -vN master"
or something? If you create your patch by "git format-patch",
we can apply your patch by "git am XXX.patch".
Thanks for your review. I’ve updated the patch to follow your
suggested format.
3. Could you remove '#include "access/heapam_xlog.h"' from
heapam.c because it's needless now.BTW, it seems that we can remove more includes from
heapam.c:4. Could you remove needless includes from heapam_xlog.c? It
seems that we can remove the following includes:
I have removed the redundant includes in the latest patch.
5. There are still WAL related codes in heapam.c:
4.1. log_heap_update()
4.2. log_heap_new_cid()
4.3. if (RelationNeedsWAL()) {...} in heap_insert()
4.4. if (needwal) {...} in heap_multi_insert()
4.5. if (RelationNeedsWAL()) {...} in heap_delete()
4.6. if (RelationNeedsWAL()) {...}s in heap_update()
4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
4.12. log_heap_visible()Should we move them to head_xlog.c too?
If we should do it, separated commits will be easy to
review. For example, the 0001 patch moves existing codes
to head_xlog.c as-is. The 0002 patch extracts WAL related
codes in heap_insert() to heap_xlog.c and so on.Thanks,
--
kou
I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example, you can also find
WAL generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.
Please help review the updated file again. Thanks in advance!
Yong
Attachments:
v2-0001-heapam_refactor.patchapplication/octet-stream; name=v2-0001-heapam_refactor.patchDownload+1348-1340
Hi,
In <599E67D2-2929-4858-B8BC-F9C4AE889BF6@ebay.com>
"Re: Separate HEAP WAL replay logic into its own file" on Fri, 26 Jul 2024 07:56:12 +0000,
"Li, Yong" <yoli@ebay.com> wrote:
1. Could you create your patch by "git format-patch -vN master"
or something? If you create your patch by "git format-patch",
we can apply your patch by "git am XXX.patch".Thanks for your review. I’ve updated the patch to follow your
suggested format.
Thanks. I could apply your patch by "git am
v2-0001-heapam_refactor.patch".
Could you use the following format for the commit message
next time?
----
${TITLE}
${DESCRIPTION}
----
For example:
----
Separate HEAP WAL replay logic into its own file
Most access methods (i.e. nbtree and hash) use a separate
file with "xlog" in its name for its WAL replay logic. Heap
is one exception of this convention. To make it easier for
newcomers to find the WAL replay logic for the heap access
method, this patch isolates heap's replay logic in a new
heapam_xlog.c file. This patch is a pure refactoring with no
change to the logic.
----
This is a commonly used Git's commit message format. See
also other commit messages by "git log".
5. There are still WAL related codes in heapam.c:
4.1. log_heap_update()
4.2. log_heap_new_cid()
4.3. if (RelationNeedsWAL()) {...} in heap_insert()
4.4. if (needwal) {...} in heap_multi_insert()
4.5. if (RelationNeedsWAL()) {...} in heap_delete()
4.6. if (RelationNeedsWAL()) {...}s in heap_update()
4.7. if (RelationNeedsWAL()) {...} in heap_lock_tuple()
4.8. if (RelationNeedsWAL()) {...} in heap_lock_updated_tuple_rec()
4.9. if (RelationNeedsWAL()) {...} in heap_finish_speculative()
4.10. if (RelationNeedsWAL()) {...} in heap_abort_speculative()
4.11. if (RelationNeedsWAL()) {...} in heap_inplace_update()
4.12. log_heap_visible()Should we move them to head_xlog.c too?
If we should do it, separated commits will be easy to
review. For example, the 0001 patch moves existing codes
to head_xlog.c as-is. The 0002 patch extracts WAL related
codes in heap_insert() to heap_xlog.c and so on.I followed the convention of most access methods. The “xlog”
file includes the WAL replay logic only. The logic that generates
the log records themselves stays with the code that performs
the changes. Take nbtree as an example, you can also find
WAL generating code in several _bt_insertxxx() functions inside
the nbtinsert.c file.
You're right. Sorry.
I think that this proposal is reasonable but we need to get
attention from a committer to move forward this proposal.
Thanks,
--
kou
I think that this proposal is reasonable but we need to get
attention from a committer to move forward this proposal.Thanks,
—
kou
Thank you Kou for your review. I will move the CF to the next
phase and see what happens.
Regards,
Yong
On Tue, Jul 30, 2024 at 06:48:26AM +0000, Li, Yong wrote:
Thank you Kou for your review. I will move the CF to the next
phase and see what happens.
Quite a fan of what you are proposing here, knowing that heapam.c is
still 8.8k lines of code even after moving the 1.3k lines dedicated to
WAL records.
+#include "access/heapam_xlog.h"
This is included in heapam.h, but missing from the patch. I guess
that you fat-fingered a `git add`.
--
Michael
On Wed, Sep 11, 2024 at 04:41:49PM +0900, Michael Paquier wrote:
+#include "access/heapam_xlog.h"
This is included in heapam.h, but missing from the patch. I guess
that you fat-fingered a `git add`.
It looks like my mind was wondering away when I wrote this part.
Sorry for the useless noise.
--
Michael
On Thu, Sep 12, 2024 at 08:12:30AM +0900, Michael Paquier wrote:
It looks like my mind was wondering away when I wrote this part.
Sorry for the useless noise.
I was looking at all that, and this is only moving code around. While
the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad
but historical, the header cleanup in heapam.c is nice.
Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to
XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the
opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h
was less interesting. Just to say that I am agreeing with you here
and I have let this part as you suggested originally.
I was wondering for a bit about the order of the functions for heap
and heap, but these are ordered in their own, which is also OK. I
have added a few more comments at the top of each subroutine for the
records to be more consistent, and applied the result.
--
Michael
On Sep 12, 2024, at 13:39, Michael Paquier <michael@paquier.xyz> wrote:
External Email
I was looking at all that, and this is only moving code around. While
the part for heap_xlog_logical_rewrite in rewriteheap.c is a bit sad
but historical, the header cleanup in heapam.c is nice.Seeing heap_execute_freeze_tuple in heapam.h due to the dependency to
XLH_INVALID_XVAC and XLH_FREEZE_XVAC is slightly surprising, but the
opposite where heap_execute_freeze_tuple() would be in heapam_xlog.h
was less interesting. Just to say that I am agreeing with you here
and I have let this part as you suggested originally.I was wondering for a bit about the order of the functions for heap
and heap, but these are ordered in their own, which is also OK. I
have added a few more comments at the top of each subroutine for the
records to be more consistent, and applied the result.
—
Michael
I am so glad to see that my patch got committed. Thank you a lot for it!
This is my first accepted patch. It really means a lot to me.
Yong