[PATCH]A minor improvement to the error-report in SimpleLruWriteAll()

Started by Long Songover 1 year ago5 messages
#1Long Song
songlong88@126.com
1 attachment(s)

Hi hackers,
When I read the code, I noticed that in SimpleLruWriteAll(), only the last error is
recorded when the file fails to close. Like the following,
```void
SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
{
SlruShared shared = ctl->shared;
SlruWriteAllData fdata;
int64 pageno = 0;
int prevbank = SlotGetBankNumber(0);
bool ok;
...
/*
* Now close any files that were open
*/
ok = true;
for (int i = 0; i < fdata.num_files; i++)
{
if (CloseTransientFile(fdata.fd[i]) != 0)
{
slru_errcause = SLRU_CLOSE_FAILED;
slru_errno = errno;
pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
ok = false;
}
}
if (!ok)
SlruReportIOError(ctl, pageno, InvalidTransactionId);
```
// Here, SlruReportIOError() is called only once, meaning that the last error message is
recorded. In my opinion, since failure to close a file is not common, logging an error
message every time a failure occurs will not result in much log growth, but detailed error
logging will help in most cases.

So, I changed the code to move the call to SlruReportIOError() inside the while loop.

Attached is the patch, I hope it can help.

Attachments:

v1-0001-Fix-error-report-missing-of-SimpleLruWriteAll.patchapplication/octet-stream; name=v1-0001-Fix-error-report-missing-of-SimpleLruWriteAll.patchDownload
From 698d62e4f597d082a2213c03516f555df86eb1f1 Mon Sep 17 00:00:00 2001
From: solo <songlong88@126.com>
Date: Sat, 25 May 2024 14:17:01 +0800
Subject: [PATCH v1] Fix the error-report missing in SimpleLruWriteAll

---
 src/backend/access/transam/slru.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 77b05cc0a..d0f0d68b3 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1307,7 +1307,6 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	SlruWriteAllData fdata;
 	int64		pageno = 0;
 	int			prevbank = SlotGetBankNumber(0);
-	bool		ok;
 
 	/* update the stats counter of flushes */
 	pgstat_count_slru_flush(shared->slru_stats_idx);
@@ -1356,7 +1355,6 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	/*
 	 * Now close any files that were open
 	 */
-	ok = true;
 	for (int i = 0; i < fdata.num_files; i++)
 	{
 		if (CloseTransientFile(fdata.fd[i]) != 0)
@@ -1364,11 +1362,9 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 			slru_errcause = SLRU_CLOSE_FAILED;
 			slru_errno = errno;
 			pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
-			ok = false;
+			SlruReportIOError(ctl, pageno, InvalidTransactionId);
 		}
 	}
-	if (!ok)
-		SlruReportIOError(ctl, pageno, InvalidTransactionId);
 
 	/* Ensure that directory entries for new files are on disk. */
 	if (ctl->sync_handler != SYNC_HANDLER_NONE)
-- 
2.43.0.windows.1

#2Long Song
songlong88@126.com
In reply to: Long Song (#1)
Re:[PATCH]A minor improvement to the error-report in SimpleLruWriteAll()

Hi,
Actually, I still wonder why only the error message
of the last failure to close the file was recorded.
For this unusual situation, it is acceptable to
record all failure information without causing
too much logging.
Was it designed that way on purpose?

At 2024-05-25 17:29:00, "Long Song" <songlong88@126.com> wrote:

Hi hackers,
When I read the code, I noticed that in SimpleLruWriteAll(), only the last error is 
recorded when the file fails to close. Like the following,
```void
SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
{
        SlruShared      shared = ctl->shared;
        SlruWriteAllData fdata;
        int64           pageno = 0;
        int                     prevbank = SlotGetBankNumber(0);
        bool            ok;
...
        /*
         * Now close any files that were open
         */
        ok = true;
        for (int i = 0; i < fdata.num_files; i++)
        {
                if (CloseTransientFile(fdata.fd[i]) != 0)
                {
                        slru_errcause = SLRU_CLOSE_FAILED;
                        slru_errno = errno;
                        pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
                        ok = false;
                }
        }
        if (!ok)
                SlruReportIOError(ctl, pageno, InvalidTransactionId);
```
// Here, SlruReportIOError() is called only once, meaning that the last error message is
 recorded. In my opinion, since failure to close a file is not common, logging an error 
message every time a failure occurs will not result in much log growth, but detailed error 
logging will help in most cases.

So, I changed the code to move the call to SlruReportIOError() inside the while loop.

Attached is the patch, I hope it can help.

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Long Song (#2)
Re: [PATCH]A minor improvement to the error-report in SimpleLruWriteAll()

At Tue, 28 May 2024 20:15:59 +0800 (CST), "Long Song" <songlong88@126.com> wrote in

Hi,
Actually, I still wonder why only the error message
of the last failure to close the file was recorded.
For this unusual situation, it is acceptable to
record all failure information without causing
too much logging.
Was it designed that way on purpose?

Note that SlruReportIOError() causes a non-local exit. To me, the
point of the loop seems to be that we want to close every single file,
apart from the failed ones. From that perspective, the patch disrupts
that intended behavior by exiting in the middle of the loop. It seems
we didn't want to bother collecting errors for every failed file in
that part.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4Long Song
songlong88@126.com
In reply to: Kyotaro Horiguchi (#3)
Re:Re: [PATCH]A minor improvement to the error-report in SimpleLruWriteAll()

Hi Kyotaro,
Thank you for the response.

At 2024-06-04 14:44:09, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

At Tue, 28 May 2024 20:15:59 +0800 (CST), "Long Song" <songlong88@126.com> wrote in

Hi,
Actually, I still wonder why only the error message
of the last failure to close the file was recorded.
For this unusual situation, it is acceptable to
record all failure information without causing
too much logging.
Was it designed that way on purpose?

Note that SlruReportIOError() causes a non-local exit. To me, the
point of the loop seems to be that we want to close every single file,
apart from the failed ones. From that perspective, the patch disrupts
that intended behavior by exiting in the middle of the loop. It seems
we didn't want to bother collecting errors for every failed file in
that part.

Yeah, thanks for your reminder.
It was my mistake not to notice the ereport() exit in the function.
But is it necessary to record it in a log? If there is a benefit to
logging, I can submit a modified patch and record the necessary
failure information into the log in another way.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Best Regards,

Long

#5Long Song
songlong88@126.com
In reply to: Kyotaro Horiguchi (#3)
1 attachment(s)
Re:Re: [PATCH]A minor improvement to the error-report in SimpleLruWriteAll()

Hi,
I modified the patch, kept the old call of SlruReportIOError()
outside the for-loop, and add a call of erport() with LOG level
when file-close failure occurs in the for-loop.

Please check whether this modification is appropriate
and let me know if there is any problem. Thank you.

At 2024-06-04 14:44:09, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

At Tue, 28 May 2024 20:15:59 +0800 (CST), "Long Song" <songlong88@126.com> wrote in

Hi,
Actually, I still wonder why only the error message
of the last failure to close the file was recorded.
For this unusual situation, it is acceptable to
record all failure information without causing
too much logging.
Was it designed that way on purpose?

Note that SlruReportIOError() causes a non-local exit. To me, the
point of the loop seems to be that we want to close every single file,
apart from the failed ones. From that perspective, the patch disrupts
that intended behavior by exiting in the middle of the loop. It seems
we didn't want to bother collecting errors for every failed file in
that part.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Best Regards,

Long

Attachments:

v2-0001-Fix-error-report-missing-in-SimpleLruWriteAll.patchapplication/octet-stream; name=v2-0001-Fix-error-report-missing-in-SimpleLruWriteAll.patchDownload
From acb1dc39b4242700e6729d575bbf55fea9dfad17 Mon Sep 17 00:00:00 2001
From: vrhappy <songlong88@126.com>
Date: Sat, 25 May 2024 14:17:01 +0800
Subject: [PATCH v2] Fix error-report missing in SimpleLruWriteAll

---
 src/backend/access/transam/slru.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 77b05cc0a..c663f78cc 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1365,6 +1365,15 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 			slru_errno = errno;
 			pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
 			ok = false;
+			// record all the error failing to close file
+			char path[MAXPGPATH];
+			SlruFileName(ctl, path, fdata.segno[i]);
+			errno = slru_errno;
+			ereport(LOG,
+					(errcode_for_file_access(),
+					 errmsg("could not access status of transaction %u", InvalidTransactionId),
+					 errdetail("Could not close file \"%s\": %m.",
+							   path)));
 		}
 	}
 	if (!ok)
-- 
2.43.0.windows.1