improving basebackup.c's file-reading code

Started by Robert Haasover 5 years ago5 messages
#1Robert Haas
robertmhaas@gmail.com
1 attachment(s)

Hi,

basebackup.c's code to read from files uses fread() and friends. This
is not great, because it's not documented to set errno. See commit
286af0ce12117bc673b97df6228d1a666594d247 and related discussion. It
seems like a better idea would be to use pg_pgread(), which not only
does set errno, but also lets us eliminate a bit of code that uses
fseek().

There are a couple of other things here that can also be improved. One
is that it seems like a good idea to set a wait event while doing I/O
here, as we do elsewhere. Another is that it seems like a good idea to
report short reads in a non-confusing, non-wrong sort of way. I here
adopted the convention previously mentioned in
/messages/by-id/20200128020303.GA1552@paquier.xyz

Patch, for v14, attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v1-0001-Improve-server-code-to-read-files-as-part-of-a-ba.patchapplication/octet-stream; name=v1-0001-Improve-server-code-to-read-files-as-part-of-a-ba.patchDownload
From 48da23e5a719dac972fe8cc0a1c7153d7e54ab19 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 27 Apr 2020 12:04:04 -0400
Subject: [PATCH v1] Improve server code to read files as part of a base
 backup.

Don't use fread(), since that doesn't necessarily set errno. We could
use read() instead, but it's even better to use pg_pread(), which
allows us to avoid some extra calls to seek to the desired location in
the file.

Also, advertise a wait event while reading from a file, as we do for
most other places where we're reading data from files.
---
 doc/src/sgml/monitoring.sgml         |   7 +-
 src/backend/postmaster/pgstat.c      |   3 +
 src/backend/replication/basebackup.c | 143 ++++++++++++++-------------
 src/include/pgstat.h                 |   3 +-
 4 files changed, 88 insertions(+), 68 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6562cc400b..1f193592a5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1536,7 +1536,12 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry>Waiting in a cost-based vacuum delay point.</entry>
         </row>
         <row>
-         <entry morerows="68"><literal>IO</literal></entry>
+         <entry morerows="69"><literal>IO</literal></entry>
+         <entry><literal>BaseBackupRead</literal></entry>
+         <entry>Waiting for base backup to read from a file.</entry>
+        </row>
+        <row>
+         <entry><literal>IO</literal></entry>
          <entry><literal>BufFileRead</literal></entry>
          <entry>Waiting for a read from a buffered file.</entry>
         </row>
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 50eea2e8a8..89cf8fbc84 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3926,6 +3926,9 @@ pgstat_get_wait_io(WaitEventIO w)
 
 	switch (w)
 	{
+		case WAIT_EVENT_BASEBACKUP_READ:
+			event_name = "BaseBackupRead";
+			break;
 		case WAIT_EVENT_BUFFILE_READ:
 			event_name = "BufFileRead";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbdc28ec39..ee1060ab8b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -79,6 +79,8 @@ static int	compareWalFileNames(const ListCell *a, const ListCell *b);
 static void throttle(size_t increment);
 static void update_basebackup_progress(int64 delta);
 static bool is_checksummed_file(const char *fullpath, const char *filename);
+static int	basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
+								 const char *filename, bool partial_read_ok);
 
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
@@ -96,18 +98,6 @@ static char *statrelpath = NULL;
  */
 #define THROTTLING_FREQUENCY	8
 
-/*
- * Checks whether we encountered any error in fread().  fread() doesn't give
- * any clue what has happened, so we check with ferror().  Also, neither
- * fread() nor ferror() set errno, so we just throw a generic error.
- */
-#define CHECK_FREAD_ERROR(fp, filename) \
-do { \
-	if (ferror(fp)) \
-		ereport(ERROR, \
-				(errmsg("could not read from file \"%s\"", filename))); \
-} while (0)
-
 /* The actual number of bytes, transfer of which may cause sleep. */
 static uint64 throttling_sample;
 
@@ -595,7 +585,7 @@ perform_base_backup(basebackup_options *opt)
 		foreach(lc, walFileList)
 		{
 			char	   *walFileName = (char *) lfirst(lc);
-			FILE	   *fp;
+			int			fd;
 			char		buf[TAR_SEND_SIZE];
 			size_t		cnt;
 			pgoff_t		len = 0;
@@ -603,8 +593,8 @@ perform_base_backup(basebackup_options *opt)
 			snprintf(pathbuf, MAXPGPATH, XLOGDIR "/%s", walFileName);
 			XLogFromFileName(walFileName, &tli, &segno, wal_segment_size);
 
-			fp = AllocateFile(pathbuf, "rb");
-			if (fp == NULL)
+			fd = OpenTransientFile(pathbuf, O_RDONLY | PG_BINARY);
+			if (fd < 0)
 			{
 				int			save_errno = errno;
 
@@ -621,7 +611,7 @@ perform_base_backup(basebackup_options *opt)
 						 errmsg("could not open file \"%s\": %m", pathbuf)));
 			}
 
-			if (fstat(fileno(fp), &statbuf) != 0)
+			if (fstat(fd, &statbuf) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not stat file \"%s\": %m",
@@ -637,9 +627,10 @@ perform_base_backup(basebackup_options *opt)
 			/* send the WAL file itself */
 			_tarWriteHeader(pathbuf, NULL, &statbuf, false);
 
-			while ((cnt = fread(buf, 1,
-								Min(sizeof(buf), wal_segment_size - len),
-								fp)) > 0)
+			while ((cnt = basebackup_read_file(fd, buf,
+											   Min(sizeof(buf),
+												   wal_segment_size - len),
+											   len, pathbuf, true)) > 0)
 			{
 				CheckXLogRemoved(segno, tli);
 				/* Send the chunk as a CopyData message */
@@ -655,8 +646,6 @@ perform_base_backup(basebackup_options *opt)
 					break;
 			}
 
-			CHECK_FREAD_ERROR(fp, pathbuf);
-
 			if (len != wal_segment_size)
 			{
 				CheckXLogRemoved(segno, tli);
@@ -667,7 +656,7 @@ perform_base_backup(basebackup_options *opt)
 
 			/* wal_segment_size is a multiple of 512, so no need for padding */
 
-			FreeFile(fp);
+			CloseTransientFile(fd);
 
 			/*
 			 * Mark file as archived, otherwise files can get archived again
@@ -1561,7 +1550,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 		 struct stat *statbuf, bool missing_ok, Oid dboid,
 		 backup_manifest_info *manifest, const char *spcoid)
 {
-	FILE	   *fp;
+	int			fd;
 	BlockNumber blkno = 0;
 	bool		block_retry = false;
 	char		buf[TAR_SEND_SIZE];
@@ -1580,8 +1569,8 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	pg_checksum_init(&checksum_ctx, manifest->checksum_type);
 
-	fp = AllocateFile(readfilename, "rb");
-	if (fp == NULL)
+	fd = OpenTransientFile(readfilename, O_RDONLY | PG_BINARY);
+	if (fd < 0)
 	{
 		if (errno == ENOENT && missing_ok)
 			return false;
@@ -1623,8 +1612,27 @@ sendFile(const char *readfilename, const char *tarfilename,
 		}
 	}
 
-	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
+	/*
+	 * Loop until we read at least the amount of data the caller told us to
+	 * expect. The file could be longer, if it was extended while we were
+	 * sending it, but for a base backup we can ignore such extended data. It
+	 * will be restored from WAL.
+	 */
+	while (len < statbuf->st_size)
 	{
+		/* Try to read some more data. */
+		cnt = basebackup_read_file(fd, buf,
+								   Min(sizeof(buf), statbuf->st_size - len),
+								   len, readfilename, true);
+
+		/*
+		 * If we hit end-of-file, a concurrent truncation must have occurred.
+		 * That's not an error condition, because WAL replay will fix things
+		 * up.
+		 */
+		if (cnt == 0)
+			break;
+
 		/*
 		 * The checksums are verified at block level, so we iterate over the
 		 * buffer in chunks of BLCKSZ, after making sure that
@@ -1675,16 +1683,15 @@ sendFile(const char *readfilename, const char *tarfilename,
 						 */
 						if (block_retry == false)
 						{
-							/* Reread the failed block */
-							if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
-							{
-								ereport(ERROR,
-										(errcode_for_file_access(),
-										 errmsg("could not fseek in file \"%s\": %m",
-												readfilename)));
-							}
+							int			reread_cnt;
 
-							if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
+							/* Reread the failed block */
+							reread_cnt =
+								basebackup_read_file(fd, buf + BLCKSZ * i,
+													 BLCKSZ, len + BLCKSZ * i,
+													 readfilename,
+													 false);
+							if (reread_cnt == 0)
 							{
 								/*
 								 * If we hit end-of-file, a concurrent
@@ -1694,24 +1701,8 @@ sendFile(const char *readfilename, const char *tarfilename,
 								 * code that handles that case. (We must fix
 								 * up cnt first, though.)
 								 */
-								if (feof(fp))
-								{
-									cnt = BLCKSZ * i;
-									break;
-								}
-
-								ereport(ERROR,
-										(errcode_for_file_access(),
-										 errmsg("could not reread block %d of file \"%s\": %m",
-												blkno, readfilename)));
-							}
-
-							if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
-							{
-								ereport(ERROR,
-										(errcode_for_file_access(),
-										 errmsg("could not fseek in file \"%s\": %m",
-												readfilename)));
+								cnt = BLCKSZ * i;
+								break;
 							}
 
 							/* Set flag so we know a retry was attempted */
@@ -1754,20 +1745,8 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 		len += cnt;
 		throttle(cnt);
-
-		if (feof(fp) || len >= statbuf->st_size)
-		{
-			/*
-			 * Reached end of file. The file could be longer, if it was
-			 * extended while we were sending it, but for a base backup we can
-			 * ignore such extended data. It will be restored from WAL.
-			 */
-			break;
-		}
 	}
 
-	CHECK_FREAD_ERROR(fp, readfilename);
-
 	/* If the file was truncated while we were sending it, pad it with zeros */
 	if (len < statbuf->st_size)
 	{
@@ -1796,7 +1775,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 		update_basebackup_progress(pad);
 	}
 
-	FreeFile(fp);
+	CloseTransientFile(fd);
 
 	if (checksum_failures > 1)
 	{
@@ -1982,3 +1961,35 @@ update_basebackup_progress(int64 delta)
 
 	pgstat_progress_update_multi_param(nparam, index, val);
 }
+
+/*
+ * Read some data from a file, setting a wait event and reporting any error
+ * encountered.
+ *
+ * If partial_read_ok is false, also report an error if the number of bytes
+ * read is not equal to the number of bytes requested.
+ *
+ * Returns the number of bytes read.
+ */
+static int
+basebackup_read_file(int fd, char *buf, size_t nbytes, off_t offset,
+					 const char *filename, bool partial_read_ok)
+{
+	int			rc;
+
+	pgstat_report_wait_start(WAIT_EVENT_BASEBACKUP_READ);
+	rc = pg_pread(fd, buf, nbytes, offset);
+	pgstat_report_wait_end();
+
+	if (rc < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m", filename)));
+	if (!partial_read_ok && rc > 0 && rc != nbytes)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": read %d of %zu",
+						filename, rc, nbytes)));
+
+	return rc;
+}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b8041d9988..3ae40d305a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -913,7 +913,8 @@ typedef enum
  */
 typedef enum
 {
-	WAIT_EVENT_BUFFILE_READ = PG_WAIT_IO,
+	WAIT_EVENT_BASEBACKUP_READ = PG_WAIT_IO,
+	WAIT_EVENT_BUFFILE_READ,
 	WAIT_EVENT_BUFFILE_WRITE,
 	WAIT_EVENT_CONTROL_FILE_READ,
 	WAIT_EVENT_CONTROL_FILE_SYNC,
-- 
2.24.2 (Apple Git-127)

#2Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Robert Haas (#1)
Re: improving basebackup.c's file-reading code

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

The idea and the patch looks good to me.

It makes sense to change fread to basebackup_read_file which internally calls pg_pread which is a portable function as opposed to read. As you've mentioned, this is much better for error handling.

I guess it is more of a personal choice, but I would suggest making the while conditions consistent as well. The while loop in the patch @ line 121 conditions over return value of "basebackup_read_file" whereas @ line 177, it has a condition "(len < statbuf->st_size)".

The new status of this patch is: Ready for Committer

#3Robert Haas
robertmhaas@gmail.com
In reply to: Hamid Akhtar (#2)
Re: improving basebackup.c's file-reading code

On Wed, Apr 29, 2020 at 5:52 AM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

The idea and the patch looks good to me.

It makes sense to change fread to basebackup_read_file which internally calls pg_pread which is a portable function as opposed to read. As you've mentioned, this is much better for error handling.

Thanks for the review. I have now committed the patch, after rebasing
and adjusting one comment slightly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#3)
Re: improving basebackup.c's file-reading code

On 17 Jun 2020, at 17:45, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Apr 29, 2020 at 5:52 AM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

The idea and the patch looks good to me.

It makes sense to change fread to basebackup_read_file which internally calls pg_pread which is a portable function as opposed to read. As you've mentioned, this is much better for error handling.

Thanks for the review. I have now committed the patch, after rebasing
and adjusting one comment slightly.

As this went in, can we close the 2020-07 CF entry or is there anything left in
the patchseries?

cheers ./daniel

#5Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#4)
Re: improving basebackup.c's file-reading code

On Thu, Jun 25, 2020 at 10:29 AM Daniel Gustafsson <daniel@yesql.se> wrote:

As this went in, can we close the 2020-07 CF entry or is there anything left in
the patchseries?

Done. Thanks for the reminder.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company