[PATCH] - Provide robust alternatives for replace_string

Started by Georgiosover 5 years ago13 messages
#1Georgios
gkokolatos@protonmail.com
2 attachment(s)

Hi,

In our testing framework, backed by pg_regress, there exists the ability to use special strings
that can be replaced by environment based ones. Such an example is '@testtablespace@'. The
function used for this replacement is replace_string which inline replaces these occurrences in
original line. It is documented that the original line buffer should be large enough to accommodate.

However, it is rather possible and easy for subtle errors to occur, especially if there are multiple
occurrences to be replaced in long enough lines. Please find two distinct versions of a possible
solution. One, which is preferred, is using StringInfo though it requires for stringinfo.h to be included
in pg_regress.c. The other patch is more basic and avoids including stringinfo.h. As a reminder
stringinfo became available in the frontend in commit (26aaf97b683d)

Because the original replace_string() is exposed to other users, it is currently left intact.
Also if required, an error can be raised in the original function, in cases that the string is not
long enough to accommodate the replacements.

Worthwhile to mention that currently there are no such issues present in the test suits. It should
not hurt to do a bit better though.

//Asim and Georgios

Attachments:

0001-Use-stringInfo-instead-of-char-in-replace_string.patchapplication/octet-stream; name=0001-Use-stringInfo-instead-of-char-in-replace_string.patchDownload
From 66fdc9b006797592c41de6cc1619f96104260e9d Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos <gkokolatos@protonmail.com>
Date: Fri, 31 Jul 2020 09:10:36 +0000
Subject: [PATCH] Use a stringInfo instead of a char for replace_string in
 pg_regress

The exposed function replase_string() in pg_regress assumes that there exists
enough space in the string it operates on. While this is documented and
expected, the consequence can be subtle failures in the tests which might be
hard to trace back to the root cause.

Provide an alternative function that operates on a StringInfo instead.

Co-authored-by: Asim R P <pasim@vmware.com>
---
 src/test/regress/pg_regress.c | 41 ++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..78b5057069 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -33,6 +33,7 @@
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -454,6 +455,27 @@ replace_string(char *string, const char *replace, const char *replacement)
 	}
 }
 
+/*
+ * Replace all occurrences of a string in a stringInfo with a different string.
+ */
+static void
+replace_stringInfo(StringInfo string, const char *replace, const char *replacement)
+{
+	char	   *ptr;
+
+	while ((ptr = strstr(string->data, replace)) != NULL)
+	{
+		char	   *dup = pg_strdup(string->data);
+		size_t		pos = ptr - string->data;
+
+		string->len = pos;
+		appendStringInfoString(string, replacement);
+		appendStringInfoString(string, dup + pos + strlen(replace));
+
+		free(dup);
+	}
+}
+
 /*
  * Convert *.source found in the "source" directory, replacing certain tokens
  * in the file contents with their intended values, and put the resulting files
@@ -521,7 +543,7 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		StringInfoData	line;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +573,18 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		initStringInfo(&line);
+		while (fgets(line.data, line.maxlen, infile))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
-			fputs(line, outfile);
+			replace_stringInfo(&line, "@abs_srcdir@", inputdir);
+			replace_stringInfo(&line, "@abs_builddir@", outputdir);
+			replace_stringInfo(&line, "@testtablespace@", testtablespace);
+			replace_stringInfo(&line, "@libdir@", dlpath);
+			replace_stringInfo(&line, "@DLSUFFIX@", DLSUFFIX);
+			fputs(line.data, outfile);
 		}
+		pfree(line.data);
 		fclose(infile);
 		fclose(outfile);
 	}
-- 
2.25.1

0001-Heap-allocated-string-version-of-replace_string.patchapplication/octet-stream; name=0001-Heap-allocated-string-version-of-replace_string.patchDownload
From 903127e896428a76d2c93445fadd960b5c9f477b Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos <gkokolatos@pivotal.io>
Date: Fri, 31 Jul 2020 08:17:02 +0000
Subject: [PATCH] Add and use a heap allocated string version of replace_string
 in pg_regress

The exposed function replase_string() in pg_regress assumes that there exists
enough space in the string it operates on. While this is documented and
expected, the consequence can be subtle failures in the tests which might be
hard to trace back to the root cause.

Provide an alternative function that operates on a heap allocated string instead.
The string regrows if needed and the new size can be used for subsequent reads.

Co-authored-by: Asim R P <pasim@vmware.com>
---
 src/test/regress/pg_regress.c | 58 ++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..5701c80e42 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -434,6 +434,40 @@ string_matches_pattern(const char *str, const char *pattern)
 	return false;
 }
 
+/*
+ * Replace all occurrences of a string in a string with a different string.
+ *
+ * The string grows if there is not enough space.
+ */
+static char *
+replace_string_heap(char *string, size_t *stringlen,
+					const char *replace, const char *replacement)
+{
+	ssize_t		overflowlen;
+	char	   *ptr;
+
+	overflowlen = strlen(replacement) - strlen(replace);
+	while ((ptr = strstr(string, replace)) != NULL)
+	{
+		char	   *dup = pg_strdup(string);
+
+		if (overflowlen > 0 &&
+			((ptr + strlen(ptr) + overflowlen) > (string + *stringlen - 1)))
+		{
+			*stringlen *= 2;
+			string = realloc(string, *stringlen);
+			ptr = strstr(string, replace);
+		}
+
+		strlcpy(string, dup, ptr - string + 1);
+		strcat(string, replacement);
+		strcat(string, dup + (ptr - string) + strlen(replace));
+		free(dup);
+	}
+
+	return string;
+}
+
 /*
  * Replace all occurrences of a string in a string with a different string.
  * NOTE: Assumes there is enough room in the target buffer!
@@ -521,7 +555,8 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		char	   *line;
+		size_t		linelen;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +586,24 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		linelen = 1024;
+		line = malloc(linelen);
+		while (fgets(line, linelen, infile))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
+			line = replace_string_heap(line, &linelen,
+									  "@abs_srcdir@", inputdir);
+			line = replace_string_heap(line, &linelen,
+									  "@abs_builddir@", outputdir);
+			line = replace_string_heap(line, &linelen,
+									  "@testtablespace@", testtablespace);
+			line = replace_string_heap(line, &linelen,
+									  "@libdir@", dlpath);
+			line = replace_string_heap(line, &linelen,
+									  "@DLSUFFIX@", DLSUFFIX);
 			fputs(line, outfile);
 		}
+		free(line);
 		fclose(infile);
 		fclose(outfile);
 	}
-- 
2.25.1

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Georgios (#1)
Re: [PATCH] - Provide robust alternatives for replace_string

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

If you're asking for opinion, mine is that StringInfo looks to be the
better approach, and also you don't need to keep API compatibility.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#2)
Re: [PATCH] - Provide robust alternatives for replace_string

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

That’s a good suggestion. Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
        while ((ptr = strstr(string->data, replace)) != NULL)
        {
-               char       *dup = pg_strdup(string->data);
+              char       *dup = pnstrdup(string->data, string->maxlen);
                size_t          pos = ptr - string->data;

string->len = pos;

If you're asking for opinion, mine is that StringInfo looks to be the
better approach, and also you don't need to keep API compatibility.

Thank you. We also prefer StringInfo solution.

Asim

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Asim Praveen (#3)
Re: [PATCH] - Provide robust alternatives for replace_string

On 2020-Aug-03, Asim Praveen wrote:

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example. Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

That’s a good suggestion. Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
while ((ptr = strstr(string->data, replace)) != NULL)
{
-               char       *dup = pg_strdup(string->data);
+              char       *dup = pnstrdup(string->data, string->maxlen);

I was thinking pnstrdup(string->data, ptr - string->data) to avoid
copying the chars beyond ptr.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#4)
Re: [PATCH] - Provide robust alternatives for replace_string

On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-03, Asim Praveen wrote:

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example. Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.

Thanks for the patient explanation, I had missed the obvious. To keep the code simple, I’m in favour of relying on the diff of a failing test to catch the split-replacement string problem.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

That’s a good suggestion. Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
while ((ptr = strstr(string->data, replace)) != NULL)
{
-               char       *dup = pg_strdup(string->data);
+              char       *dup = pnstrdup(string->data, string->maxlen);

I was thinking pnstrdup(string->data, ptr - string->data) to avoid
copying the chars beyond ptr.

In fact, what we need in the dup are chars beyond ptr. Copying of characters prefixing the string to be replaced can be avoided, like so:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,12 +465,12 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
        while ((ptr = strstr(string->data, replace)) != NULL)
        {
-               char       *dup = pg_strdup(string->data);
+               char       *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
                size_t          pos = ptr - string->data;
                string->len = pos;
                appendStringInfoString(string, replacement);
-               appendStringInfoString(string, dup + pos + strlen(replace));
+               appendStringInfoString(string, suffix);
-               free(dup);
+               free(suffix);
        }
}

Asim

#6Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#4)
1 attachment(s)
Re: [PATCH] - Provide robust alternatives for replace_string

On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-03, Asim Praveen wrote:

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example. Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.

Please find attached a StringInfo based solution to this problem. It uses fgetln instead of fgets such that a line is read in full, without ever splitting it.

Asim

Attachments:

0001-Use-a-stringInfo-instead-of-a-char-for-replace_strin.patchapplication/octet-stream; name=0001-Use-a-stringInfo-instead-of-a-char-for-replace_strin.patchDownload
From 0834fecf59ae3a413af1028a1149a38b73dda822 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos <gkokolatos@protonmail.com>
Date: Wed, 5 Aug 2020 12:30:53 +0530
Subject: [PATCH] Use a stringInfo instead of a char for replace_string in
 pg_regress

The exposed function replase_string() in pg_regress assumes that there exists
enough space in the string it operates on. While this is documented and
expected, the consequence can be subtle failures in the tests which might be
hard to trace back to the root cause.

Provide an alternative function that operates on a StringInfo instead.

Co-authored-by: Asim R P <pasim@vmware.com>
---
 src/test/regress/pg_regress.c | 45 ++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dcf..738069f61c4 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -33,6 +33,7 @@
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -454,6 +455,27 @@ replace_string(char *string, const char *replace, const char *replacement)
 	}
 }
 
+/*
+ * Replace all occurrences of a string in a stringInfo with a different string.
+ */
+static void
+replace_stringInfo(StringInfo string, const char *replace, const char *replacement)
+{
+	char	   *ptr;
+
+	while ((ptr = strstr(string->data, replace)) != NULL)
+	{
+		char	   *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
+		size_t		pos = ptr - string->data;
+
+		string->len = pos;
+		appendStringInfoString(string, replacement);
+		appendStringInfoString(string, suffix);
+
+		free(suffix);
+	}
+}
+
 /*
  * Convert *.source found in the "source" directory, replacing certain tokens
  * in the file contents with their intended values, and put the resulting files
@@ -521,7 +543,9 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		const char *fline;
+		size_t		flinelen;
+		StringInfoData	line;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +575,20 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		initStringInfo(&line);
+		while ((fline = fgetln(infile, &flinelen)))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
-			fputs(line, outfile);
+			appendBinaryStringInfo(&line, fline, flinelen);
+			replace_stringInfo(&line, "@abs_srcdir@", inputdir);
+			replace_stringInfo(&line, "@abs_builddir@", outputdir);
+			replace_stringInfo(&line, "@testtablespace@", testtablespace);
+			replace_stringInfo(&line, "@libdir@", dlpath);
+			replace_stringInfo(&line, "@DLSUFFIX@", DLSUFFIX);
+			fputs(line.data, outfile);
+			resetStringInfo(&line);
 		}
+		pfree(line.data);
 		fclose(infile);
 		fclose(outfile);
 	}
-- 
2.24.2 (Apple Git-127)

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Asim Praveen (#6)
Re: [PATCH] - Provide robust alternatives for replace_string

On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#7)
1 attachment(s)
Re: [PATCH] - Provide robust alternatives for replace_string

On 05-Aug-2020, at 7:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

Indeed! I noticed fgetln on the man page of fgets and used it without checking. And this happened on a MacOS system.

Please find a revised version that uses fgetc instead.

Asim

Attachments:

v2-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patchapplication/octet-stream; name=v2-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patchDownload
From 47980dc47b81ff2453465ac17f2816f522481405 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos <gkokolatos@protonmail.com>
Date: Fri, 7 Aug 2020 11:16:53 +0530
Subject: [PATCH v2] Use a stringInfo instead of a char for replace_string in
 pg_regress

The exposed function replase_string() in pg_regress assumes that there exists
enough space in the string it operates on. While this is documented and
expected, the consequence can be subtle failures in the tests which might be
hard to trace back to the root cause.

Provide an alternative function that operates on a StringInfo instead.

Co-authored-by: Asim R P <pasim@vmware.com>
---
 src/test/regress/pg_regress.c | 52 +++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dcf..3fe620ef476 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -33,6 +33,7 @@
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -454,6 +455,27 @@ replace_string(char *string, const char *replace, const char *replacement)
 	}
 }
 
+/*
+ * Replace all occurrences of a string in a stringInfo with a different string.
+ */
+static void
+replace_stringInfo(StringInfo string, const char *replace, const char *replacement)
+{
+	char	   *ptr;
+
+	while ((ptr = strstr(string->data, replace)) != NULL)
+	{
+		char	   *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
+		size_t		pos = ptr - string->data;
+
+		string->len = pos;
+		appendStringInfoString(string, replacement);
+		appendStringInfoString(string, suffix);
+
+		free(suffix);
+	}
+}
+
 /*
  * Convert *.source found in the "source" directory, replacing certain tokens
  * in the file contents with their intended values, and put the resulting files
@@ -521,7 +543,7 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		StringInfoData	line;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +573,29 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		initStringInfo(&line);
+
+		while (!feof(infile))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
-			fputs(line, outfile);
+			/* Read one line in StringInfo */
+			char c;
+			while ((c = fgetc(infile)) != EOF)
+			{
+				appendStringInfoChar(&line, c);
+				if (c == '\n')
+					break;
+			}
+
+			replace_stringInfo(&line, "@abs_srcdir@", inputdir);
+			replace_stringInfo(&line, "@abs_builddir@", outputdir);
+			replace_stringInfo(&line, "@testtablespace@", testtablespace);
+			replace_stringInfo(&line, "@libdir@", dlpath);
+			replace_stringInfo(&line, "@DLSUFFIX@", DLSUFFIX);
+			fputs(line.data, outfile);
+			resetStringInfo(&line);
 		}
+		pfree(line.data);
 		fclose(infile);
 		fclose(outfile);
 	}
-- 
2.24.2 (Apple Git-127)

#9Georgios
gkokolatos@protonmail.com
In reply to: Asim Praveen (#8)
1 attachment(s)
Re: [PATCH] - Provide robust alternatives for replace_string

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 7 August 2020 09:02, Asim Praveen <pasim@vmware.com> wrote:

On 05-Aug-2020, at 7:01 PM, Alvaro Herrera alvherre@2ndquadrant.com wrote:
On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

Indeed! I noticed fgetln on the man page of fgets and used it without checking. And this happened on a MacOS system.

Please find a revised version that uses fgetc instead.

Although not an issue in the current branch, fgetc might become a bit slow
in large files. Please find v3 which simply continues reading the line if
fgets fills the buffer and there is still data to read.

Also this version, implements Alvaro's suggestion to break API compatibility.

To that extent, ecpg regress has been slightly modified to use the new version
of replace_string() where needed, or remove it all together where possible.

//Georgios

Show quoted text

Asim

Attachments:

v3-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patchapplication/octet-stream; name=v3-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patchDownload
From 1af926e336b63f5648986a368eb4491dd4c7afa5 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos <gkokolatos@pivotal.io>
Date: Tue, 18 Aug 2020 12:20:35 +0000
Subject: [PATCH v3] Use a stringInfo instead of a char for replace_string in
 pg_regress

The exposed function replace_string() in pg_regress assumes that there exists
enough space in the string it operates on. While this is documented and
expected, the consequence can be subtle failures in the tests which might be
hard to trace back to the root cause.

Now replace_string() operates in StringInfo instead, growing the string as
nessecary.

Also, pg_regress takes advantage of the growing capability of StringInfo and
consumes lines greater than the original buffer, guarding against cutting the
replace string along consecutive reads.

Refactor ecpg's pg_regress slightly to make use of the new replace_string or
remove it all together where applicable.

Co-authored-by: Asim R P <pasim@vmware.com>
---
 src/interfaces/ecpg/test/pg_regress_ecpg.c | 53 +++++++++----------
 src/test/regress/pg_regress.c              | 59 +++++++++++++++-------
 src/test/regress/pg_regress.h              |  5 +-
 3 files changed, 68 insertions(+), 49 deletions(-)

diff --git a/src/interfaces/ecpg/test/pg_regress_ecpg.c b/src/interfaces/ecpg/test/pg_regress_ecpg.c
index 46b9e78fe5..d6aa9c16d8 100644
--- a/src/interfaces/ecpg/test/pg_regress_ecpg.c
+++ b/src/interfaces/ecpg/test/pg_regress_ecpg.c
@@ -19,6 +19,7 @@
 #include "postgres_fe.h"
 
 #include "pg_regress.h"
+#include "lib/stringinfo.h"
 
 #define LINEBUFSIZE 300
 
@@ -52,7 +53,6 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 		if (strstr(linebuf, "#line ") == linebuf)
 		{
 			char	   *p = strchr(linebuf, '"');
-			char	   *n;
 			int			plen = 1;
 
 			while (*p && (*(p + plen) == '.' || strchr(p + plen, '/') != NULL))
@@ -62,9 +62,7 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 			/* plen is one more than the number of . and / characters */
 			if (plen > 1)
 			{
-				n = (char *) malloc(plen);
-				strlcpy(n, p + 1, plen);
-				replace_string(linebuf, n, "");
+				memmove(p + 1, p + plen, strlen(p + plen) + 1);
 			}
 		}
 		fputs(linebuf, t);
@@ -84,43 +82,43 @@ ecpg_start_test(const char *testname,
 				_stringlist **expectfiles,
 				_stringlist **tags)
 {
+	StringInfoData	testname_dash;
 	PID_TYPE	pid;
 	char		inprg[MAXPGPATH];
 	char		insource[MAXPGPATH];
-	char	   *outfile_stdout,
+	char		outfile_stdout[MAXPGPATH],
 				expectfile_stdout[MAXPGPATH];
-	char	   *outfile_stderr,
+	char		outfile_stderr[MAXPGPATH],
 				expectfile_stderr[MAXPGPATH];
-	char	   *outfile_source,
+	char		outfile_source[MAXPGPATH],
 				expectfile_source[MAXPGPATH];
 	char		cmd[MAXPGPATH * 3];
-	char	   *testname_dash;
 	char	   *appnameenv;
 
 	snprintf(inprg, sizeof(inprg), "%s/%s", inputdir, testname);
 
-	testname_dash = strdup(testname);
-	replace_string(testname_dash, "/", "-");
+	initStringInfo(&testname_dash);
+	appendStringInfoString(&testname_dash, testname);
+	replace_string(&testname_dash, "/", "-");
 	snprintf(expectfile_stdout, sizeof(expectfile_stdout),
 			 "%s/expected/%s.stdout",
-			 outputdir, testname_dash);
+			 outputdir, testname_dash.data);
 	snprintf(expectfile_stderr, sizeof(expectfile_stderr),
 			 "%s/expected/%s.stderr",
-			 outputdir, testname_dash);
+			 outputdir, testname_dash.data);
 	snprintf(expectfile_source, sizeof(expectfile_source),
 			 "%s/expected/%s.c",
-			 outputdir, testname_dash);
-
-	/*
-	 * We can use replace_string() here because the replacement string does
-	 * not occupy more space than the replaced one.
-	 */
-	outfile_stdout = strdup(expectfile_stdout);
-	replace_string(outfile_stdout, "/expected/", "/results/");
-	outfile_stderr = strdup(expectfile_stderr);
-	replace_string(outfile_stderr, "/expected/", "/results/");
-	outfile_source = strdup(expectfile_source);
-	replace_string(outfile_source, "/expected/", "/results/");
+			 outputdir, testname_dash.data);
+
+	snprintf(outfile_stdout, sizeof(outfile_stdout),
+			 "%s/results/%s.stdout",
+			 outputdir, testname_dash.data);
+	snprintf(outfile_stderr, sizeof(outfile_stderr),
+			 "%s/results/%s.stderr",
+			 outputdir, testname_dash.data);
+	snprintf(outfile_source, sizeof(outfile_source),
+			 "%s/results/%s.c",
+			 outputdir, testname_dash.data);
 
 	add_stringlist_item(resultfiles, outfile_stdout);
 	add_stringlist_item(expectfiles, expectfile_stdout);
@@ -145,7 +143,7 @@ ecpg_start_test(const char *testname,
 			 outfile_stdout,
 			 outfile_stderr);
 
-	appnameenv = psprintf("PGAPPNAME=ecpg/%s", testname_dash);
+	appnameenv = psprintf("PGAPPNAME=ecpg/%s", testname_dash.data);
 	putenv(appnameenv);
 
 	pid = spawn_process(cmd);
@@ -160,10 +158,7 @@ ecpg_start_test(const char *testname,
 	unsetenv("PGAPPNAME");
 	free(appnameenv);
 
-	free(testname_dash);
-	free(outfile_stdout);
-	free(outfile_stderr);
-	free(outfile_source);
+	free(testname_dash.data);
 
 	return pid;
 }
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..ec6d63f91a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -33,6 +33,7 @@
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -434,23 +435,22 @@ string_matches_pattern(const char *str, const char *pattern)
 	return false;
 }
 
-/*
- * Replace all occurrences of a string in a string with a different string.
- * NOTE: Assumes there is enough room in the target buffer!
- */
+
 void
-replace_string(char *string, const char *replace, const char *replacement)
+replace_string(StringInfo string, const char *replace, const char *replacement)
 {
 	char	   *ptr;
 
-	while ((ptr = strstr(string, replace)) != NULL)
+	while ((ptr = strstr(string->data, replace)) != NULL)
 	{
-		char	   *dup = pg_strdup(string);
+		char	   *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
+		size_t		pos = ptr - string->data;
 
-		strlcpy(string, dup, ptr - string + 1);
-		strcat(string, replacement);
-		strcat(string, dup + (ptr - string) + strlen(replace));
-		free(dup);
+		string->len = pos;
+		appendStringInfoString(string, replacement);
+		appendStringInfoString(string, suffix);
+
+		free(suffix);
 	}
 }
 
@@ -521,7 +521,7 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		StringInfoData	line;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +551,36 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		initStringInfo(&line);
+		while (fgets(line.data, line.maxlen, infile))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
-			fputs(line, outfile);
+			/* continue reading if line was cut short and there is more input */
+			if ((strlen(line.data) == line.maxlen - 1) &&
+				line.data[line.maxlen - 2] != '\n')
+			{
+				char	rest[1024];
+				char   *unused __attribute__((unused));
+
+				line.len = line.maxlen - 1;
+				do
+				{
+					unused = fgets(rest, sizeof(rest), infile);
+					appendStringInfoString(&line, rest);
+				} while ((strlen(rest) == sizeof(rest) - 1) &&
+						 rest[sizeof(rest) - 2] != '\n');
+			}
+
+			replace_string(&line, "@abs_srcdir@", inputdir);
+			replace_string(&line, "@abs_builddir@", outputdir);
+			replace_string(&line, "@testtablespace@", testtablespace);
+			replace_string(&line, "@libdir@", dlpath);
+			replace_string(&line, "@DLSUFFIX@", DLSUFFIX);
+			fputs(line.data, outfile);
+
+			resetStringInfo(&line);
 		}
+		pfree(line.data);
 		fclose(infile);
 		fclose(outfile);
 	}
diff --git a/src/test/regress/pg_regress.h b/src/test/regress/pg_regress.h
index ee6e0d42f4..d4640e0e2f 100644
--- a/src/test/regress/pg_regress.h
+++ b/src/test/regress/pg_regress.h
@@ -18,6 +18,8 @@
 #define INVALID_PID INVALID_HANDLE_VALUE
 #endif
 
+struct StringInfoData;		/* not to include stringinfo.h here */
+
 /* simple list of strings */
 typedef struct _stringlist
 {
@@ -49,5 +51,6 @@ int			regression_main(int argc, char *argv[],
 							init_function ifunc, test_function tfunc);
 void		add_stringlist_item(_stringlist **listhead, const char *str);
 PID_TYPE	spawn_process(const char *cmdline);
-void		replace_string(char *string, const char *replace, const char *replacement);
+void		replace_string(struct StringInfoData *string,
+						   const char *replace, const char *replacement);
 bool		file_exists(const char *file);
-- 
2.25.1

#10Georgios
gkokolatos@protonmail.com
In reply to: Georgios (#9)
1 attachment(s)
Re: [PATCH] - Provide robust alternatives for replace_string

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, 19 August 2020 11:07, Georgios <gkokolatos@protonmail.com> wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 7 August 2020 09:02, Asim Praveen pasim@vmware.com wrote:

On 05-Aug-2020, at 7:01 PM, Alvaro Herrera alvherre@2ndquadrant.com wrote:
On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

Indeed! I noticed fgetln on the man page of fgets and used it without checking. And this happened on a MacOS system.
Please find a revised version that uses fgetc instead.

Although not an issue in the current branch, fgetc might become a bit slow
in large files. Please find v3 which simply continues reading the line if
fgets fills the buffer and there is still data to read.

Also this version, implements Alvaro's suggestion to break API compatibility.

To that extent, ecpg regress has been slightly modified to use the new version
of replace_string() where needed, or remove it all together where possible.

I noticed that the cfbot [1]https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.105985 was unhappy with the raw use of __attribute__ on windows builds.

In retrospect it is rather obvious it would complain. Please find v4 attached.

//Georgios

//Georgios

Asim

[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.105985

Attachments:

v4-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patchapplication/octet-stream; name=v4-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patchDownload
From 95567f599fb64e7fc72e243286d917ca3bd9e5d0 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos <gkokolatos@pivotal.io>
Date: Mon, 31 Aug 2020 08:51:36 +0000
Subject: [PATCH v4] Use a stringInfo instead of a char for replace_string in
 pg_regress

The exposed function replace_string() in pg_regress assumes that there exists
enough space in the string it operates on. While this is documented and
expected, the consequence can be subtle failures in the tests which might be
hard to trace back to the root cause.

Now replace_string() operates in StringInfo instead, growing the string as
nessecary.

Also, pg_regress takes advantage of the growing capability of StringInfo and
consumes lines greater than the original buffer, guarding against cutting the
replace string along consecutive reads.

Refactor ecpg's pg_regress slightly to make use of the new replace_string or
remove it all together where applicable.

Co-authored-by: Asim R P <pasim@vmware.com>
---
 src/interfaces/ecpg/test/pg_regress_ecpg.c | 53 +++++++++----------
 src/test/regress/pg_regress.c              | 59 +++++++++++++++-------
 src/test/regress/pg_regress.h              |  5 +-
 3 files changed, 68 insertions(+), 49 deletions(-)

diff --git a/src/interfaces/ecpg/test/pg_regress_ecpg.c b/src/interfaces/ecpg/test/pg_regress_ecpg.c
index 46b9e78fe5..d6aa9c16d8 100644
--- a/src/interfaces/ecpg/test/pg_regress_ecpg.c
+++ b/src/interfaces/ecpg/test/pg_regress_ecpg.c
@@ -19,6 +19,7 @@
 #include "postgres_fe.h"
 
 #include "pg_regress.h"
+#include "lib/stringinfo.h"
 
 #define LINEBUFSIZE 300
 
@@ -52,7 +53,6 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 		if (strstr(linebuf, "#line ") == linebuf)
 		{
 			char	   *p = strchr(linebuf, '"');
-			char	   *n;
 			int			plen = 1;
 
 			while (*p && (*(p + plen) == '.' || strchr(p + plen, '/') != NULL))
@@ -62,9 +62,7 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 			/* plen is one more than the number of . and / characters */
 			if (plen > 1)
 			{
-				n = (char *) malloc(plen);
-				strlcpy(n, p + 1, plen);
-				replace_string(linebuf, n, "");
+				memmove(p + 1, p + plen, strlen(p + plen) + 1);
 			}
 		}
 		fputs(linebuf, t);
@@ -84,43 +82,43 @@ ecpg_start_test(const char *testname,
 				_stringlist **expectfiles,
 				_stringlist **tags)
 {
+	StringInfoData	testname_dash;
 	PID_TYPE	pid;
 	char		inprg[MAXPGPATH];
 	char		insource[MAXPGPATH];
-	char	   *outfile_stdout,
+	char		outfile_stdout[MAXPGPATH],
 				expectfile_stdout[MAXPGPATH];
-	char	   *outfile_stderr,
+	char		outfile_stderr[MAXPGPATH],
 				expectfile_stderr[MAXPGPATH];
-	char	   *outfile_source,
+	char		outfile_source[MAXPGPATH],
 				expectfile_source[MAXPGPATH];
 	char		cmd[MAXPGPATH * 3];
-	char	   *testname_dash;
 	char	   *appnameenv;
 
 	snprintf(inprg, sizeof(inprg), "%s/%s", inputdir, testname);
 
-	testname_dash = strdup(testname);
-	replace_string(testname_dash, "/", "-");
+	initStringInfo(&testname_dash);
+	appendStringInfoString(&testname_dash, testname);
+	replace_string(&testname_dash, "/", "-");
 	snprintf(expectfile_stdout, sizeof(expectfile_stdout),
 			 "%s/expected/%s.stdout",
-			 outputdir, testname_dash);
+			 outputdir, testname_dash.data);
 	snprintf(expectfile_stderr, sizeof(expectfile_stderr),
 			 "%s/expected/%s.stderr",
-			 outputdir, testname_dash);
+			 outputdir, testname_dash.data);
 	snprintf(expectfile_source, sizeof(expectfile_source),
 			 "%s/expected/%s.c",
-			 outputdir, testname_dash);
-
-	/*
-	 * We can use replace_string() here because the replacement string does
-	 * not occupy more space than the replaced one.
-	 */
-	outfile_stdout = strdup(expectfile_stdout);
-	replace_string(outfile_stdout, "/expected/", "/results/");
-	outfile_stderr = strdup(expectfile_stderr);
-	replace_string(outfile_stderr, "/expected/", "/results/");
-	outfile_source = strdup(expectfile_source);
-	replace_string(outfile_source, "/expected/", "/results/");
+			 outputdir, testname_dash.data);
+
+	snprintf(outfile_stdout, sizeof(outfile_stdout),
+			 "%s/results/%s.stdout",
+			 outputdir, testname_dash.data);
+	snprintf(outfile_stderr, sizeof(outfile_stderr),
+			 "%s/results/%s.stderr",
+			 outputdir, testname_dash.data);
+	snprintf(outfile_source, sizeof(outfile_source),
+			 "%s/results/%s.c",
+			 outputdir, testname_dash.data);
 
 	add_stringlist_item(resultfiles, outfile_stdout);
 	add_stringlist_item(expectfiles, expectfile_stdout);
@@ -145,7 +143,7 @@ ecpg_start_test(const char *testname,
 			 outfile_stdout,
 			 outfile_stderr);
 
-	appnameenv = psprintf("PGAPPNAME=ecpg/%s", testname_dash);
+	appnameenv = psprintf("PGAPPNAME=ecpg/%s", testname_dash.data);
 	putenv(appnameenv);
 
 	pid = spawn_process(cmd);
@@ -160,10 +158,7 @@ ecpg_start_test(const char *testname,
 	unsetenv("PGAPPNAME");
 	free(appnameenv);
 
-	free(testname_dash);
-	free(outfile_stdout);
-	free(outfile_stderr);
-	free(outfile_source);
+	free(testname_dash.data);
 
 	return pid;
 }
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..185dc20c00 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -33,6 +33,7 @@
 #include "common/restricted_token.h"
 #include "common/username.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -434,23 +435,22 @@ string_matches_pattern(const char *str, const char *pattern)
 	return false;
 }
 
-/*
- * Replace all occurrences of a string in a string with a different string.
- * NOTE: Assumes there is enough room in the target buffer!
- */
+
 void
-replace_string(char *string, const char *replace, const char *replacement)
+replace_string(StringInfo string, const char *replace, const char *replacement)
 {
 	char	   *ptr;
 
-	while ((ptr = strstr(string, replace)) != NULL)
+	while ((ptr = strstr(string->data, replace)) != NULL)
 	{
-		char	   *dup = pg_strdup(string);
+		char	   *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
+		size_t		pos = ptr - string->data;
 
-		strlcpy(string, dup, ptr - string + 1);
-		strcat(string, replacement);
-		strcat(string, dup + (ptr - string) + strlen(replace));
-		free(dup);
+		string->len = pos;
+		appendStringInfoString(string, replacement);
+		appendStringInfoString(string, suffix);
+
+		free(suffix);
 	}
 }
 
@@ -521,7 +521,7 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		StringInfoData	line;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +551,36 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		initStringInfo(&line);
+		while (fgets(line.data, line.maxlen, infile))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
-			fputs(line, outfile);
+			/* continue reading if line was cut short and there is more input */
+			if ((strlen(line.data) == line.maxlen - 1) &&
+				line.data[line.maxlen - 2] != '\n')
+			{
+				char	rest[1024];
+				char   *unused pg_attribute_unused();
+
+				line.len = line.maxlen - 1;
+				do
+				{
+					unused = fgets(rest, sizeof(rest), infile);
+					appendStringInfoString(&line, rest);
+				} while ((strlen(rest) == sizeof(rest) - 1) &&
+						 rest[sizeof(rest) - 2] != '\n');
+			}
+
+			replace_string(&line, "@abs_srcdir@", inputdir);
+			replace_string(&line, "@abs_builddir@", outputdir);
+			replace_string(&line, "@testtablespace@", testtablespace);
+			replace_string(&line, "@libdir@", dlpath);
+			replace_string(&line, "@DLSUFFIX@", DLSUFFIX);
+			fputs(line.data, outfile);
+
+			resetStringInfo(&line);
 		}
+		pfree(line.data);
 		fclose(infile);
 		fclose(outfile);
 	}
diff --git a/src/test/regress/pg_regress.h b/src/test/regress/pg_regress.h
index ee6e0d42f4..d4640e0e2f 100644
--- a/src/test/regress/pg_regress.h
+++ b/src/test/regress/pg_regress.h
@@ -18,6 +18,8 @@
 #define INVALID_PID INVALID_HANDLE_VALUE
 #endif
 
+struct StringInfoData;		/* not to include stringinfo.h here */
+
 /* simple list of strings */
 typedef struct _stringlist
 {
@@ -49,5 +51,6 @@ int			regression_main(int argc, char *argv[],
 							init_function ifunc, test_function tfunc);
 void		add_stringlist_item(_stringlist **listhead, const char *str);
 PID_TYPE	spawn_process(const char *cmdline);
-void		replace_string(char *string, const char *replace, const char *replacement);
+void		replace_string(struct StringInfoData *string,
+						   const char *replace, const char *replacement);
 bool		file_exists(const char *file);
-- 
2.25.1

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Georgios (#10)
Re: [PATCH] - Provide robust alternatives for replace_string

Note that starting with commit 67a472d71c98 you can use pg_get_line and
not worry about the hard part of this anymore :-)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#11)
1 attachment(s)
Re: [PATCH] - Provide robust alternatives for replace_string

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Note that starting with commit 67a472d71c98 you can use pg_get_line and
not worry about the hard part of this anymore :-)

pg_get_line as it stands isn't quite suitable, because it just hands
back a "char *" string, not a StringInfo that you can do further
processing on.

However, I'd already grown a bit dissatisfied with exposing only that
API, because the code 8f8154a50 added to hba.c couldn't use pg_get_line
either, and had to duplicate the logic. So the attached revised patch
splits pg_get_line into two pieces, one with the existing char * API
and one that appends to a caller-provided StringInfo. (hba.c needs the
append-rather-than-reset behavior, and it might be useful elsewhere
too.)

While here, I couldn't resist getting rid of ecpg_filter()'s hard-wired
line length limit too.

This version looks committable to me, though perhaps someone has
further thoughts?

regards, tom lane

Attachments:

v5-0001-use-stringinfo-for-replace_string.patchtext/x-diff; charset=us-ascii; name=v5-0001-use-stringinfo-for-replace_string.patchDownload
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 5991a21cf2..9f106653f3 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -502,33 +502,8 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel)
 		/* Collect the next input line, handling backslash continuations */
 		resetStringInfo(&buf);
 
-		while (!feof(file) && !ferror(file))
+		while (pg_get_line_append(file, &buf))
 		{
-			/* Make sure there's a reasonable amount of room in the buffer */
-			enlargeStringInfo(&buf, 128);
-
-			/* Read some data, appending it to what we already have */
-			if (fgets(buf.data + buf.len, buf.maxlen - buf.len, file) == NULL)
-			{
-				int			save_errno = errno;
-
-				if (!ferror(file))
-					break;		/* normal EOF */
-				/* I/O error! */
-				ereport(elevel,
-						(errcode_for_file_access(),
-						 errmsg("could not read file \"%s\": %m", filename)));
-				err_msg = psprintf("could not read file \"%s\": %s",
-								   filename, strerror(save_errno));
-				resetStringInfo(&buf);
-				break;
-			}
-			buf.len += strlen(buf.data + buf.len);
-
-			/* If we haven't got a whole line, loop to read more */
-			if (!(buf.len > 0 && buf.data[buf.len - 1] == '\n'))
-				continue;
-
 			/* Strip trailing newline, including \r in case we're on Windows */
 			buf.len = pg_strip_crlf(buf.data);
 
@@ -551,6 +526,19 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel)
 			break;
 		}
 
+		if (ferror(file))
+		{
+			/* I/O error! */
+			int			save_errno = errno;
+
+			ereport(elevel,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", filename)));
+			err_msg = psprintf("could not read file \"%s\": %s",
+							   filename, strerror(save_errno));
+			break;
+		}
+
 		/* Parse fields */
 		lineptr = buf.data;
 		while (*lineptr && err_msg == NULL)
diff --git a/src/common/pg_get_line.c b/src/common/pg_get_line.c
index 38433675d4..eae02b9596 100644
--- a/src/common/pg_get_line.c
+++ b/src/common/pg_get_line.c
@@ -49,21 +49,7 @@ pg_get_line(FILE *stream)
 
 	initStringInfo(&buf);
 
-	/* Read some data, appending it to whatever we already have */
-	while (fgets(buf.data + buf.len, buf.maxlen - buf.len, stream) != NULL)
-	{
-		buf.len += strlen(buf.data + buf.len);
-
-		/* Done if we have collected a newline */
-		if (buf.len > 0 && buf.data[buf.len - 1] == '\n')
-			return buf.data;
-
-		/* Make some more room in the buffer, and loop to read more data */
-		enlargeStringInfo(&buf, 128);
-	}
-
-	/* Did fgets() fail because of an I/O error? */
-	if (ferror(stream))
+	if (!pg_get_line_append(stream, &buf))
 	{
 		/* ensure that free() doesn't mess up errno */
 		int			save_errno = errno;
@@ -73,13 +59,49 @@ pg_get_line(FILE *stream)
 		return NULL;
 	}
 
-	/* If we read no data before reaching EOF, we should return NULL */
-	if (buf.len == 0)
+	return buf.data;
+}
+
+/*
+ * pg_get_line_append()
+ *
+ * This has similar behavior to pg_get_line(), and thence to fgets(),
+ * except that the collected data is appended to whatever is in *buf.
+ *
+ * Returns true if a line was successfully collected (including the
+ * case of a non-newline-terminated line at EOF).  Returns false if
+ * there was an I/O error or no data was available before EOF.
+ * (Check ferror(stream) to distinguish these cases.)
+ *
+ * In the false-result case, the contents of *buf are logically unmodified,
+ * though it's possible that the buffer has been resized.
+ */
+bool
+pg_get_line_append(FILE *stream, StringInfo buf)
+{
+	int			orig_len = buf->len;
+
+	/* Read some data, appending it to whatever we already have */
+	while (fgets(buf->data + buf->len, buf->maxlen - buf->len, stream) != NULL)
 	{
-		pfree(buf.data);
-		return NULL;
+		buf->len += strlen(buf->data + buf->len);
+
+		/* Done if we have collected a newline */
+		if (buf->len > orig_len && buf->data[buf->len - 1] == '\n')
+			return true;
+
+		/* Make some more room in the buffer, and loop to read more data */
+		enlargeStringInfo(buf, 128);
 	}
 
-	/* No newline at EOF ... so return what we have */
-	return buf.data;
+	/* Check for I/O errors and EOF */
+	if (ferror(stream) || buf->len == orig_len)
+	{
+		/* Discard any data we collected before detecting error */
+		buf->len = orig_len;
+		return false;
+	}
+
+	/* No newline at EOF, but we did collect some data */
+	return true;
 }
diff --git a/src/include/common/string.h b/src/include/common/string.h
index 18aa1dc5aa..50c241a811 100644
--- a/src/include/common/string.h
+++ b/src/include/common/string.h
@@ -10,6 +10,8 @@
 #ifndef COMMON_STRING_H
 #define COMMON_STRING_H
 
+struct StringInfoData;			/* avoid including stringinfo.h here */
+
 /* functions in src/common/string.c */
 extern bool pg_str_endswith(const char *str, const char *end);
 extern int	strtoint(const char *pg_restrict str, char **pg_restrict endptr,
@@ -19,6 +21,7 @@ extern int	pg_strip_crlf(char *str);
 
 /* functions in src/common/pg_get_line.c */
 extern char *pg_get_line(FILE *stream);
+extern bool pg_get_line_append(FILE *stream, struct StringInfoData *buf);
 
 /* functions in src/common/sprompt.c */
 extern char *simple_prompt(const char *prompt, bool echo);
diff --git a/src/interfaces/ecpg/test/pg_regress_ecpg.c b/src/interfaces/ecpg/test/pg_regress_ecpg.c
index 46b9e78fe5..a2d7b70d9a 100644
--- a/src/interfaces/ecpg/test/pg_regress_ecpg.c
+++ b/src/interfaces/ecpg/test/pg_regress_ecpg.c
@@ -19,8 +19,9 @@
 #include "postgres_fe.h"
 
 #include "pg_regress.h"
+#include "common/string.h"
+#include "lib/stringinfo.h"
 
-#define LINEBUFSIZE 300
 
 static void
 ecpg_filter(const char *sourcefile, const char *outfile)
@@ -31,7 +32,7 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 	 */
 	FILE	   *s,
 			   *t;
-	char		linebuf[LINEBUFSIZE];
+	StringInfoData linebuf;
 
 	s = fopen(sourcefile, "r");
 	if (!s)
@@ -46,13 +47,14 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 		exit(2);
 	}
 
-	while (fgets(linebuf, LINEBUFSIZE, s))
+	initStringInfo(&linebuf);
+
+	while (pg_get_line_append(s, &linebuf))
 	{
 		/* check for "#line " in the beginning */
-		if (strstr(linebuf, "#line ") == linebuf)
+		if (strstr(linebuf.data, "#line ") == linebuf.data)
 		{
-			char	   *p = strchr(linebuf, '"');
-			char	   *n;
+			char	   *p = strchr(linebuf.data, '"');
 			int			plen = 1;
 
 			while (*p && (*(p + plen) == '.' || strchr(p + plen, '/') != NULL))
@@ -62,13 +64,15 @@ ecpg_filter(const char *sourcefile, const char *outfile)
 			/* plen is one more than the number of . and / characters */
 			if (plen > 1)
 			{
-				n = (char *) malloc(plen);
-				strlcpy(n, p + 1, plen);
-				replace_string(linebuf, n, "");
+				memmove(p + 1, p + plen, strlen(p + plen) + 1);
+				/* we don't bother to fix up linebuf.len */
 			}
 		}
-		fputs(linebuf, t);
+		fputs(linebuf.data, t);
+		resetStringInfo(&linebuf);
 	}
+
+	pfree(linebuf.data);
 	fclose(s);
 	fclose(t);
 }
@@ -87,40 +91,42 @@ ecpg_start_test(const char *testname,
 	PID_TYPE	pid;
 	char		inprg[MAXPGPATH];
 	char		insource[MAXPGPATH];
-	char	   *outfile_stdout,
+	StringInfoData testname_dash;
+	char		outfile_stdout[MAXPGPATH],
 				expectfile_stdout[MAXPGPATH];
-	char	   *outfile_stderr,
+	char		outfile_stderr[MAXPGPATH],
 				expectfile_stderr[MAXPGPATH];
-	char	   *outfile_source,
+	char		outfile_source[MAXPGPATH],
 				expectfile_source[MAXPGPATH];
 	char		cmd[MAXPGPATH * 3];
-	char	   *testname_dash;
 	char	   *appnameenv;
 
 	snprintf(inprg, sizeof(inprg), "%s/%s", inputdir, testname);
+	snprintf(insource, sizeof(insource), "%s.c", testname);
+
+	initStringInfo(&testname_dash);
+	appendStringInfoString(&testname_dash, testname);
+	replace_string(&testname_dash, "/", "-");
 
-	testname_dash = strdup(testname);
-	replace_string(testname_dash, "/", "-");
 	snprintf(expectfile_stdout, sizeof(expectfile_stdout),
 			 "%s/expected/%s.stdout",
-			 outputdir, testname_dash);
+			 outputdir, testname_dash.data);
 	snprintf(expectfile_stderr, sizeof(expectfile_stderr),
 			 "%s/expected/%s.stderr",
-			 outputdir, testname_dash);
+			 outputdir, testname_dash.data);
 	snprintf(expectfile_source, sizeof(expectfile_source),
 			 "%s/expected/%s.c",
-			 outputdir, testname_dash);
-
-	/*
-	 * We can use replace_string() here because the replacement string does
-	 * not occupy more space than the replaced one.
-	 */
-	outfile_stdout = strdup(expectfile_stdout);
-	replace_string(outfile_stdout, "/expected/", "/results/");
-	outfile_stderr = strdup(expectfile_stderr);
-	replace_string(outfile_stderr, "/expected/", "/results/");
-	outfile_source = strdup(expectfile_source);
-	replace_string(outfile_source, "/expected/", "/results/");
+			 outputdir, testname_dash.data);
+
+	snprintf(outfile_stdout, sizeof(outfile_stdout),
+			 "%s/results/%s.stdout",
+			 outputdir, testname_dash.data);
+	snprintf(outfile_stderr, sizeof(outfile_stderr),
+			 "%s/results/%s.stderr",
+			 outputdir, testname_dash.data);
+	snprintf(outfile_source, sizeof(outfile_source),
+			 "%s/results/%s.c",
+			 outputdir, testname_dash.data);
 
 	add_stringlist_item(resultfiles, outfile_stdout);
 	add_stringlist_item(expectfiles, expectfile_stdout);
@@ -134,18 +140,15 @@ ecpg_start_test(const char *testname,
 	add_stringlist_item(expectfiles, expectfile_source);
 	add_stringlist_item(tags, "source");
 
-	snprintf(insource, sizeof(insource), "%s.c", testname);
 	ecpg_filter(insource, outfile_source);
 
-	snprintf(inprg, sizeof(inprg), "%s/%s", inputdir, testname);
-
 	snprintf(cmd, sizeof(cmd),
 			 "\"%s\" >\"%s\" 2>\"%s\"",
 			 inprg,
 			 outfile_stdout,
 			 outfile_stderr);
 
-	appnameenv = psprintf("PGAPPNAME=ecpg/%s", testname_dash);
+	appnameenv = psprintf("PGAPPNAME=ecpg/%s", testname_dash.data);
 	putenv(appnameenv);
 
 	pid = spawn_process(cmd);
@@ -160,10 +163,7 @@ ecpg_start_test(const char *testname,
 	unsetenv("PGAPPNAME");
 	free(appnameenv);
 
-	free(testname_dash);
-	free(outfile_stdout);
-	free(outfile_stderr);
-	free(outfile_source);
+	free(testname_dash.data);
 
 	return pid;
 }
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index d82e0189dc..d83442f467 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -31,8 +31,10 @@
 
 #include "common/logging.h"
 #include "common/restricted_token.h"
+#include "common/string.h"
 #include "common/username.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
 #include "pg_regress.h"
@@ -435,22 +437,32 @@ string_matches_pattern(const char *str, const char *pattern)
 }
 
 /*
- * Replace all occurrences of a string in a string with a different string.
- * NOTE: Assumes there is enough room in the target buffer!
+ * Replace all occurrences of "replace" in "string" with "replacement".
+ * The StringInfo will be suitably enlarged if necessary.
+ *
+ * Note: this is optimized on the assumption that most calls will find
+ * no more than one occurrence of "replace", and quite likely none.
  */
 void
-replace_string(char *string, const char *replace, const char *replacement)
+replace_string(StringInfo string, const char *replace, const char *replacement)
 {
+	int			pos = 0;
 	char	   *ptr;
 
-	while ((ptr = strstr(string, replace)) != NULL)
+	while ((ptr = strstr(string->data + pos, replace)) != NULL)
 	{
-		char	   *dup = pg_strdup(string);
+		/* Must copy the remainder of the string out of the StringInfo */
+		char	   *suffix = pg_strdup(ptr + strlen(replace));
 
-		strlcpy(string, dup, ptr - string + 1);
-		strcat(string, replacement);
-		strcat(string, dup + (ptr - string) + strlen(replace));
-		free(dup);
+		/* Truncate StringInfo at start of found string ... */
+		string->len = ptr - string->data;
+		/* ... and append the replacement */
+		appendStringInfoString(string, replacement);
+		/* Next search should start after the replacement */
+		pos = string->len;
+		/* Put back the remainder of the string */
+		appendStringInfoString(string, suffix);
+		free(suffix);
 	}
 }
 
@@ -521,7 +533,7 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 		char		prefix[MAXPGPATH];
 		FILE	   *infile,
 				   *outfile;
-		char		line[1024];
+		StringInfoData line;
 
 		/* reject filenames not finishing in ".source" */
 		if (strlen(*name) < 8)
@@ -551,15 +563,21 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 					progname, destfile, strerror(errno));
 			exit(2);
 		}
-		while (fgets(line, sizeof(line), infile))
+
+		initStringInfo(&line);
+
+		while (pg_get_line_append(infile, &line))
 		{
-			replace_string(line, "@abs_srcdir@", inputdir);
-			replace_string(line, "@abs_builddir@", outputdir);
-			replace_string(line, "@testtablespace@", testtablespace);
-			replace_string(line, "@libdir@", dlpath);
-			replace_string(line, "@DLSUFFIX@", DLSUFFIX);
-			fputs(line, outfile);
+			replace_string(&line, "@abs_srcdir@", inputdir);
+			replace_string(&line, "@abs_builddir@", outputdir);
+			replace_string(&line, "@testtablespace@", testtablespace);
+			replace_string(&line, "@libdir@", dlpath);
+			replace_string(&line, "@DLSUFFIX@", DLSUFFIX);
+			fputs(line.data, outfile);
+			resetStringInfo(&line);
 		}
+
+		pfree(line.data);
 		fclose(infile);
 		fclose(outfile);
 	}
diff --git a/src/test/regress/pg_regress.h b/src/test/regress/pg_regress.h
index ee6e0d42f4..726f9c9048 100644
--- a/src/test/regress/pg_regress.h
+++ b/src/test/regress/pg_regress.h
@@ -18,6 +18,8 @@
 #define INVALID_PID INVALID_HANDLE_VALUE
 #endif
 
+struct StringInfoData;			/* avoid including stringinfo.h here */
+
 /* simple list of strings */
 typedef struct _stringlist
 {
@@ -49,5 +51,6 @@ int			regression_main(int argc, char *argv[],
 							init_function ifunc, test_function tfunc);
 void		add_stringlist_item(_stringlist **listhead, const char *str);
 PID_TYPE	spawn_process(const char *cmdline);
-void		replace_string(char *string, const char *replace, const char *replacement);
+void		replace_string(struct StringInfoData *string,
+						   const char *replace, const char *replacement);
 bool		file_exists(const char *file);
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: [PATCH] - Provide robust alternatives for replace_string

I wrote:

This version looks committable to me, though perhaps someone has
further thoughts?

I looked through this again and pushed it.

regards, tom lane