[PATCH] - Provide robust alternatives for replace_string

Started by Georgios Kokolatos over 5 years ago | 13 messages | hackers
#1Georgios Kokolatos
gkokolatos@protonmail.com

Hi,

In our testing framework, backed by pg_regress, it is possible to use special strings
that are replaced by environment-based ones. One such example is '@testtablespace@'. The
function used for this replacement is replace_string, which replaces these occurrences in place
in the original line. It is documented that the original line buffer should be large enough to
accommodate the result.

However, it is quite easy for subtle errors to occur, especially if there are multiple
occurrences to be replaced in long enough lines. Please find two distinct versions of a possible
solution. One, which is preferred, uses StringInfo, though it requires stringinfo.h to be included
in pg_regress.c. The other patch is more basic and avoids including stringinfo.h. As a reminder,
stringinfo became available in the frontend in commit 26aaf97b683d.
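
To illustrate the idea behind the heap-allocated variant, here is a minimal sketch. It is not the code from either attached patch: the name replace_all_dup and its signature are assumptions made for this example, and it uses plain malloc rather than StringInfo. The output is sized from the number of occurrences, so the result cannot overflow however many markers a line contains.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the pg_regress function): return a newly
 * malloc'd copy of "line" in which every occurrence of "marker" is
 * replaced by "replacement".  Returns NULL if allocation fails.
 * The caller frees the result.
 */
char *
replace_all_dup(const char *line, const char *marker, const char *replacement)
{
    size_t      marker_len = strlen(marker);
    size_t      repl_len = strlen(replacement);
    size_t      count = 0;
    const char *scan;
    const char *hit;
    char       *result;
    char       *out;

    assert(marker_len > 0);     /* an empty marker would never advance */

    /* First pass: count non-overlapping occurrences to size the output. */
    for (scan = line; (hit = strstr(scan, marker)) != NULL; scan = hit + marker_len)
        count++;

    result = malloc(strlen(line) - count * marker_len + count * repl_len + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy the text between markers, substituting as we go. */
    out = result;
    for (scan = line; (hit = strstr(scan, marker)) != NULL; scan = hit + marker_len)
    {
        memcpy(out, scan, (size_t) (hit - scan));
        out += hit - scan;
        memcpy(out, replacement, repl_len);
        out += repl_len;
    }
    strcpy(out, scan);          /* remainder after the last marker */

    return result;
}
```

Unlike the in-place function, this variant never rescans the text it has just inserted, so a replacement that itself contains the marker cannot cause a loop.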

Because the original replace_string() is exposed to other users, it is currently left intact.
Also, if required, an error can be raised in the original function in cases where the buffer is
not long enough to accommodate the replacements.

It is worth mentioning that currently there are no such issues present in the test suites. It should
not hurt to do a bit better though.

//Asim and Georgios

Attachments:

0001-Use-stringInfo-instead-of-char-in-replace_string.patch (application/octet-stream, +33 -9)
0001-Heap-allocated-string-version-of-replace_string.patch (application/octet-stream, +51 -8)

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Georgios Kokolatos (#1)
Re: [PATCH] - Provide robust alternatives for replace_string

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

If you're asking for opinion, mine is that StringInfo looks to be the
better approach, and also you don't need to keep API compatibility.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#2)
Re: [PATCH] - Provide robust alternatives for replace_string

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as "@abs_dir@", and it should be easily detected by a failing diff.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

That’s a good suggestion. Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
        while ((ptr = strstr(string->data, replace)) != NULL)
        {
-               char       *dup = pg_strdup(string->data);
+               char       *dup = pnstrdup(string->data, string->maxlen);
                size_t          pos = ptr - string->data;

                string->len = pos;

If you're asking for opinion, mine is that StringInfo looks to be the
better approach, and also you don't need to keep API compatibility.

Thank you. We also prefer StringInfo solution.

Asim

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Asim Praveen (#3)
Re: [PATCH] - Provide robust alternatives for replace_string

On 2020-Aug-03, Asim Praveen wrote:

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as "@abs_dir@", and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example. Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.
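
The split described above can be reproduced with a small stand-alone sketch. The helper name marker_seen_by_fgets and the use of tmpfile() are assumptions made for this demonstration and are not part of pg_regress. The helper reads a line back in chunks of a given buffer size and reports whether any single chunk contains the whole marker.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical demo helper: write "line" to a temporary file, read it back
 * with fgets() using a buffer of "bufsize" bytes, and report whether any
 * single chunk contains "marker".  Returns 1 if seen, 0 if not, -1 on error.
 */
int
marker_seen_by_fgets(const char *line, const char *marker, int bufsize)
{
    FILE       *fp;
    char       *buf;
    int         seen = 0;

    assert(bufsize >= 2);       /* fgets needs room for a char plus NUL */

    fp = tmpfile();
    buf = malloc((size_t) bufsize);
    if (fp == NULL || buf == NULL)
    {
        if (fp != NULL)
            fclose(fp);
        free(buf);
        return -1;
    }

    fputs(line, fp);
    rewind(fp);

    /* Each fgets call returns at most bufsize - 1 characters. */
    while (fgets(buf, bufsize, fp) != NULL)
    {
        if (strstr(buf, marker) != NULL)
            seen = 1;
    }

    fclose(fp);
    free(buf);
    return seen;
}
```

With the line "0123456789@abs_dir@ tail\n" and a 14-byte buffer, the first chunk ends in "@ab" and the second begins with "s_dir@", so the helper returns 0. With a 1024-byte buffer the same line is read in one piece and the helper returns 1.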

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

That’s a good suggestion. Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
while ((ptr = strstr(string->data, replace)) != NULL)
{
-               char       *dup = pg_strdup(string->data);
+              char       *dup = pnstrdup(string->data, string->maxlen);

I was thinking pnstrdup(string->data, ptr - string->data) to avoid
copying the chars beyond ptr.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#4)
Re: [PATCH] - Provide robust alternatives for replace_string

On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-03, Asim Praveen wrote:

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as "@abs_dir@", and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example. Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.

Thanks for the patient explanation, I had missed the obvious. To keep the code simple, I’m in favour of relying on the diff of a failing test to catch the split-replacement string problem.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

That’s a good suggestion. Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
while ((ptr = strstr(string->data, replace)) != NULL)
{
-               char       *dup = pg_strdup(string->data);
+              char       *dup = pnstrdup(string->data, string->maxlen);

I was thinking pnstrdup(string->data, ptr - string->data) to avoid
copying the chars beyond ptr.

In fact, what we need in the dup are chars beyond ptr. Copying of characters prefixing the string to be replaced can be avoided, like so:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,12 +465,12 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
        while ((ptr = strstr(string->data, replace)) != NULL)
        {
-               char       *dup = pg_strdup(string->data);
+               char       *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
                size_t          pos = ptr - string->data;
                string->len = pos;
                appendStringInfoString(string, replacement);
-               appendStringInfoString(string, dup + pos + strlen(replace));
+               appendStringInfoString(string, suffix);
-               free(dup);
+               free(suffix);
        }
}

Asim

#6Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#4)
Re: [PATCH] - Provide robust alternatives for replace_string

On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-03, Asim Praveen wrote:

Thank you Alvaro for reviewing the patch!

On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

What happens if a replacement string happens to be split in the middle
by the fgets buffering? I think it'll fail to be replaced. This
applies to both versions.

Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as "@abs_dir@", and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example. Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.

Please find attached a StringInfo based solution to this problem. It uses fgetln instead of fgets such that a line is read in full, without ever splitting it.

Asim

Attachments:

0001-Use-a-stringInfo-instead-of-a-char-for-replace_strin.patch (application/octet-stream, +37 -9)

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Asim Praveen (#6)
Re: [PATCH] - Provide robust alternatives for replace_string

On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Asim Praveen
pasim@vmware.com
In reply to: Alvaro Herrera (#7)
Re: [PATCH] - Provide robust alternatives for replace_string

On 05-Aug-2020, at 7:01 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

Indeed! I noticed fgetln on the man page of fgets and used it without checking. And this happened on a MacOS system.

Please find a revised version that uses fgetc instead.

Asim

Attachments:

v2-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patch (application/octet-stream, +44 -9)

#9Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Asim Praveen (#8)
Re: [PATCH] - Provide robust alternatives for replace_string

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 7 August 2020 09:02, Asim Praveen <pasim@vmware.com> wrote:

On 05-Aug-2020, at 7:01 PM, Alvaro Herrera alvherre@2ndquadrant.com wrote:
On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

Indeed! I noticed fgetln on the man page of fgets and used it without checking. And this happened on a MacOS system.

Please find a revised version that uses fgetc instead.

Although not an issue in the current branch, fgetc might become a bit slow
in large files. Please find v3 which simply continues reading the line if
fgets fills the buffer and there is still data to read.
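
As a rough sketch of the "keep reading while fgets fills the buffer" approach (this is not the code in the v3 patch; the name read_full_line and the deliberately tiny chunk size are assumptions chosen to exercise the loop):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one complete line (including the trailing
 * '\n', if present) into a malloc'd string, however long the line is.
 * Returns NULL at EOF when nothing was read, or on allocation failure.
 * The caller frees the result.
 */
char *
read_full_line(FILE *fp)
{
    char        chunk[16];      /* deliberately small to force several reads */
    char       *line = NULL;
    size_t      len = 0;

    assert(fp != NULL);

    while (fgets(chunk, sizeof(chunk), fp) != NULL)
    {
        size_t      chunk_len = strlen(chunk);
        char       *grown = realloc(line, len + chunk_len + 1);

        if (grown == NULL)
        {
            free(line);
            return NULL;
        }
        line = grown;
        memcpy(line + len, chunk, chunk_len + 1);   /* copies the NUL too */
        len += chunk_len;

        /* Stop once the chunk ended with a newline: the line is complete. */
        if (len > 0 && line[len - 1] == '\n')
            break;
    }

    return line;
}
```

Because the marker search then runs on the assembled line rather than on individual chunks, a marker that straddles a chunk boundary is still found.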

Also, this version implements Alvaro's suggestion to break API compatibility.

To that end, ecpg regress has been slightly modified to use the new version
of replace_string() where needed, or to remove it altogether where possible.

//Georgios

Asim

Attachments:

v3-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patch (application/octet-stream, +68 -50)

#10Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Georgios Kokolatos (#9)
Re: [PATCH] - Provide robust alternatives for replace_string

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, 19 August 2020 11:07, Georgios <gkokolatos@protonmail.com> wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 7 August 2020 09:02, Asim Praveen pasim@vmware.com wrote:

On 05-Aug-2020, at 7:01 PM, Alvaro Herrera alvherre@2ndquadrant.com wrote:
On 2020-Aug-05, Asim Praveen wrote:

Please find attached a StringInfo based solution to this problem. It
uses fgetln instead of fgets such that a line is read in full, without
ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS. Are you planning to add something to
src/common for it?

Indeed! I noticed fgetln on the man page of fgets and used it without checking. And this happened on a MacOS system.
Please find a revised version that uses fgetc instead.

Although not an issue in the current branch, fgetc might become a bit slow
in large files. Please find v3 which simply continues reading the line if
fgets fills the buffer and there is still data to read.

Also, this version implements Alvaro's suggestion to break API compatibility.

To that end, ecpg regress has been slightly modified to use the new version
of replace_string() where needed, or to remove it altogether where possible.

I noticed that the cfbot [1] was unhappy with the raw use of __attribute__ on Windows builds.

In retrospect it is rather obvious it would complain. Please find v4 attached.

//Georgios

//Georgios

Asim

[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.105985

Attachments:

v4-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patch (application/octet-stream, +68 -50)

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Georgios Kokolatos (#10)
Re: [PATCH] - Provide robust alternatives for replace_string

Note that starting with commit 67a472d71c98 you can use pg_get_line and
not worry about the hard part of this anymore :-)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#11)
Re: [PATCH] - Provide robust alternatives for replace_string

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Note that starting with commit 67a472d71c98 you can use pg_get_line and
not worry about the hard part of this anymore :-)

pg_get_line as it stands isn't quite suitable, because it just hands
back a "char *" string, not a StringInfo that you can do further
processing on.

However, I'd already grown a bit dissatisfied with exposing only that
API, because the code 8f8154a50 added to hba.c couldn't use pg_get_line
either, and had to duplicate the logic. So the attached revised patch
splits pg_get_line into two pieces, one with the existing char * API
and one that appends to a caller-provided StringInfo. (hba.c needs the
append-rather-than-reset behavior, and it might be useful elsewhere
too.)
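
The shape of such a two-piece API can be sketched roughly as follows. This is an illustration only and not the code in the attached patch: LineBuf, linebuf_append_line and get_line_dup are hypothetical names, and a plain malloc-based buffer stands in for StringInfo. The lower-level function appends to whatever the caller's buffer already holds, and the "char *" flavor is a thin wrapper around it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable string buffer. */
typedef struct LineBuf
{
    char       *data;           /* NUL-terminated contents, NULL when empty */
    size_t      len;            /* length of contents, excluding the NUL */
} LineBuf;

/*
 * Append one complete line from fp to buf WITHOUT resetting buf first.
 * Returns true if anything was appended, false at EOF.
 * This sketch simply aborts on out-of-memory.
 */
bool
linebuf_append_line(LineBuf *buf, FILE *fp)
{
    char        chunk[128];
    size_t      start_len = buf->len;

    assert(fp != NULL);

    while (fgets(chunk, sizeof(chunk), fp) != NULL)
    {
        size_t      n = strlen(chunk);
        char       *grown = realloc(buf->data, buf->len + n + 1);

        if (grown == NULL)
            abort();
        buf->data = grown;
        memcpy(buf->data + buf->len, chunk, n + 1);
        buf->len += n;

        /* A trailing newline means the line is complete. */
        if (buf->len > start_len && buf->data[buf->len - 1] == '\n')
            break;
    }

    return buf->len > start_len;
}

/*
 * "char *" flavor built on the appending one: returns a malloc'd line,
 * or NULL at EOF.  The caller frees the result.
 */
char *
get_line_dup(FILE *fp)
{
    LineBuf     buf = {NULL, 0};

    if (!linebuf_append_line(&buf, fp))
    {
        free(buf.data);
        return NULL;
    }
    return buf.data;
}
```

A caller that wants one line at a time in a reusable buffer sets len back to 0 before each call, while a caller that needs to accumulate several physical lines just keeps calling the append function.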

While here, I couldn't resist getting rid of ecpg_filter()'s hard-wired
line length limit too.

This version looks committable to me, though perhaps someone has
further thoughts?

regards, tom lane

Attachments:

v5-0001-use-stringinfo-for-replace_string.patch (text/x-diff; charset=us-ascii, +137 -103)

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: [PATCH] - Provide robust alternatives for replace_string

I wrote:

This version looks committable to me, though perhaps someone has
further thoughts?

I looked through this again and pushed it.

regards, tom lane