Re: Don't choke on files that are removed while pg_rewind runs.

Started by Justin Pryzbyover 5 years ago8 messages
#1Justin Pryzby
pryzby@telsasoft.com

commit b36805f3c54fe0e50e58bb9e6dad66daca46fbf6
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun Jun 28 21:35:51 2015 +0300

...

|@@ -175,22 +175,31 @@ libpqProcessFileList(void)
| pg_fatal("unexpected result set while fetching file list\n");
|
| /* Read result to local variables */
| for (i = 0; i < PQntuples(res); i++)
| {
| char *path = PQgetvalue(res, i, 0);
| int filesize = atoi(PQgetvalue(res, i, 1));
| bool isdir = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
| char *link_target = PQgetvalue(res, i, 3);
| file_type_t type;
|
|+ if (PQgetisnull(res, 0, 1))
...
|+ continue;

Every other access to "res" in this loop is to res(i), which I believe is what
was intended here, too. Currently, it will dumbly loop but skip *every* row if
the 2nd column (1: size) of the first row (0) is null.

--
Justin

#2Daniel Gustafsson
daniel@yesql.se
In reply to: Justin Pryzby (#1)

On 13 Jul 2020, at 08:10, Justin Pryzby <pryzby@telsasoft.com> wrote:

Every other access to "res" in this loop is to res(i), which I believe is what
was intended here, too. Currently, it will dumbly loop but skip *every* row if
the 2nd column (1: size) of the first row (0) is null.

Yeah, I agree with that, seems like the call should've been PQgetisnull(res, i, 1);
to match the loop.

cheers ./daniel

#3Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Daniel Gustafsson (#2)

On Mon, 13 Jul 2020 at 15:34, Daniel Gustafsson <daniel@yesql.se> wrote:

On 13 Jul 2020, at 08:10, Justin Pryzby <pryzby@telsasoft.com> wrote:

Every other access to "res" in this loop is to res(i), which I believe is what
was intended here, too. Currently, it will dumbly loop but skip *every* row if
the 2nd column (1: size) of the first row (0) is null.

Yeah, I agree with that, seems like the call should've been PQgetisnull(res, i, 1);
to match the loop.

+1

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#3)
1 attachment(s)

On Mon, Jul 13, 2020 at 03:59:56PM +0900, Masahiko Sawada wrote:

On Mon, 13 Jul 2020 at 15:34, Daniel Gustafsson <daniel@yesql.se> wrote:

Yeah, I agree with that, seems like the call should've been PQgetisnull(res, i, 1);
to match the loop.

+1

Good catch, Justin. There is a second thing here. The second column
matches with the file size, so if its value is NULL then atol() would
just crash first. I think that it would be more simple to first check
if the file size is NULL (isdir and link_target would be also NULL,
but just checking for the file size is fine), and then assign the
result values, like in the attached. Any thoughts?
--
Michael

Attachments:

libpq-null-crash.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 1dbbceab0b..c44648f823 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -214,13 +214,13 @@ libpqProcessFileList(void)
 	/* Read result to local variables */
 	for (i = 0; i < PQntuples(res); i++)
 	{
-		char	   *path = PQgetvalue(res, i, 0);
-		int64		filesize = atol(PQgetvalue(res, i, 1));
-		bool		isdir = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
-		char	   *link_target = PQgetvalue(res, i, 3);
+		char	   *path;
+		int64		filesize;
+		bool		isdir;
+		char	   *link_target;
 		file_type_t type;
 
-		if (PQgetisnull(res, 0, 1))
+		if (PQgetisnull(res, i, 1))
 		{
 			/*
 			 * The file was removed from the server while the query was
@@ -229,6 +229,11 @@ libpqProcessFileList(void)
 			continue;
 		}
 
+		path = PQgetvalue(res, i, 0);
+		filesize = atol(PQgetvalue(res, i, 1));
+		isdir = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
+		link_target = PQgetvalue(res, i, 3);
+
 		if (link_target[0])
 			type = FILE_TYPE_SYMLINK;
 		else if (isdir)
#5Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#4)

On 13 Jul 2020, at 09:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jul 13, 2020 at 03:59:56PM +0900, Masahiko Sawada wrote:

On Mon, 13 Jul 2020 at 15:34, Daniel Gustafsson <daniel@yesql.se> wrote:

Yeah, I agree with that, seems like the call should've been PQgetisnull(res, i, 1);
to match the loop.

+1

Good catch, Justin. There is a second thing here. The second column
matches with the file size, so if its value is NULL then atol() would
just crash first.

Does it? PGgetvalue will return an empty string and not NULL, so atol will
convert that to zero wont it? It can be argued whether zero is the right size
for a missing file, but it shouldn't crash at least.

I think that it would be more simple to first check
if the file size is NULL (isdir and link_target would be also NULL,
but just checking for the file size is fine), and then assign the
result values, like in the attached. Any thoughts?

It does convey the meaning of code to do it after, since the data isn't useful
in case the filesize is zero, but I don't have strong feelings for/against.
Question is, rather than discard rows pulled from the server, should the query
be tweaked to not include it in the first place instead?

cheers ./daniel

#6Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#5)

On Mon, Jul 13, 2020 at 10:12:54AM +0200, Daniel Gustafsson wrote:

Does it? PGgetvalue will return an empty string and not NULL, so atol will
convert that to zero wont it? It can be argued whether zero is the right size
for a missing file, but it shouldn't crash at least.

Nay, you are right. Thanks.

It does convey the meaning of code to do it after, since the data isn't useful
in case the filesize is zero, but I don't have strong feelings for/against.
Question is, rather than discard rows pulled from the server, should the query
be tweaked to not include it in the first place instead?

That sounds like a good idea with an extra qual in the first part of
the inner CTE, if coupled with a check to make sure that we never
get a NULL result. Now there is IMO an argument to not complicate
more this query as it is not like a lot of tuples would get filtered
out anyway because of a NULL set of values? I don't have strong
feelings for one approach or the other, but if I were to choose, I
would just let the code as-is, without the change in the CTE.
--
Michael

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#6)

On 13 Jul 2020, at 14:18, Michael Paquier <michael@paquier.xyz> wrote:

That sounds like a good idea with an extra qual in the first part of
the inner CTE, if coupled with a check to make sure that we never
get a NULL result. Now there is IMO an argument to not complicate
more this query as it is not like a lot of tuples would get filtered
out anyway because of a NULL set of values? I don't have strong
feelings for one approach or the other, but if I were to choose, I
would just let the code as-is, without the change in the CTE.

I don't have strong opinions either, both approaches will work, so feel free to
go ahead with the proposed change.

cheers ./daniel

#8Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#7)

On Tue, Jul 14, 2020 at 12:18:41PM +0200, Daniel Gustafsson wrote:

I don't have strong opinions either, both approaches will work, so feel free to
go ahead with the proposed change.

Thanks. I have just gone with the solution to not change the query,
and applied it down to 9.5. Please note that I also maintain an older
version for 9.4 and 9.3, that has the same problem:
https://github.com/vmware/pg_rewind

So I'll go fix it.
--
Michael