Use array as object (src/fe_utils/parallel_slot.c)

Started by Ranier Vilelaover 3 years ago9 messages
#1Ranier Vilela
ranier.vf@gmail.com
1 attachment(s)

Hi,

One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be suspicious.

cancelconn wouldn't that be the correct argument?

regards,
Ranier Vilela

Attachments:

001-fix-cancel-conn.patchapplication/octet-stream; name=001-fix-cancel-conn.patchDownload
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index 684327885d..4bf053697a 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -237,7 +237,7 @@ wait_on_slots(ParallelSlotArray *sa)
 	if (cancelconn == NULL)
 		return false;
 
-	SetCancelConn(sa->slots->connection);
+	SetCancelConn(cancelconn);
 	i = select_loop(maxFd, &slotset);
 	ResetCancelConn();
 
#2Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#1)
Re: Use array as object (src/fe_utils/parallel_slot.c)

Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com>
escreveu:

Hi,

One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be suspicious.

The commit
https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163
Introduced the affected function.
I'm not sure you're having problems, but using arrays as single pointer is
not recommended.

regards,
Ranier Vilela

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Ranier Vilela (#2)
Re: Use array as object (src/fe_utils/parallel_slot.c)

On Fri, Aug 19, 2022 at 03:52:36PM -0300, Ranier Vilela wrote:

Em qui., 11 de ago. de 2022 �s 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu:

Hi,

One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be suspicious.

The commit
https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163
Introduced the affected function.

It's true that the function was added there, but SetCancelConn() was called the
same way before that: SetCancelConn(slots->connection);

If you trace the history back to a17923204, you'll see a comment about the
"zeroth slot", which makes it clear that the first slot it what's intended.

I agree that it would be clearer if this were written as slots[0].connection.

--
Justin

#4Ranier Vilela
ranier.vf@gmail.com
In reply to: Justin Pryzby (#3)
Re: Use array as object (src/fe_utils/parallel_slot.c)

Em sex., 19 de ago. de 2022 às 16:22, Justin Pryzby <pryzby@telsasoft.com>
escreveu:

On Fri, Aug 19, 2022 at 03:52:36PM -0300, Ranier Vilela wrote:

Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com>

escreveu:

Hi,

One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be

suspicious.

The commit

https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163

Introduced the affected function.

It's true that the function was added there, but SetCancelConn() was
called the
same way before that: SetCancelConn(slots->connection);

If you trace the history back to a17923204, you'll see a comment about the
"zeroth slot", which makes it clear that the first slot it what's intended.

Thank you Justin, for the research.

I agree that it would be clearer if this were written as
slots[0].connection.

But I still think that the new variable introduced, "cancelconn", became
the real argument.

regards,
Ranier Vilela

#5Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#3)
Re: Use array as object (src/fe_utils/parallel_slot.c)

On Fri, Aug 19, 2022 at 02:22:32PM -0500, Justin Pryzby wrote:

If you trace the history back to a17923204, you'll see a comment about the
"zeroth slot", which makes it clear that the first slot it what's intended.

I agree that it would be clearer if this were written as slots[0].connection.

Based on the way the code is written on HEAD, this would be the
correct assumption. Now, calling PQgetCancel() would return NULL for
a connection that we actually ignore in the code a couple of lines
above when it has PGINVALID_SOCKET. So it seems to me that the
suggestion of using "cancelconn", which would be the first valid
connection, rather than always the first connection, which may be
using an invalid socket, is correct, so as we always have our hands
on a way to cancel a command.
--
Michael

#6Ranier Vilela
ranier.vf@gmail.com
In reply to: Michael Paquier (#5)
Re: Use array as object (src/fe_utils/parallel_slot.c)

Em dom., 21 de ago. de 2022 às 21:15, Michael Paquier <michael@paquier.xyz>
escreveu:

On Fri, Aug 19, 2022 at 02:22:32PM -0500, Justin Pryzby wrote:

If you trace the history back to a17923204, you'll see a comment about

the

"zeroth slot", which makes it clear that the first slot it what's

intended.

I agree that it would be clearer if this were written as

slots[0].connection.

Based on the way the code is written on HEAD, this would be the
correct assumption. Now, calling PQgetCancel() would return NULL for
a connection that we actually ignore in the code a couple of lines
above when it has PGINVALID_SOCKET. So it seems to me that the
suggestion of using "cancelconn", which would be the first valid
connection, rather than always the first connection, which may be
using an invalid socket, is correct, so as we always have our hands
on a way to cancel a command.

Thanks Michael, for looking at this.
Is it worth creating a commiffest?

regards,
Ranier Vilela

#7Michael Paquier
michael@paquier.xyz
In reply to: Ranier Vilela (#6)
Re: Use array as object (src/fe_utils/parallel_slot.c)

On Fri, Aug 26, 2022 at 01:54:26PM -0300, Ranier Vilela wrote:

Is it worth creating a commiffest?

Don't think so, but feel free to create one and mark me as committer
if you think that's appropriate. I have marked this thread as
something to do soon-ishly, but I am being distracted by life this
month.
--
Michael

#8Ranier Vilela
ranier.vf@gmail.com
In reply to: Michael Paquier (#7)
Re: Use array as object (src/fe_utils/parallel_slot.c)

Em sáb., 27 de ago. de 2022 às 00:00, Michael Paquier <michael@paquier.xyz>
escreveu:

On Fri, Aug 26, 2022 at 01:54:26PM -0300, Ranier Vilela wrote:

Is it worth creating a commiffest?

Don't think so, but feel free to create one and mark me as committer
if you think that's appropriate. I have marked this thread as
something to do soon-ishly

Hi Michael, I see the commit.
Thanks for the hardest part.
Suspecting something wrong is easy, the difficult thing is to describe why
it is wrong.

, but I am being distracted by life this

month.

Glad to know, enjoy.

regards,
Ranier Vilela

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#5)
Re: Use array as object (src/fe_utils/parallel_slot.c)

Michael Paquier <michael@paquier.xyz> writes:

Based on the way the code is written on HEAD, this would be the
correct assumption. Now, calling PQgetCancel() would return NULL for
a connection that we actually ignore in the code a couple of lines
above when it has PGINVALID_SOCKET. So it seems to me that the
suggestion of using "cancelconn", which would be the first valid
connection, rather than always the first connection, which may be
using an invalid socket, is correct, so as we always have our hands
on a way to cancel a command.

I came across this commit (52144b6fc) while writing release notes,
and I have to seriously question whether it's right yet. The thing
that needs to be asked is, if we get a SIGINT in a program using this
logic, why would we propagate a cancel to just one of the controlled
sessions and not all of them?

It looks to me like the original concept was that slot zero would be
a "master" connection, such that canceling just that one would have a
useful effect. Maybe the current users of parallel_slot.c still use
it like that, but I bet it's more likely that the connections are
all doing independent things and you really gotta cancel them all
if you want out.

I suppose maybe this commit improved matters: if you are running N jobs
then typing control-C N times (not too quickly) might eventually get
you out, by successively canceling the lowest-numbered surviving
connection. Previously you could have pounded the key all day and
not gotten rid of any but the zero'th task. OTOH, if the connections
don't exit but just go back to idle, which seems pretty possible,
then it's not clear we've moved the needle at all.

Anyway I think this needs rewritten, not just tweaked. The cancel.c
infrastructure is really next to useless here since it is only designed
with one connection in mind. I'm inclined to think we should only
expect the signal handler to set CancelRequested, and then manually
issue cancels to all live connections when we see that become set.

I'm not proposing reverting 52144b6fc, because I doubt it made
anything worse; but I'm thinking of leaving it out of the release
notes, because I'm unsure it had any user-visible effect at all.
It doesn't look to me like we'd ever get to wait_on_slots unless
all the connections are known busy.

regards, tom lane