psql not responding to SIGINT upon db reconnection

Started by Tristan Partinover 2 years ago43 messageshackers
Jump to latest
#1Tristan Partin
tristan@neon.tech

Neon provides a quick start mechanism for psql using the following
workflow:

$ psql -h pg.neon.tech
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/xxx

Upon navigating to the link, the user selects their database to work
with. The psql process is then connected to a Postgres database at Neon.

psql provides a way to reconnect to the database from within itself: \c.
If, after connecting to a database such as the process previously
mentioned, the user starts a reconnection to the database from within
psql, the process will refuse to interrupt the reconnection attempt
after sending SIGINT to it.

tristan=> \c
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/yyy

^C
^C^C
^C^C

I am not really sure if this was a problem on Windows machines, but the
attached patch should support it if it does. If this was never a problem
on Windows, it might be best to check for any regressions.

This was originally discussed in the a GitHub issue[0]https://github.com/neondatabase/neon/issues/1449 at Neon.

[0]: https://github.com/neondatabase/neon/issues/1449

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v1-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v1-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+19-1
#2Gurjeet Singh
gurjeet@singh.im
In reply to: Tristan Partin (#1)
Re: psql not responding to SIGINT upon db reconnection

On Mon, Jul 24, 2023 at 9:26 AM Tristan Partin <tristan@neon.tech> wrote:

attached patch

+        /*
+         * Restore the default SIGINT behavior while within libpq.
Otherwise, we
+         * can never exit from polling for the database connection. Failure to
+         * restore is non-fatal.
+         */
+        newact.sa_handler = SIG_DFL;
+        rc = sigaction(SIGINT, &newact, &oldact);

There's no action taken if rc != 0. It doesn't seem right to
continue as if everything's fine when the handler registration fails.
At least a warning is warranted, so that the user reports such
failures to the community.

Best regards,
Gurjeet
http://Gurje.et

#3Tristan Partin
tristan@neon.tech
In reply to: Gurjeet Singh (#2)
Re: psql not responding to SIGINT upon db reconnection

On Mon Jul 24, 2023 at 12:00 PM CDT, Gurjeet Singh wrote:

On Mon, Jul 24, 2023 at 9:26 AM Tristan Partin <tristan@neon.tech> wrote:

attached patch

+        /*
+         * Restore the default SIGINT behavior while within libpq.
Otherwise, we
+         * can never exit from polling for the database connection. Failure to
+         * restore is non-fatal.
+         */
+        newact.sa_handler = SIG_DFL;
+        rc = sigaction(SIGINT, &newact, &oldact);

There's no action taken if rc != 0. It doesn't seem right to
continue as if everything's fine when the handler registration fails.
At least a warning is warranted, so that the user reports such
failures to the community.

If we fail to go back to the default handler, then we just get the
behavior we currently have. I am not sure logging a message like "Failed
to restore default SIGINT handler" is that useful to the user. It isn't
actionable, and at the end of the day isn't going to affect them for the
most part. They also aren't even aware that default handler was ever
overridden in the first place. I'm more than happy to add a debug log if
it is the blocker to getting this patch accepted however.

--
Tristan Partin
Neon (https://neon.tech)

#4Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#1)
Re: psql not responding to SIGINT upon db reconnection

v2 is attached which fixes a grammatical issue in a comment.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v2-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v2-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+19-1
#5Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#4)
Re: psql not responding to SIGINT upon db reconnection

v3 is attached which fixes up some code comments I added which I hadn't
attached to the commit already, sigh.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v3-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v3-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+19-1
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#5)
Re: psql not responding to SIGINT upon db reconnection

"Tristan Partin" <tristan@neon.tech> writes:

v3 is attached which fixes up some code comments I added which I hadn't
attached to the commit already, sigh.

I don't care for this patch at all. You're bypassing the pqsignal
abstraction layer that the rest of psql goes through, and the behavior
you're implementing isn't very nice. People do not expect ^C to
kill psql - it should just stop the \c attempt and leave you as you
were.

Admittedly, getting PQconnectdbParams to return control on SIGINT
isn't too practical. But you could probably replace that with a loop
around PQconnectPoll and test for CancelRequested in the loop.

regards, tom lane

#7Tristan Partin
tristan@neon.tech
In reply to: Tom Lane (#6)
Re: psql not responding to SIGINT upon db reconnection

On Mon Jul 24, 2023 at 12:43 PM CDT, Tom Lane wrote:

"Tristan Partin" <tristan@neon.tech> writes:

v3 is attached which fixes up some code comments I added which I hadn't
attached to the commit already, sigh.

I don't care for this patch at all. You're bypassing the pqsignal
abstraction layer that the rest of psql goes through, and the behavior
you're implementing isn't very nice. People do not expect ^C to
kill psql - it should just stop the \c attempt and leave you as you
were.

Admittedly, getting PQconnectdbParams to return control on SIGINT
isn't too practical. But you could probably replace that with a loop
around PQconnectPoll and test for CancelRequested in the loop.

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

$ ./build/src/bin/psql/psql -h pg.neon.tech
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/xxx

NOTICE: Connecting to database.
psql (17devel, server 15.3)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

tristan957=> \c
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/yyy

^Cconnection to server at "pg.neon.tech" (3.18.6.96), port 5432 failed:
Previous connection kept
tristan957=>

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v4-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v4-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+24-2
#8Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Tristan Partin (#7)
Re: psql not responding to SIGINT upon db reconnection

Hi,

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

$ ./build/src/bin/psql/psql -h pg.neon.tech
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/xxx

NOTICE: Connecting to database.
psql (17devel, server 15.3)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

tristan957=> \c
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/yyy

^Cconnection to server at "pg.neon.tech" (3.18.6.96), port 5432 failed:
Previous connection kept
tristan957=>

I went through CFbot and found out that some tests are timing out.
Links:
https://cirrus-ci.com/task/6735437444677632
https://cirrus-ci.com/task/4536414189125632
https://cirrus-ci.com/task/5046587584413696
https://cirrus-ci.com/task/6172487491256320
https://cirrus-ci.com/task/5609537537835008

Some tests which are timing out are as follows:
[00:48:49.243] Summary of Failures:
[00:48:49.243]
[00:48:49.243] 4/270 postgresql:regress / regress/regress TIMEOUT 1000.01s
[00:48:49.243] 6/270 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
TIMEOUT 1000.02s
[00:48:49.243] 34/270 postgresql:recovery /
recovery/027_stream_regress TIMEOUT 1000.02s
[00:48:49.243] 48/270 postgresql:plpgsql / plpgsql/regress TIMEOUT 1000.02s
[00:48:49.243] 49/270 postgresql:plperl / plperl/regress TIMEOUT 1000.01s
[00:48:49.243] 61/270 postgresql:dblink / dblink/regress TIMEOUT 1000.03s
[00:48:49.243] 89/270 postgresql:postgres_fdw / postgres_fdw/regress
TIMEOUT 1000.01s
[00:48:49.243] 93/270 postgresql:test_decoding / test_decoding/regress
TIMEOUT 1000.02s
[00:48:49.243] 110/270 postgresql:test_extensions /
test_extensions/regress TIMEOUT 1000.03s
[00:48:49.243] 123/270 postgresql:unsafe_tests / unsafe_tests/regress
TIMEOUT 1000.02s
[00:48:49.243] 152/270 postgresql:pg_dump / pg_dump/010_dump_connstr
TIMEOUT 1000.03s

Just want to make sure you are aware of the issue.

Thanks
Shlok Kumar Kyal

#9Tristan Partin
tristan@neon.tech
In reply to: Shlok Kyal (#8)
Re: psql not responding to SIGINT upon db reconnection

On Thu Nov 2, 2023 at 4:03 AM CDT, Shlok Kyal wrote:

Hi,

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

$ ./build/src/bin/psql/psql -h pg.neon.tech
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/xxx

NOTICE: Connecting to database.
psql (17devel, server 15.3)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

tristan957=> \c
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/yyy

^Cconnection to server at "pg.neon.tech" (3.18.6.96), port 5432 failed:
Previous connection kept
tristan957=>

I went through CFbot and found out that some tests are timing out.
Links:
https://cirrus-ci.com/task/6735437444677632
https://cirrus-ci.com/task/4536414189125632
https://cirrus-ci.com/task/5046587584413696
https://cirrus-ci.com/task/6172487491256320
https://cirrus-ci.com/task/5609537537835008

Some tests which are timing out are as follows:
[00:48:49.243] Summary of Failures:
[00:48:49.243]
[00:48:49.243] 4/270 postgresql:regress / regress/regress TIMEOUT 1000.01s
[00:48:49.243] 6/270 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
TIMEOUT 1000.02s
[00:48:49.243] 34/270 postgresql:recovery /
recovery/027_stream_regress TIMEOUT 1000.02s
[00:48:49.243] 48/270 postgresql:plpgsql / plpgsql/regress TIMEOUT 1000.02s
[00:48:49.243] 49/270 postgresql:plperl / plperl/regress TIMEOUT 1000.01s
[00:48:49.243] 61/270 postgresql:dblink / dblink/regress TIMEOUT 1000.03s
[00:48:49.243] 89/270 postgresql:postgres_fdw / postgres_fdw/regress
TIMEOUT 1000.01s
[00:48:49.243] 93/270 postgresql:test_decoding / test_decoding/regress
TIMEOUT 1000.02s
[00:48:49.243] 110/270 postgresql:test_extensions /
test_extensions/regress TIMEOUT 1000.03s
[00:48:49.243] 123/270 postgresql:unsafe_tests / unsafe_tests/regress
TIMEOUT 1000.02s
[00:48:49.243] 152/270 postgresql:pg_dump / pg_dump/010_dump_connstr
TIMEOUT 1000.03s

Just want to make sure you are aware of the issue.

Hi Shlok!

Thanks for taking a look. I will check these failures out to see if
I can reproduce.

--
Tristan Partin
Neon (https://neon.tech)

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tristan Partin (#9)
Re: psql not responding to SIGINT upon db reconnection

On 06/11/2023 19:16, Tristan Partin wrote:

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

This patch is missing a select(). It will busy loop until the connection
is established or cancelled.

Shouldn't we also clear CancelRequested after we have cancelled the
attempt? Otherwise, any subsequent attempts will immediately fail too.

Should we use 'cancel_pressed' here rather than CancelRequested? To be
honest, I don't understand the difference, so that's a genuine question.
There was an attempt at unifying them in the past but it was reverted in
commit 5d43c3c54d.

--
Heikki Linnakangas
Neon (https://neon.tech)

#11Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#10)
Re: psql not responding to SIGINT upon db reconnection

On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote:

On 06/11/2023 19:16, Tristan Partin wrote:

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

This patch is missing a select(). It will busy loop until the connection
is established or cancelled.

If I add a wait (select, poll, etc.), then I can't control-C during the
blocking call, so it doesn't really solve the problem. On Linux, we have
signalfds which seem like the perfect solution to this problem, "wait on
the pq socket or SIGINT." But that doesn't translate well to other
operating systems :(.

tristan957=> \c
NOTICE: Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/XXXX

^CTerminated

You can see here that I can't terminate the command. Where you see
"Terminated" is me running `kill $(pgrep psql)` in another terminal.

Shouldn't we also clear CancelRequested after we have cancelled the
attempt? Otherwise, any subsequent attempts will immediately fail too.

After switching to cancel_pressed, I don't think so. See this bit of
code in the psql main loop:

/* main loop to get queries and execute them */
while (successResult == EXIT_SUCCESS)
{
/*
* Clean up after a previous Control-C
*/
if (cancel_pressed)
{
if (!pset.cur_cmd_interactive)
{
/*
* You get here if you stopped a script with Ctrl-C.
*/
successResult = EXIT_USER;
break;
}

cancel_pressed = false;
}

Should we use 'cancel_pressed' here rather than CancelRequested? To be
honest, I don't understand the difference, so that's a genuine question.
There was an attempt at unifying them in the past but it was reverted in
commit 5d43c3c54d.

The more I look at this, the more I don't understand... I need to look
at this harder, but wanted to get this email out. Switched to
cancel_pressed though. Thanks for pointing it out. Going to wait to send
a v2 while more discussion occurs.

--
Tristan Partin
Neon (https://neon.tech)

#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tristan Partin (#11)
Re: psql not responding to SIGINT upon db reconnection

On 22/11/2023 19:29, Tristan Partin wrote:

On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote:

On 06/11/2023 19:16, Tristan Partin wrote:

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

This patch is missing a select(). It will busy loop until the connection
is established or cancelled.

If I add a wait (select, poll, etc.), then I can't control-C during the
blocking call, so it doesn't really solve the problem.

Hmm, they should return with EINTR on signal. At least on Linux; I'm not
sure how portable that is. See signal(7) man page, section "Interruption
of system calls and library functions by signal handlers". You could
also use a timeout like 5 s to ensure that you wake up and notice that
the signal was received eventually, even if it doesn't interrupt the
blocking call.

--
Heikki Linnakangas
Neon (https://neon.tech)

#13Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#12)
Re: psql not responding to SIGINT upon db reconnection

On Wed Nov 22, 2023 at 3:00 PM CST, Heikki Linnakangas wrote:

On 22/11/2023 19:29, Tristan Partin wrote:

On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote:

On 06/11/2023 19:16, Tristan Partin wrote:

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.

This patch is missing a select(). It will busy loop until the connection
is established or cancelled.

If I add a wait (select, poll, etc.), then I can't control-C during the
blocking call, so it doesn't really solve the problem.

Hmm, they should return with EINTR on signal. At least on Linux; I'm not
sure how portable that is. See signal(7) man page, section "Interruption
of system calls and library functions by signal handlers". You could
also use a timeout like 5 s to ensure that you wake up and notice that
the signal was received eventually, even if it doesn't interrupt the
blocking call.

Ha, you're right. I had this working yesterday, but convinced myself it
didn't. I had a do while loop wrapping the blocking call. Here is a v4,
which seems to pass the tests that were pointed out to be failing
earlier.

Noticed that I copy-pasted pqSocketPoll() into the psql code. I think
this may be controversial. Not sure what the best solution to the issue
is. I will pay attention to the buildfarm animals when they pick this
up.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v4-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v4-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+128-2
#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tristan Partin (#13)
Re: psql not responding to SIGINT upon db reconnection

On 22/11/2023 23:29, Tristan Partin wrote:

Ha, you're right. I had this working yesterday, but convinced myself it
didn't. I had a do while loop wrapping the blocking call. Here is a v4,
which seems to pass the tests that were pointed out to be failing
earlier.

Thanks! This suffers from a classic race condition:

+			if (cancel_pressed)
+				break;
+
+			sock = PQsocket(n_conn);
+			if (sock == -1)
+				break;
+
+			rc = pqSocketPoll(sock, for_read, !for_read, (time_t)-1);
+			Assert(rc != 0); /* Timeout is impossible. */
+			if (rc == -1)
+			{
+				success = false;
+				break;
+			}

If a signal arrives between the "if (cancel_pressed)" check and
pqSocketPoll(), pgSocketPoll will "miss" the signal and block
indefinitely. There are three solutions to this:

1. Use a timeout, so that you wake up periodically to check for any
missed signals. Easy solution, the downsides are that you will not react
as quickly if the signal is missed, and you waste some cycles by waking
up unnecessarily.

2. The Self-pipe trick: https://cr.yp.to/docs/selfpipe.html. We also use
that in src/backend/storage/ipc/latch.c. It's portable, but somewhat
complicated.

3. Use pselect() or ppoll(). They were created specifically to address
this problem. Not fully portable, I think it's missing on Windows at least.

Maybe use pselect() if it's available, and select() with a timeout if
it's not.

--
Heikki Linnakangas
Neon (https://neon.tech)

#15Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#14)
Re: psql not responding to SIGINT upon db reconnection

On Thu Nov 23, 2023 at 3:19 AM CST, Heikki Linnakangas wrote:

On 22/11/2023 23:29, Tristan Partin wrote:

Ha, you're right. I had this working yesterday, but convinced myself it
didn't. I had a do while loop wrapping the blocking call. Here is a v4,
which seems to pass the tests that were pointed out to be failing
earlier.

Thanks! This suffers from a classic race condition:

+			if (cancel_pressed)
+				break;
+
+			sock = PQsocket(n_conn);
+			if (sock == -1)
+				break;
+
+			rc = pqSocketPoll(sock, for_read, !for_read, (time_t)-1);
+			Assert(rc != 0); /* Timeout is impossible. */
+			if (rc == -1)
+			{
+				success = false;
+				break;
+			}

If a signal arrives between the "if (cancel_pressed)" check and
pqSocketPoll(), pgSocketPoll will "miss" the signal and block
indefinitely. There are three solutions to this:

1. Use a timeout, so that you wake up periodically to check for any
missed signals. Easy solution, the downsides are that you will not react
as quickly if the signal is missed, and you waste some cycles by waking
up unnecessarily.

2. The Self-pipe trick: https://cr.yp.to/docs/selfpipe.html. We also use
that in src/backend/storage/ipc/latch.c. It's portable, but somewhat
complicated.

3. Use pselect() or ppoll(). They were created specifically to address
this problem. Not fully portable, I think it's missing on Windows at least.

Maybe use pselect() if it's available, and select() with a timeout if
it's not.

First I have ever heard of this race condition, and now I will commit it
to durable storage :). I went ahead and did option #3 that you proposed.
On Windows, I have a 1 second timeout for the select/poll. That seemed
like a good balance of user experience and spinning the CPU. But I am
open to other values. I don't have a Windows VM, but maybe I should set
one up...

I am not completely in love with the code I have written. Lots of
conditional compilation which makes it hard to read. Looking forward to
another round of review to see what y'all think.

For what it's worth, I did try #2, but since psql installs its own
SIGINT handler, the code became really hairy.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v5-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v5-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+226-2
#16Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#15)
Re: psql not responding to SIGINT upon db reconnection

On Wed Nov 29, 2023 at 11:48 AM CST, Tristan Partin wrote:

I am not completely in love with the code I have written. Lots of
conditional compilation which makes it hard to read. Looking forward to
another round of review to see what y'all think.

Ok. Here is a patch which just uses select(2) with a timeout of 1s or
pselect(2) if it is available. I also moved the state machine processing
into its own function.

Thanks for your comments thus far.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v6-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v6-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+228-2
#17Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#16)
Re: psql not responding to SIGINT upon db reconnection

On Tue, Dec 5, 2023 at 1:35 PM Tristan Partin <tristan@neon.tech> wrote:

On Wed Nov 29, 2023 at 11:48 AM CST, Tristan Partin wrote:

I am not completely in love with the code I have written. Lots of
conditional compilation which makes it hard to read. Looking forward to
another round of review to see what y'all think.

Ok. Here is a patch which just uses select(2) with a timeout of 1s or
pselect(2) if it is available. I also moved the state machine processing
into its own function.

Hmm, this adds a function called pqSocketPoll to psql/command.c. But
there already is such a function in libpq/fe-misc.c. It's not quite
the same, though. Having the same function in two different modules
with subtly different definitions seems like it's probably not the
right idea.

Also, this seems awfully complicated for something that's supposed to
(checks notes) wait for a file descriptor to become ready for I/O for
up to 1 second. It's 160 lines of code in pqSocketPoll and another 50
in the caller. If this is really the simplest way to do this, we
really need to rethink our libpq APIs or, uh, something. I wonder if
we could make this simpler by, say:

- always use select
- always use a 1-second timeout
- if it returns faster because the race condition doesn't happen, cool
- if it doesn't, well then you get to wait for a second, oh well

I don't feel strongly that that's the right way to go and if Heikki or
some other committer wants to go with this more complex conditional
approach, that's fine. But to me it seems like a lot.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#17)
Re: psql not responding to SIGINT upon db reconnection

On Fri Jan 5, 2024 at 12:24 PM CST, Robert Haas wrote:

On Tue, Dec 5, 2023 at 1:35 PM Tristan Partin <tristan@neon.tech> wrote:

On Wed Nov 29, 2023 at 11:48 AM CST, Tristan Partin wrote:

I am not completely in love with the code I have written. Lots of
conditional compilation which makes it hard to read. Looking forward to
another round of review to see what y'all think.

Ok. Here is a patch which just uses select(2) with a timeout of 1s or
pselect(2) if it is available. I also moved the state machine processing
into its own function.

Hmm, this adds a function called pqSocketPoll to psql/command.c. But
there already is such a function in libpq/fe-misc.c. It's not quite
the same, though. Having the same function in two different modules
with subtly different definitions seems like it's probably not the
right idea.

Yep, not tied to the function name. Happy to rename as anyone suggests.

Also, this seems awfully complicated for something that's supposed to
(checks notes) wait for a file descriptor to become ready for I/O for
up to 1 second. It's 160 lines of code in pqSocketPoll and another 50
in the caller. If this is really the simplest way to do this, we
really need to rethink our libpq APIs or, uh, something. I wonder if
we could make this simpler by, say:

- always use select
- always use a 1-second timeout
- if it returns faster because the race condition doesn't happen, cool
- if it doesn't, well then you get to wait for a second, oh well

I don't feel strongly that that's the right way to go and if Heikki or
some other committer wants to go with this more complex conditional
approach, that's fine. But to me it seems like a lot.

I think the way to go is to expose some variation of libpq's
pqSocketPoll(), which I would be happy to put together a patch for.
Making frontends, psql in this case, have to reimplement the polling
logic doesn't strike me as fruitful, which is essentially what I have
done.

Thanks for your input!

But also wait a second. In my last email, I said:

Ok. Here is a patch which just uses select(2) with a timeout of 1s or
pselect(2) if it is available. I also moved the state machine processing
into its own function.

This is definitely not the patch I meant to send. What the? Here is what
I meant to send, but I stand by my comment that we should just expose
a variation of pqSocketPoll().

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v7-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchtext/x-patch; charset=utf-8; name=v7-0001-Allow-SIGINT-to-cancel-psql-database-reconnection.patchDownload+160-2
#19Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#18)
Re: psql not responding to SIGINT upon db reconnection

On Mon, Jan 8, 2024 at 1:03 AM Tristan Partin <tristan@neon.tech> wrote:

I think the way to go is to expose some variation of libpq's
pqSocketPoll(), which I would be happy to put together a patch for.
Making frontends, psql in this case, have to reimplement the polling
logic doesn't strike me as fruitful, which is essentially what I have
done.

I encourage further exploration of this line of attack. I fear that if
I were to commit something like what you've posted up until now,
people would complain that that code was too ugly to live, and I'd
have a hard time telling them that they're wrong.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#19)
Re: psql not responding to SIGINT upon db reconnection

On Fri Jan 12, 2024 at 10:45 AM CST, Robert Haas wrote:

On Mon, Jan 8, 2024 at 1:03 AM Tristan Partin <tristan@neon.tech> wrote:

I think the way to go is to expose some variation of libpq's
pqSocketPoll(), which I would be happy to put together a patch for.
Making frontends, psql in this case, have to reimplement the polling
logic doesn't strike me as fruitful, which is essentially what I have
done.

I encourage further exploration of this line of attack. I fear that if
I were to commit something like what you've posted up until now,
people would complain that that code was too ugly to live, and I'd
have a hard time telling them that they're wrong.

Completely agree. Let me look into this. Perhaps I can get something up
next week or the week after.

--
Tristan Partin
Neon (https://neon.tech)

#21Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#20)
#22Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Tristan Partin (#21)
#23Tristan Partin
tristan@neon.tech
In reply to: Jelte Fennema-Nio (#22)
#24Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Tristan Partin (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#23)
#26Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#26)
#28Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#28)
#30Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#30)
#32Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#31)
#33Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Robert Haas (#31)
#34Tristan Partin
tristan@neon.tech
In reply to: Jelte Fennema-Nio (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jelte Fennema-Nio (#33)
#36Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Tom Lane (#35)
#37Tristan Partin
tristan@neon.tech
In reply to: Jelte Fennema-Nio (#36)
#38Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Tristan Partin (#37)
#39Tristan Partin
tristan@neon.tech
In reply to: Jelte Fennema-Nio (#38)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Tristan Partin (#39)
#41Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Robert Haas (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Jelte Fennema-Nio (#41)
#43Tristan Partin
tristan@neon.tech
In reply to: Robert Haas (#42)