BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

Started by Andrey Lizenko | almost 9 years ago | 7 messages | bugs
#1 Andrey Lizenko
lizenko79@gmail.com

The following bug has been logged on the website:

Bug reference: 14720
Logged by: Andrey Lizenko
Email address: lizenko79@gmail.com
PostgreSQL version: 9.6.3
Operating system: Solaris 11.3
Description:

I've got the following message running PostgreSQL 9.6.3 on Solaris 11.3
(both latest stable).

getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

Unfortunately, I cannot reproduce it with the libpq C code examples, but I
do see it when using pgAdmin 3, pgAdmin 4, and the Zabbix monitoring
extension libzbxpgsql.

The getsockopt manual mentions only SO_KEEPALIVE.

Regards,
Andrey Lizenko

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrey Lizenko (#1)
Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

lizenko79@gmail.com wrote:

The following bug has been logged on the website:

Bug reference: 14720
Logged by: Andrey Lizenko
Email address: lizenko79@gmail.com
PostgreSQL version: 9.6.3
Operating system: Solaris 11.3
Description:

I've got the following message running PostgreSQL 9.6.3 on Solaris 11.3
(both latest stable).

getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

Unfortunately, I cannot reproduce it with the libpq C code examples, but I
do see it when using pgAdmin 3, pgAdmin 4, and the Zabbix monitoring
extension libzbxpgsql.

The getsockopt manual mentions only SO_KEEPALIVE.

It sounds like your system defines the TCP_KEEPALIVE symbol at compile
time but the kernel doesn't know it; maybe the package was compiled in a
system where the kernel does support that option, and you're running it
in one that doesn't?

Are you getting the message on the client side or the server side? If the
latter, you should just set tcp_keepalives_idle to 0 in postgresql.conf.
If the former, I think the only option is to fix the libpq compile.
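
One way to check the "defined at compile time but unknown to the kernel" theory is to probe each keepalive option that the headers define and see which ones the running kernel rejects. The following is only a rough sketch, not anything from the PostgreSQL tree: the helper names probe_tcp_option and probe_keepalive_options are hypothetical, and it assumes a POSIX sockets environment.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: call getsockopt() for one TCP-level option and
 * report whether the running kernel accepts it.  Returns 0 on success,
 * or the errno value on failure.
 */
static int
probe_tcp_option(int sock, int optname, const char *optname_str)
{
    int         value = 0;
    socklen_t   len = sizeof(value);

    assert(optname_str != NULL);

    if (getsockopt(sock, IPPROTO_TCP, optname, (char *) &value, &len) < 0)
    {
        int         saved_errno = errno;

        printf("getsockopt(%s) failed: %s\n",
               optname_str, strerror(saved_errno));
        return saved_errno;
    }

    printf("getsockopt(%s) = %d\n", optname_str, value);
    return 0;
}

/*
 * Hypothetical helper: probe every keepalive-idle spelling that the
 * headers define.  Returns the number of options the kernel rejected,
 * or -1 if no TCP socket could be created.
 */
static int
probe_keepalive_options(void)
{
    int         rejected = 0;
    int         sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    if (probe_tcp_option(sock, TCP_KEEPIDLE, "TCP_KEEPIDLE") != 0)
        rejected++;
#endif
#ifdef TCP_KEEPALIVE_THRESHOLD
    if (probe_tcp_option(sock, TCP_KEEPALIVE_THRESHOLD,
                         "TCP_KEEPALIVE_THRESHOLD") != 0)
        rejected++;
#endif
#ifdef TCP_KEEPALIVE
    if (probe_tcp_option(sock, TCP_KEEPALIVE, "TCP_KEEPALIVE") != 0)
        rejected++;
#endif

    close(sock);
    return rejected;
}
```

Calling probe_keepalive_options() from a small main() on the affected machine should show which spelling, if any, fails with "Option not supported by protocol". A nonzero return value would mean the headers and the kernel disagree about at least one option.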

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

lizenko79@gmail.com wrote:

I've got the following message running PostgreSQL 9.6.3 on Solaris 11.3
(both latest stable).
getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

It sounds like your system defines the TCP_KEEPALIVE symbol at compile
time but the kernel doesn't know it; maybe the package was compiled in a
system where the kernel does support that option, and you're running it
in one that doesn't?

Actually, I find the same error in the logs for our Solaris buildfarm
members. So apparently that's been going on since day one, and we
hadn't noticed it, though I now find that it's been reported before:
/messages/by-id/CAJgtxT6QL0_Gt+TkSDw=q1=YVJkT73FoSrtStcu5Hy+-SXn8rw@mail.gmail.com

Some googling turned up the tcp(7P) man page for Solaris 11:
https://docs.oracle.com/cd/E36784_01/html/E36884/tcp-7p.html#REFMAN7tcp-7p

and it says this:

SunOS supports the keep-alive mechanism described in RFC 1122. It is
enabled using the socket option SO_KEEPALIVE. When enabled, the first
keep-alive probe is sent out after a TCP is idle for two hours. If the
peer does not respond to the probe within eight minutes, the TCP
connection is aborted. You can alter the interval for sending out the
first probe using the socket option TCP_KEEPALIVE_THRESHOLD. The option
value is an unsigned integer in milliseconds. The system default is
controlled by the TCP ndd parameter tcp_keepalive_interval. The minimum
value is ten seconds. The maximum is ten days, while the default is two
hours. If you receive no response to the probe, you can use the
TCP_KEEPALIVE_ABORT_THRESHOLD socket option to change the time threshold
for aborting a TCP connection. The option value is an unsigned integer
in milliseconds. The value zero indicates that TCP should never time out
and abort the connection when probing. The system default is controlled
by the TCP ndd parameter tcp_keepalive_abort_interval. The default is
eight minutes.

So apparently, Linux's TCP_KEEPIDLE corresponds to Solaris'
TCP_KEEPALIVE_THRESHOLD. TCP_KEEPINTVL and TCP_KEEPCNT seem to have no
direct equivalent, although TCP_KEEPALIVE_ABORT_THRESHOLD would correspond
to their product.

I suggest that we ought to expand the keepalive code to know about this
synonym.
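
To make the correspondence concrete, here is a minimal sketch of what treating the two options as synonyms might look like. It is an illustration under stated assumptions, not the attached patch: the function names are hypothetical, and the millisecond unit for TCP_KEEPALIVE_THRESHOLD is taken from the man page text quoted above, whereas TCP_KEEPIDLE takes seconds. The abort-threshold helper only shows the "product" arithmetic; it is not a claim about how the code should set that option.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Convert seconds to the milliseconds the quoted man page describes. */
static unsigned int
keepalive_secs_to_ms(int secs)
{
    assert(secs >= 0);
    return (unsigned int) secs * 1000U;
}

/*
 * Hypothetical helper: the abort threshold that would roughly match a
 * given probe interval (seconds) and probe count, i.e. their product
 * expressed in milliseconds.
 */
static unsigned int
keepalive_abort_threshold_ms(int interval_secs, int count)
{
    assert(interval_secs >= 0 && count >= 0);
    return keepalive_secs_to_ms(interval_secs) * (unsigned int) count;
}

/*
 * Hypothetical helper: set the keepalive idle time using whichever
 * spelling the platform headers provide.  Returns the setsockopt()
 * result, or 0 if neither option is defined.
 */
static int
set_keepalive_idle(int sock, int idle_secs)
{
#if defined(TCP_KEEPIDLE)
    /* Linux and *BSD spelling; the value is in seconds */
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,
                      (char *) &idle_secs, sizeof(idle_secs));
#elif defined(TCP_KEEPALIVE_THRESHOLD)
    /* Solaris spelling; the value is in milliseconds per the man page */
    unsigned int idle_ms = keepalive_secs_to_ms(idle_secs);

    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPALIVE_THRESHOLD,
                      (char *) &idle_ms, sizeof(idle_ms));
#else
    (void) sock;
    (void) idle_secs;
    return 0;
#endif
}
```

For example, a 60-second probe interval with 8 probes gives 480000 ms, which matches the eight-minute default abort interval mentioned in the man page.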

regards, tom lane


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

I wrote:

So apparently, Linux's TCP_KEEPIDLE corresponds to Solaris'
TCP_KEEPALIVE_THRESHOLD. TCP_KEEPINTVL and TCP_KEEPCNT seem to have no
direct equivalent, although TCP_KEEPALIVE_ABORT_THRESHOLD would correspond
to their product.

I suggest that we ought to expand the keepalive code to know about this
synonym.

Concretely, something like the attached. I have no way to test this
locally, so I'm thinking of just pushing it and seeing what the buildfarm
says.

regards, tom lane

Attachments:

add-Solaris-keepalive-option.patch (text/x-diff; charset=us-ascii; +62 -63)

#5 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#4)
Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

On Wed, Jun 28, 2017 at 7:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Concretely, something like the attached. I have no way to test this
locally, so I'm thinking of just pushing it and seeing what the buildfarm
says.

! #if defined(TCP_KEEPIDLE)
!     /* TCP_KEEPIDLE is the name of this option on Linux and *BSD */
      if (setsockopt(port->sock, IPPROTO_TCP, TCP_KEEPIDLE,
                     (char *) &idle, sizeof(idle)) < 0)
      {
          elog(LOG, "setsockopt(TCP_KEEPIDLE) failed: %m");
          return STATUS_ERROR;
      }
! #elif defined(TCP_KEEPALIVE_THRESHOLD)

What about defining a PG_TCP_KEEPALIVE instead?

Side note: Windows has something with a different set of options:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740476(v=vs.85).aspx
--
Michael


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#5)
Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

Michael Paquier <michael.paquier@gmail.com> writes:

What about defining a PG_TCP_KEEPALIVE instead?

I thought about that, but it would complicate constructing the elog
messages, so I didn't bother. It might be worth working harder if
we ever grow any more alternatives.

Side note: Windows has something with a different set of options:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740476(v=vs.85).aspx

Yeah, the Windows part of that code is a real mess. But it works
as far as I've heard.

regards, tom lane


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: BUG #14720: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol

I wrote:

Concretely, something like the attached. I have no way to test this
locally, so I'm thinking of just pushing it and seeing what the buildfarm
says.

So that didn't work: castoroides is still showing

[5953a7e1.1fff:13] LOG: getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol
[5953a7e1.1fff:14] STATEMENT: select name, setting from pg_settings where name like 'enable%';

which implies that TCP_KEEPALIVE_THRESHOLD doesn't exist on Solaris 10.
Evidently, the logic here needs to be along the lines of

#if defined(TCP_KEEPIDLE)
...
#elif defined(TCP_KEEPALIVE_THRESHOLD)
...
#elif defined(TCP_KEEPALIVE) && defined(__darwin__)
...

Or we could make the last test be !defined(__solaris__), but I'm not
sure that's better. Anybody have an opinion?

As long as I have to touch this code again anyway, I'm also going to
look into Michael's thought of trying to reduce code duplication.
I was unhappy yesterday about how to handle the error messages,
but we could do it like this:

#if defined(TCP_KEEPIDLE)
#define PG_TCP_KEEPALIVE TCP_KEEPIDLE
#define PG_TCP_KEEPALIVE_STR "TCP_KEEPIDLE"
#elif ...

#ifdef PG_TCP_KEEPALIVE
    if (setsockopt(port->sock, IPPROTO_TCP, PG_TCP_KEEPALIVE,
                   (char *) &idle, sizeof(idle)) < 0)
    {
        elog(LOG, "setsockopt(%s) failed: %m", PG_TCP_KEEPALIVE_STR);

which doesn't seem too painful.
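
Filled out as a standalone program fragment, the idea might look roughly like the sketch below. Several things here are assumptions rather than the actual fix: the branches after the first are guessed from the #if chain above, __APPLE__ stands in for __darwin__ (which standalone compilers do not predefine), PG_TCP_KEEPALIVE_SCALE and apply_keepalive_idle are hypothetical names, and fprintf with strerror replaces elog so the fragment compiles outside the backend.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Pick one option name per platform, plus its printable name and a
 * scale factor (hypothetical) to convert seconds to the option's unit.
 */
#if defined(TCP_KEEPIDLE)
#define PG_TCP_KEEPALIVE        TCP_KEEPIDLE
#define PG_TCP_KEEPALIVE_STR    "TCP_KEEPIDLE"
#define PG_TCP_KEEPALIVE_SCALE  1       /* seconds */
#elif defined(TCP_KEEPALIVE_THRESHOLD)
#define PG_TCP_KEEPALIVE        TCP_KEEPALIVE_THRESHOLD
#define PG_TCP_KEEPALIVE_STR    "TCP_KEEPALIVE_THRESHOLD"
#define PG_TCP_KEEPALIVE_SCALE  1000    /* milliseconds */
#elif defined(TCP_KEEPALIVE) && defined(__APPLE__)
#define PG_TCP_KEEPALIVE        TCP_KEEPALIVE
#define PG_TCP_KEEPALIVE_STR    "TCP_KEEPALIVE"
#define PG_TCP_KEEPALIVE_SCALE  1       /* seconds */
#endif

/*
 * Hypothetical helper: apply the keepalive idle time (in seconds).
 * Returns 0 on success and -1 on failure.
 */
static int
apply_keepalive_idle(int sock, int idle)
{
    assert(idle >= 0);

#ifdef PG_TCP_KEEPALIVE
    {
        int         value = idle * PG_TCP_KEEPALIVE_SCALE;

        if (setsockopt(sock, IPPROTO_TCP, PG_TCP_KEEPALIVE,
                       (char *) &value, sizeof(value)) < 0)
        {
            fprintf(stderr, "setsockopt(%s) failed: %s\n",
                    PG_TCP_KEEPALIVE_STR, strerror(errno));
            return -1;
        }
    }
    return 0;
#else
    (void) sock;
    if (idle != 0)
    {
        fprintf(stderr, "setting the keepalive idle time is not supported\n");
        return -1;
    }
    return 0;
#endif
}
```

With this shape, a platform that defines TCP_KEEPALIVE but gives it a different meaning falls through to the unsupported branch instead of producing a runtime failure, and adding another spelling only touches the macro chain at the top.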

regards, tom lane
