Hang issue when COPY to/from an unopened FIFO

Started by Kenan Yao · over 9 years ago · 4 messages
#1Kenan Yao
kyao@pivotal.io

Hi devs,

I came across a hang issue when running COPY to a FIFO file, because the
FIFO is not opened for reading on the other end. The backtrace from the
master branch looks like this:

#0 0x000000332ccc6c30 in __open_nocancel () from /lib64/libc.so.6
#1 0x000000332cc6b693 in __GI__IO_file_open () from /lib64/libc.so.6
#2 0x000000332cc6b7dc in _IO_new_file_fopen () from /lib64/libc.so.6
#3 0x000000332cc60a04 in __fopen_internal () from /lib64/libc.so.6
#4 0x00000000007d0536 in AllocateFile ()
#5 0x00000000005f3d8e in BeginCopyFrom ()
#6 0x00000000005ef192 in DoCopy ()
#7 0x000000000080bb22 in standard_ProcessUtility ()
#8 0x000000000080b4d6 in ProcessUtility ()
#9 0x000000000080a5cf in PortalRunUtility ()
#10 0x000000000080a78c in PortalRunMulti ()
#11 0x0000000000809dff in PortalRun ()
#12 0x0000000000803f85 in exec_simple_query ()
#13 0x000000000080807f in PostgresMain ()
#14 0x000000000077f639 in BackendRun ()
#15 0x000000000077eceb in BackendStartup ()
#16 0x000000000077b185 in ServerLoop ()
#17 0x000000000077a7ee in PostmasterMain ()
#18 0x00000000006c93de in main ()


Reproduction is simple:

-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from other sessions


The problem is that, once we are trapped here, we cannot cancel or
terminate this backend process unless something opens the FIFO for reading.

I am not sure whether this should be categorized as a bug, since it is
indeed caused by incorrect usage of a FIFO, but either way the backend
cannot be terminated.

I see that the recv and send calls in secure_read/secure_write are
implemented in a non-blocking style to make them interruptible; would it be
worthwhile to make the fopen non-blocking as well?

The same thing happens with file_fdw on an unopened FIFO.
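
To make the question concrete, an interruptible open along the lines of the
secure_read retry pattern might look like the sketch below. This is only an
illustration of the idea: open_fifo_interruptible and interrupt_pending are
made-up names, and a real backend patch would presumably use the latch and
CHECK_FOR_INTERRUPTS machinery instead of a volatile flag and usleep.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

static volatile bool interrupt_pending = false; /* set by a signal handler */

/*
 * Try to open a FIFO for writing without blocking indefinitely: attempt a
 * non-blocking open, and between attempts check whether a cancel request
 * has arrived, so the process stays killable.
 */
int open_fifo_interruptible(const char *path)
{
    for (;;)
    {
        int fd = open(path, O_WRONLY | O_NONBLOCK);

        if (fd >= 0)
        {
            /* Reader connected: restore blocking mode for normal I/O. */
            int flags = fcntl(fd, F_GETFL);
            if (flags >= 0)
                fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
            return fd;
        }
        if (errno != ENXIO)         /* a real error, not "no reader yet" */
            return -1;
        if (interrupt_pending)      /* query cancel / backend termination */
        {
            errno = EINTR;
            return -1;
        }
        usleep(100 * 1000);         /* retry after 100ms; a latch wait
                                     * would be better in the backend */
    }
}
```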

Cheers,
Kenan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kenan Yao (#1)
Re: Hang issue when COPY to/from an unopened FIFO

Kenan Yao <kyao@pivotal.io> writes:

-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from other sessions

This does not seem like a supported case to me. I see few if any reasons
to want to do that rather than doing copy-to-program or copy-to-client.
We're certainly not going to want to add any overhead to the COPY code
paths in order to allow it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#2)
Re: Hang issue when COPY to/from an unopened FIFO

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Kenan Yao <kyao@pivotal.io> writes:

-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from other sessions

This does not seem like a supported case to me. I see few if any reasons
to want to do that rather than doing copy-to-program or copy-to-client.
We're certainly not going to want to add any overhead to the COPY code
paths in order to allow it.

The complaint is that there's no way to safely kill a process that has
gotten stuck in a fopen() call. I sympathize with that point of view, as
there are many ways for a process to get stuck in fopen() or a similar
call, and it would be nice to have a way to kill such processes without
bouncing the entire server (though I wonder whether this is also a way to
end up with a dead backend that sticks around after the postmaster has
quit, which would be quite bad...).

Thanks!

Stephen

#4Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Hang issue when COPY to/from an unopened FIFO

On 2016-07-14 10:20:42 -0400, Tom Lane wrote:

Kenan Yao <kyao@pivotal.io> writes:

-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from other sessions

This does not seem like a supported case to me. I see few if any reasons
to want to do that rather than doing copy-to-program or copy-to-client.
We're certainly not going to want to add any overhead to the COPY code
paths in order to allow it.

Agreed on that.

Said overhead would be a good reason to get rid of buffered IO at some
point, though: we're doing our own buffering anyway, and the stream code
adds noticeable overhead (it basically doubles the cache footprint). Even
worse, it uses locking internally on many platforms... Once we use raw
file descriptors, adding sane EINTR handling seems trivial.
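
For what it's worth, the kind of EINTR-aware raw-write loop this implies
might look like the minimal sketch below. write_all is an illustrative
name, not backend code; a real version would call CHECK_FOR_INTERRUPTS()
at the point marked in the comment.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Write the whole buffer with raw write(2) instead of stdio, retrying on
 * EINTR so that a signal (e.g. a cancel interrupt) can be serviced
 * between attempts rather than leaving the process stuck.
 */
ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;   /* the backend would CHECK_FOR_INTERRUPTS() here */
            return -1;      /* genuine I/O error */
        }
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```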

Andres
