COPY FROM STDIN behaviour on end-of-file

Started by Thomas Munroover 8 years ago12 messages
#1Thomas Munro
thomas.munro@enterprisedb.com
1 attachment(s)

Hi hackers,

I you hit ^d while COPY FROM STDIN is reading then subsequent COPY
FROM STDIN commands return immediately. That's because we never clear
the end-of-file state on the libc FILE. Shouldn't we do that, perhaps
with something like the attached?

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

clear-copy-stream-eof.patchapplication/octet-stream; name=clear-copy-stream-eof.patchDownload
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index 2005b9a0bfc..e97a5e6f184 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -614,6 +614,8 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
 				if (!fgresult)
 				{
 					copydone = true;
+					if (feof(copystream) && !ferror(copystream))
+						clearerr(copystream);
 					break;
 				}
 
#2Vaishnavi Prabakaran
vaishnaviprabakaran@gmail.com
In reply to: Thomas Munro (#1)
Re: COPY FROM STDIN behaviour on end-of-file

On Tue, May 16, 2017 at 8:40 AM, Thomas Munro <thomas.munro@enterprisedb.com

wrote:

I you hit ^d while COPY FROM STDIN is reading then subsequent COPY
FROM STDIN commands return immediately.

Hi, I could not reproduce this issue. Even after Ctrl+d , subsequent COPY
from commands reads the input properly. Is there any specific step you
followed or can you share the sample testcase?

Thanks & Regards,
Vaishnavi
Fujitsu Australia.

#3Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Vaishnavi Prabakaran (#2)
Re: COPY FROM STDIN behaviour on end-of-file

On Tue, May 16, 2017 at 6:29 PM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:

On Tue, May 16, 2017 at 8:40 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I you hit ^d while COPY FROM STDIN is reading then subsequent COPY
FROM STDIN commands return immediately.

Hi, I could not reproduce this issue. Even after Ctrl+d , subsequent COPY
from commands reads the input properly. Is there any specific step you
followed or can you share the sample testcase?

Hmm. Doesn't happen on GNU/Linux, does happen on macOS and FreeBSD. Example:

postgres=# create table t(a int);
CREATE TABLE
postgres=# copy t(a) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

42
[...press ^d here ...]COPY 1

postgres=# copy t(a) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

COPY 0

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#3)
Re: COPY FROM STDIN behaviour on end-of-file

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Tue, May 16, 2017 at 6:29 PM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:

Hi, I could not reproduce this issue. Even after Ctrl+d , subsequent COPY
from commands reads the input properly. Is there any specific step you
followed or can you share the sample testcase?

Hmm. Doesn't happen on GNU/Linux, does happen on macOS and FreeBSD.

Hah, confirmed here. Thinking about it, it seems like glibc must be
exceeding its authority to make this work on Linux; I don't know of
anything in POSIX saying that the eof indicator should go away without
a clearerr() call. In fact, it seems like glibc is violating its own
documentation --- "man feof" saith

The function feof() tests the end-of-file indicator for the stream
pointed to by stream, returning non-zero if it is set. The end-of-file
indicator can only be cleared by the function clearerr().

I had been supposing that this was a feature addition and should be left
for the next commitfest. But given that it already works as-expected on
popular platform(s), the fact that it doesn't work the same on some other
platforms seems like a portability bug rather than a missing feature.
Now I'm inclined to treat it as a bug and back-patch.

Reviewing the actual patch, though ... there seem to be paths through
handleCopyIn that would not hit this, particularly the sigsetjmp path.
It's unlikely that we'd get control-C before we finish servicing
control-D, but maybe not impossible. Wouldn't it be better just to do
an unconditional clear at the end, maybe

copyin_cleanup:

+	/* In case input is from a tty, reset the EOF indicator. */
+	clearerr(copystream);
+ 
 	/*
 	 * Check command status and return to normal libpq state.
 	 *

(I'd be inclined to make the comment significantly more verbose than
that, btw.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: COPY FROM STDIN behaviour on end-of-file

I wrote:

I had been supposing that this was a feature addition and should be left
for the next commitfest. But given that it already works as-expected on
popular platform(s), the fact that it doesn't work the same on some other
platforms seems like a portability bug rather than a missing feature.
Now I'm inclined to treat it as a bug and back-patch.

BTW, the main argument for considering it a new feature is that we don't
suggest anywhere in our code or docs that this will work. If we're going
to go out of our way to make it work, should we mention it in psql-ref?
And what about changing the interactive prompt, along the lines of

End with a backslash and a period on a line by itself, or an EOF signal.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: COPY FROM STDIN behaviour on end-of-file

On Tue, May 16, 2017 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

I had been supposing that this was a feature addition and should be left
for the next commitfest. But given that it already works as-expected on
popular platform(s), the fact that it doesn't work the same on some other
platforms seems like a portability bug rather than a missing feature.
Now I'm inclined to treat it as a bug and back-patch.

BTW, the main argument for considering it a new feature is that we don't
suggest anywhere in our code or docs that this will work. If we're going
to go out of our way to make it work, should we mention it in psql-ref?
And what about changing the interactive prompt, along the lines of

End with a backslash and a period on a line by itself, or an EOF signal.

Well, the current behavior is so wonky and inconsistent that it's hard
for me to view it as anything but a bug. I mean, one can argue about
exactly what an EOF should do in any given situation, but surely it
can't be right for it to do one thing on one platform and something
else on a different platform.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: COPY FROM STDIN behaviour on end-of-file

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 16, 2017 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I had been supposing that this was a feature addition and should be left
for the next commitfest. But given that it already works as-expected on
popular platform(s), the fact that it doesn't work the same on some other
platforms seems like a portability bug rather than a missing feature.
Now I'm inclined to treat it as a bug and back-patch.

BTW, the main argument for considering it a new feature is that we don't
suggest anywhere in our code or docs that this will work. If we're going
to go out of our way to make it work, should we mention it in psql-ref?
And what about changing the interactive prompt, along the lines of
End with a backslash and a period on a line by itself, or an EOF signal.

Well, the current behavior is so wonky and inconsistent that it's hard
for me to view it as anything but a bug. I mean, one can argue about
exactly what an EOF should do in any given situation, but surely it
can't be right for it to do one thing on one platform and something
else on a different platform.

Oh, I still believe it's a bug. I'm just saying that if we're going
to fix it, we should do more than just make a minimal code change.

BTW, it would be a good idea for somebody to check this out on Windows,
assuming there's a way to generate a keyboard EOF signal there.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: COPY FROM STDIN behaviour on end-of-file

Tom Lane wrote:

BTW, it would be a good idea for somebody to check this out on Windows,
assuming there's a way to generate a keyboard EOF signal there.

I last used a Windows command line almost two decades ago now, but
Ctrl-Z used to do it.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Vaishnavi Prabakaran
vaishnaviprabakaran@gmail.com
In reply to: Alvaro Herrera (#8)
Re: COPY FROM STDIN behaviour on end-of-file

On Wed, May 17, 2017 at 3:52 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Tom Lane wrote:

BTW, it would be a good idea for somebody to check this out on Windows,
assuming there's a way to generate a keyboard EOF signal there.

I last used a Windows command line almost two decades ago now, but
Ctrl-Z used to do it.

Ctrl-Z + Enter in windows generates EOF signal. I verified this issue and
it is not reproducible in windows.

Thanks & Regards,
Vaishnavi
Fujitsu Australia.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vaishnavi Prabakaran (#9)
Re: COPY FROM STDIN behaviour on end-of-file

Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> writes:

Tom Lane wrote:

BTW, it would be a good idea for somebody to check this out on Windows,
assuming there's a way to generate a keyboard EOF signal there.

Ctrl-Z + Enter in windows generates EOF signal. I verified this issue and
it is not reproducible in windows.

Thanks for checking. So that's two major platforms where it works "as
expected" already.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#10)
1 attachment(s)
Re: COPY FROM STDIN behaviour on end-of-file

On Wed, May 17, 2017 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> writes:

Tom Lane wrote:

BTW, it would be a good idea for somebody to check this out on Windows,
assuming there's a way to generate a keyboard EOF signal there.

Ctrl-Z + Enter in windows generates EOF signal. I verified this issue and
it is not reproducible in windows.

Thanks for checking. So that's two major platforms where it works "as
expected" already.

Ah... the reason this is happening is that BSD-derived fread()
implementations return immediately if the EOF flag is set[1]https://github.com/freebsd/freebsd/blob/afbef1895e627cd1993428a252d39b505cf6c085/lib/libc/stdio/refill.c#L79, but
others do not. At a guess, not doing that is probably more conformant
with POSIX ("... less than nitems only if a read error or end-of-file
is *encountered*", which seems to refer to the underlying condition
and not the user-clearable EOF flag).

We are neither clearing nor checking the EOF flag explicitly, and that
only works out OK on fread implementation that also ignore it.

Tom Lane <tgl@sss.pgh.pa.us> wrote (further upstream):

If we're going
to go out of our way to make it work, should we mention it in psql-ref?

I went looking for the place to put that and found that it already says:

For <literal>\copy ... from stdin</>, data rows are read from the same
source that issued the command, continuing until <literal>\.</literal>
is read or the stream reaches <acronym>EOF</>.

That's probably referring to the "outer" stream, such as a file that
contains the COPY ... FROM STDIN command, but doesn't it seem like a
general enough statement to cover ^d in the interactive case too?

Here's a version incorporating your other suggestions and a comment to explain.

[1]: https://github.com/freebsd/freebsd/blob/afbef1895e627cd1993428a252d39b505cf6c085/lib/libc/stdio/refill.c#L79

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

clear-copy-stream-eof-v2.patchapplication/octet-stream; name=clear-copy-stream-eof-v2.patchDownload
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index 2005b9a0bfc..d9435c297b0 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -540,7 +540,7 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
 		showprompt = true;
 		if (!pset.quiet)
 			puts(_("Enter data to be copied followed by a newline.\n"
-				   "End with a backslash and a period on a line by itself."));
+				   "End with a backslash and a period on a line by itself, or an EOF signal."));
 	}
 	else
 		showprompt = false;
@@ -674,6 +674,15 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
 copyin_cleanup:
 
 	/*
+	 * Clear the EOF flag on the stream, in case copying finished with an EOF
+	 * signal.  An interactive TTY session may need to reuse the stream for a
+	 * later COPY command.  Although we don't ever test the flag with feof(),
+	 * some fread() implementations won't wait for more data until we clear
+	 * it.
+	 */
+	clearerr(copystream);
+
+	/*
 	 * Check command status and return to normal libpq state.
 	 *
 	 * We do not want to return with the status still PGRES_COPY_IN: our
#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#11)
Re: COPY FROM STDIN behaviour on end-of-file

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Wed, May 17, 2017 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thanks for checking. So that's two major platforms where it works "as
expected" already.

Ah... the reason this is happening is that BSD-derived fread()
implementations return immediately if the EOF flag is set[1], but
others do not.

Hah, good detective work. I tried this on the lone SysV-derived box
I have (ancient HPUX), and did not see the problem, so that seems to
confirm the comment you found that says this is a SysV-tradition vs
BSD-tradition thing.

If we're going
to go out of our way to make it work, should we mention it in psql-ref?

I went looking for the place to put that and found that it already says:

For <literal>\copy ... from stdin</>, data rows are read from the same
source that issued the command, continuing until <literal>\.</literal>
is read or the stream reaches <acronym>EOF</>.

Yeah, it seems like that's clear enough already; I don't feel a need to
complicate it.

Another thing that would be nice is a regression test, but I don't see
any way to do that in our standard test environments. I could imagine
building a test rig that fires up psql through a PTY, but making it
portable is a daunting prospect, and probably not worth the trouble.

Here's a version incorporating your other suggestions and a comment to explain.

I editorialized on the comment a bit and pushed it. Thanks for the
report and patch!

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers