problem with PQsendQuery/PQgetResult and COPY FROM statement

Started by Nonameover 19 years ago8 messages

max.poletto@gmail.com

over 19 years ago

I attempt to execute a "COPY test FROM STDIN" using PQsendQuery.
Subsequently, after the appropriate poll()/PQconsumeInput/PQisBusy
calls, PQgetResult returns tens of millions of (invalid?) non-null
PGresults. This behavior seems incorrect, and sometimes causes my
application to exhaust memory and crash.

The postgres version is 8.1.3.

I can reproduce the problem in about 50 lines of C. I include below
(1) the code, (2) a psql dump of the table in question, (3) the code's
output.

I'd appreciate any insight or suggestions you may have.

Thanks,
max poletto

======================================================================
(1) code

// compile with: g++ -Wall -O -o pgtest pgtest.cc -lpq

#include <cerrno>
#include <cstdio>
#include <ctime>
#include <vector>
using namespace std;
#include <libpq-fe.h>
#include <sys/poll.h>

void retrieve(PGconn *conn, time_t timeout)
{
vector<PGresult *> res;
while (1) {
int r;
do {
struct pollfd pfds[1];
pfds[0].fd = PQsocket(conn);
pfds[0].events = POLLIN;
pfds[0].revents = 0;
r = poll(pfds, sizeof(pfds) / sizeof(struct pollfd), 1000 *
timeout);
} while (r < 0 && errno == EINTR);
if (r <= 0 || !PQconsumeInput(conn))
return;
int i = 0;
PGresult *oldr = 0;
while (!PQisBusy(conn)) {
PGresult *r = PQgetResult(conn);
if (r) {
res.push_back(r);
if (++i % 5000000 == 0) { printf("%d results\n", i); }
if (r == oldr) { printf("r==oldr (%p)\n", r); }
oldr = r;
} else {
printf("PQgetResult return 0 after %d results\n", i);
return;
}
}
}
}

int main()
{
PGconn *conn = PQconnectdb("dbname=testdb user=postgres");
if (!conn)
return -1;
if (PQsendQuery(conn, "COPY test FROM STDIN") > 0) {
retrieve(conn, 20);
if (PQputCopyEnd(conn, 0) < 0)
return -1;
}
PQfinish(conn);
return 0;
}

======================================================================
(2) psql session

root@tm01-5% psql testdb postgres
Welcome to psql 8.1.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

testdb=# select * from test;
a | b | c
---+---+---
1 | 2 | 3
(1 row)

testdb=#

======================================================================
(3) output

root@tm01-5% ./pgtest
5000000 results
10000000 results
15000000 results
20000000 results
25000000 results
PQgetResult return 0 after 25649299 results
4.640u 4.696s 0:09.34 99.8% 0+0k 0+0io 0pf+0w

In this toy example, the process VM size exceeds 2GB before PQgetResult
finally returns 0. The real application, which has already allocated
~1GB prior to the query, runs out of memory (3GB limit on 32-bit linux)
before PQgetResult ever returns 0.

Volkan YAZICI

yazicivo@ttnet.net.tr

over 19 years ago

In reply to: Noname (#1)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

On May 19 11:51, max.poletto@gmail.com wrote:

if (PQsendQuery(conn, "COPY test FROM STDIN") > 0) {
retrieve(conn, 20);

Shouldn't you be send()'ing instead of retrieve()'ing? COPY tbl FROM
stdin, requests data from client to COPY FROM stdin TO tbl.

Regards.

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Noname (#1)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

max.poletto@gmail.com writes:

I attempt to execute a "COPY test FROM STDIN" using PQsendQuery.
Subsequently, after the appropriate poll()/PQconsumeInput/PQisBusy
calls, PQgetResult returns tens of millions of (invalid?) non-null
PGresults.

You should fix your code to pay some attention to what those results
are. I'm betting they are error results.

This behavior seems incorrect, and sometimes causes my
application to exhaust memory and crash.

Well, that's because you're not PQclear'ing a result when done with it.

regards, tom lane

Martijn van Oosterhout

kleptog@svana.org

over 19 years ago

In reply to: Noname (#1)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

On Fri, May 19, 2006 at 11:51:00AM -0700, max.poletto@gmail.com wrote:

I attempt to execute a "COPY test FROM STDIN" using PQsendQuery.
Subsequently, after the appropriate poll()/PQconsumeInput/PQisBusy
calls, PQgetResult returns tens of millions of (invalid?) non-null
PGresults. This behavior seems incorrect, and sometimes causes my
application to exhaust memory and crash.

In addition to the issues other people have pointed out, when using
COPY you transfer the data with PQputCopyData()/PQgetCopyData().

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

Noname

max.poletto@gmail.com

over 19 years ago

In reply to: Noname (#1)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

Thanks for all your replies, but I must clarify some things.

First, note that what I posted is just a small example that reproduces
behavior that appears incorrect. The real code is a C++ wrapper
around libpq that supports non-blocking queries and reuses open
connections.

Volkan and Martijn: I know about PQ{put,get}CopyData, but my example
never gets to that point. I must first determine whether the
(asynchronous) PQsendQuery() of "COPY test FROM STDIN" succeeded.
That's all that retrieve() tries to do in my example.

Tom: of course I should (and eventually do) use PQclear(), but I may
not want to right away, because I must return to the user a vector of
result objects (for example, all the result rows from a query).

I do not expect PQgetResult to return millions of non-null PGresult
objects after a PQsendQuery("COPY test FROM STDIN"). I expect exactly
one non-null result, with a result status of PGRES_COPY_IN. Moreover,
the manual says:

If a COPY command is issued via PQexec in a string that could
contain
additional commands, the application must continue fetching results
via PQgetResult after completing the COPY sequence. Only when
PQgetResult returns NULL is it certain that the PQexec command
string
is done and it is safe to issue more commands.

I assumed this to be true for PQexec or "one of the equivalent
functions" mentioned in the manual, such as PQsendQuery. However, if
I add the following switch statement to my example:

while (!PQisBusy(conn)) {
PGresult *r = PQgetResult(conn);
if (r) {
switch (PQresultStatus(r)) {
case PGRES_COPY_IN:
break;
case PGRES_EMPTY_QUERY:
printf("PGRES_EMPTY_QUERY\n");
break;
case PGRES_COMMAND_OK:
printf("PGRES_COMMAND_OK\n");
break;
case PGRES_TUPLES_OK:
printf("PGRES_TUPLES_OK\n");
break;
case PGRES_COPY_OUT:
printf("PGRES_COPY_OUT\n");
break;
case PGRES_BAD_RESPONSE:
printf("PGRES_BAD_RESPONSE\n");
break;
case PGRES_NONFATAL_ERROR:
printf("PGRES_NONFATAL_ERROR\n");
break;
case PGRES_FATAL_ERROR:
printf("PGRES_FATAL_ERROR\n");
break;
}
res.push_back(r);
if (++i % 5000000 == 0) { printf("%d results\n", i); }
if (r == oldr) { printf("r==oldr (%p)\n", r); }
oldr = r;
} else {
printf("PQgetResult return 0 after %d results\n", i);
return;
}
}

the code still prints only:

5000000 results
10000000 results
15000000 results
20000000 results
25000000 results
PQgetResult return 0 after 25649299 results

In other words, there are >25M distinct non-null results, and all of
them have status code PGRES_COPY_IN, and none of them have errors.

So it appears that I should check whether the first PGresult object
has a status code of PGRES_COPY_IN, and ignore subsequent PGresults
even if they are not NULL. I don't object to this interface, but it
is not what I would conclude after RTFM.

max

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Noname (#5)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

max.poletto@gmail.com writes:

I do not expect PQgetResult to return millions of non-null PGresult
objects after a PQsendQuery("COPY test FROM STDIN"). I expect exactly
one non-null result, with a result status of PGRES_COPY_IN.

If you call it exactly once, it'll say that exactly once. If you keep
calling it "millions of times", it'll keep saying that.

Moreover, the manual says:

If a COPY command is issued via PQexec in a string that could
contain
additional commands, the application must continue fetching results
via PQgetResult after completing the COPY sequence. Only when
PQgetResult returns NULL is it certain that the PQexec command
string
is done and it is safe to issue more commands.

Indeed. You forgot to "complete the COPY sequence" before returning
to the PQgetResult loop. As long as the thing is in COPY mode,
PQgetResult will return a result saying PGRES_COPY_IN. The point
of this paragraph is that you might want to consider doing more
PQgetResults *after* you've ended COPY mode.

regards, tom lane

Massimiliano Poletto

max.poletto@gmail.com

over 19 years ago

In reply to: Tom Lane (#6)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

If you call it exactly once, it'll say that exactly once. If you keep
calling it "millions of times", it'll keep saying that.

OK. I wonder, though, why at some point it does in fact return 0.

Indeed. You forgot to "complete the COPY sequence" before returning
to the PQgetResult loop. As long as the thing is in COPY mode,
PQgetResult will return a result saying PGRES_COPY_IN. The point

Got it. Thanks.

max

Tom Lane

tgl@sss.pgh.pa.us

over 19 years ago

In reply to: Massimiliano Poletto (#7)

Re: problem with PQsendQuery/PQgetResult and COPY FROM statement

"Massimiliano Poletto" <max.poletto@gmail.com> writes:

If you call it exactly once, it'll say that exactly once. If you keep
calling it "millions of times", it'll keep saying that.

OK. I wonder, though, why at some point it does in fact return 0.

Probably because you ran out of memory to create new PGresult structs.

regards, tom lane