Slow network retrieves
I'm seeing what seems like slow retrieval times over the network. I am
retrieving a single field of about 100-120 characters per record. I am
getting about 3 seconds per 1000 records - it takes 30 seconds to retrieve
10,000 records. That's only about 36 KBytes/sec.
This is a 100BT switched network (not sure if it is vlan'd or through a
router). Echo time averages 3ms.
The back end is pretty much idle. It shows 'idle in transaction'.
05-08-2004.23:54:43 Records read: 10000
05-08-2004.23:55:17 Records read: 20000
05-08-2004.23:55:50 Records read: 30000
05-08-2004.23:56:22 Records read: 40000
05-08-2004.23:56:55 Records read: 50000
05-08-2004.23:57:32 Records read: 60000
05-08-2004.23:58:07 Records read: 70000
...
The code is an ecpg program like:
EXEC SQL WHENEVER SQLERROR GOTO sql_error;
EXEC SQL WHENEVER NOT FOUND DO break;     /* exits the fetch loop below */

EXEC SQL DECLARE message_cursor CURSOR FOR
    SELECT file_name
    FROM messages
    WHERE system_key = (SELECT system_key FROM systems
                        WHERE system_name = :systemName);

EXEC SQL OPEN message_cursor;

count = 0;
while (1) {
    /* one row per FETCH = one network round trip per row */
    EXEC SQL FETCH message_cursor INTO :fileNameDB;
    memcpy(tempstr, fileNameDB.arr, fileNameDB.len);
    tempstr[fileNameDB.len] = '\0';
    /* [Action with tempstr removed for testing] */
    count++;
    if ((count % 10000) == 0)
        logmsg("Records read: %d", count);
}
How can I speed this thing up?
Wes
<wespvp@syntegra.com> writes:
> I'm seeing what seems like slow retrieval times over the network.
Are you sure it is a network problem? What performance do you get
if you run the same test program locally on the database machine?
How about issuing the same sort of FETCH commands via a psql script?
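(Something like a BEGIN, the same DECLARE message_cursor, and a long series
of "FETCH 1 FROM message_cursor;" commands fed to psql should mimic the
per-row round trips your program makes.)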
regards, tom lane
On 5/9/04 9:32 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Are you sure it is a network problem? What performance do you get
> if you run the same test program locally on the database machine?
> How about issuing the same sort of FETCH commands via a psql script?
Yes, it is definitely due to the network latency even though that latency is
very small. Here it is running locally:
05-09-2004.17:49:41 Records read: 10000
05-09-2004.17:49:41 Records read: 20000
05-09-2004.17:49:42 Records read: 30000
05-09-2004.17:49:42 Records read: 40000
05-09-2004.17:49:43 Records read: 50000
05-09-2004.17:49:43 Records read: 60000
05-09-2004.17:49:44 Records read: 70000
05-09-2004.17:49:45 Records read: 80000
05-09-2004.17:49:45 Records read: 90000
05-09-2004.17:49:46 Records read: 100000
05-09-2004.17:49:46 Records read: 110000
05-09-2004.17:49:47 Records read: 120000
05-09-2004.17:49:47 Records read: 130000
05-09-2004.17:49:48 Records read: 140000
My "outside looking in" observations seem to point to the fact that every
row has to be retrieved (or stored) with a separate request. Network
latency, however small, becomes an issue when the volume is very high.
A Pro*C program I recently ported from Oracle to PostgreSQL showed this
difference. In Pro*C you can load an array with rows to insert, then issue
a single INSERT request passing it the array. As far as I can tell, in
PostgreSQL ecpg (or other) you have to execute one request per record.
Is there some way to batch insert/fetch requests? How else can I improve
upon the performance? It appears that COPY works like this, but you can't
control what is returned and you have to know the column order.
Wes
<wespvp@syntegra.com> writes:
> On 5/9/04 9:32 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Are you sure it is a network problem?
> Yes, it is definitely due to the network latency even though that latency is
> very small. Here it is running locally:
> [ about 20000 records/sec ]
Okay, I just wanted to verify that we weren't overlooking any other
sorts of bottleneck. But the numbers you quote make sense as a network
issue: 33 seconds for 10000 records is 3.3 msec per record, and since
you say the measured ping time is 3 msec, it appears that FETCH has
just about the same response time as a ping ;-). So you can't really
complain about it. The only way to do better will be to batch multiple
fetches into one network round trip.
> A Pro*C program I recently ported from Oracle to PostgreSQL showed this
> difference. In Pro*C you can load an array with rows to insert, then issue
> a single INSERT request passing it the array. As far as I can tell, in
> PostgreSQL ecpg (or other) you have to execute one request per record.
The usual way to batch multiple insertions is with COPY IN. The usual
way to batch a fetch is just to SELECT the whole thing; or if that is
too much data to snarf at once, use a cursor with "FETCH n" requests.
I am not sure how either of these techniques map into ecpg though.
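(At the SQL level, the cursor approach just means issuing something like
"FETCH 1000 FROM message_cursor" per round trip instead of fetching single
rows.)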
If you want to use ecpg then I'd suggest bringing up the question on
pgsql-interfaces --- the ecpg gurus are more likely to be paying
attention over there.
> ... It appears that COPY works like this, but you can't
> control what is returned and you have to know the column order.
True, COPY OUT is only designed to return all the rows of a table.
However, in recent versions you can specify what columns you want
in a COPY. It's still no substitute for SELECT...
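(For instance, something like "COPY messages (file_name) TO stdout" should
work for the table in your example.)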
regards, tom lane
wespvp@syntegra.com wrote:
> The back end is pretty much idle. It shows 'idle in transaction'.
Well, it's not so much idle: it's holding a transaction ID!
That "idle in transaction" is not your problem, but I suggest you
take a look at why you have a backend in that state; do you
have backends sitting there for days and days?
Regards
Gaetano Mendola
On 5/9/04 1:58 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> I am not sure how either of these techniques map into ecpg though.
> If you want to use ecpg then I'd suggest bringing up the question on
> pgsql-interfaces --- the ecpg gurus are more likely to be paying
> attention over there.
I got some sample code from someone on the pgsql-interfaces list on how to
do bulk FETCHes. It is pretty much the same as with Pro*C. You just can't
use that for INSERT/UPDATE (hint, hint...). I was able to improve the
network retrieval rate from 1 million records per hour to 1 million records
per minute.
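For the archives, here is roughly the shape of it. This is a sketch rather
than the exact code I was sent, so the connection, array sizes, and system
name are all illustrative:

/* bulk_fetch.pgc -- preprocess with ecpg, then compile with cc */
#include <stdio.h>
#include <string.h>

EXEC SQL INCLUDE sqlca;

int main(void)
{
EXEC SQL BEGIN DECLARE SECTION;
    VARCHAR fileNames[1000][128];   /* room for 1000 rows per FETCH */
    VARCHAR systemName[64];
EXEC SQL END DECLARE SECTION;
    long total = 0;
    int  got, i;

    EXEC SQL WHENEVER SQLERROR SQLPRINT;
    EXEC SQL CONNECT TO testdb;               /* illustrative dbname */

    strcpy((char *) systemName.arr, "some_system");  /* illustrative */
    systemName.len = strlen((char *) systemName.arr);

    EXEC SQL DECLARE bulk_cursor CURSOR FOR
        SELECT file_name FROM messages
        WHERE system_key = (SELECT system_key FROM systems
                            WHERE system_name = :systemName);
    EXEC SQL OPEN bulk_cursor;

    while (1) {
        /* one network round trip per 1000 rows instead of per row */
        EXEC SQL FETCH 1000 FROM bulk_cursor INTO :fileNames;
        got = sqlca.sqlerrd[2];               /* rows actually returned */
        if (got <= 0)
            break;
        for (i = 0; i < got; i++) {
            /* row i is in fileNames[i].arr, length fileNames[i].len */
        }
        total += got;
        if (got < 1000)                       /* short fetch: end of data */
            break;
    }

    EXEC SQL CLOSE bulk_cursor;
    EXEC SQL COMMIT;
    EXEC SQL DISCONNECT;
    printf("Records read: %ld\n", total);
    return 0;
}

The short-fetch test at the bottom is optional; an empty FETCH also sets
sqlca.sqlcode to 100 (not found), which you could trap with WHENEVER NOT
FOUND instead.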
Wes