C++, Postgres, libpqxx huge query

Started by alexandros_e · almost 12 years ago · 7 messages · general
#1 alexandros_e
alexandros.ef@gmail.com

Hello experts. I have posted this question on Stack Overflow, but I did not
get any detailed answer, so I thought I should cross-post here. My
apologies.

I have to execute an SQL query against Postgres from the following code. The
query returns a huge number of rows (40M or more), each with 4 integer
fields. On a workstation with 32 GB of RAM everything works, but on a 16 GB
workstation the query is very slow (due to swapping, I guess). Is there any
way to tell the C++ code to load rows in batches, without waiting for the
entire dataset? I never had these issues with Java, probably thanks to the
better JDBC driver.

try {
    work W(*Conn);
    result r = W.exec(sql[sqlLoad]);
    W.commit();

    for (int rownum = 0; rownum < r.size(); ++rownum) {
        const result::tuple row = r[rownum];
        vid1 = row[0].as<int>();
        vid2 = row[1].as<int>();
        vid3 = row[2].as<int>();
        .....
    }
} catch (const std::exception &e) {
    std::cerr << e.what() << std::endl;
}

I am using PostgreSQL 9.3, and I see this:
http://www.postgresql.org/docs/9.3/static/libpq-single-row-mode.html, but I
do not know how to use it in my C++ code. Your help will be appreciated.

EDIT: This query runs only once, to create the necessary main-memory data
structures. As such, it cannot be optimized. Also, pgAdmin III could easily
fetch those rows in under one minute on the same (or smaller-RAM) PCs, and
Java could easily handle twice the number of rows (with
Statement.setFetchSize(),
http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize%28int%29).
So it really is an issue with the libpqxx library and not an application
issue. Is there a way to get this behavior in C++, without explicitly
setting limits/offsets manually? Alternatively, is there another C++ driver
that offers this kind of functionality?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/C-Postgres-libpqxx-huge-query-tp5802330.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Alexandros Efentakis
efentakis@imis.athena-innovation.gr
In reply to: alexandros_e (#1)
Re: C++, Postgres, libpqxx huge query

Thanks John for your answer. The problem is: how do I declare the cursor and
fetch through C++ and libpqxx? And if I fetch 1000 rows at a time, how do I
handle reaching the end, when fewer than 1000 rows are available? Or do I
need to fetch one row at a time, as in
http://stackoverflow.com/questions/16128142/how-to-use-pqxxstateless-cursor-class-from-libpqxx
?

A.

2014-05-04 12:11 GMT+03:00 John R Pierce <pierce@hogranch.com>:


see DECLARE and FETCH.

http://www.postgresql.org/docs/current/static/sql-declare.html
http://www.postgresql.org/docs/current/static/sql-fetch.html

--
john r pierce 37N 122W
somewhere on the middle of the left coast

#3 Alban Hertroys
haramrae@gmail.com
In reply to: alexandros_e (#1)
Re: C++, Postgres, libpqxx huge query


I think the section of relevance is: http://www.postgresql.org/docs/9.3/static/libpq-async.html

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


#4 Andres Freund
andres@anarazel.de
In reply to: alexandros_e (#1)
Re: C++, Postgres, libpqxx huge query

Hi,


Try looking into using either a server-side cursor or COPY.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: C++, Postgres, libpqxx huge query

Andres Freund <andres@2ndquadrant.com> writes:


Try looking into either using a serverside cursor or COPY.

Another possibility is libpq's recently-introduced row-at-a-time mode:

http://www.postgresql.org/docs/9.3/static/libpq-single-row-mode.html

though I'm not sure how effectively that's supported by libpqxx.

regards, tom lane


#6 Alexandros Efentakis
efentakis@imis.athena-innovation.gr
In reply to: Tom Lane (#5)
Re: C++, Postgres, libpqxx huge query

To answer my own question, I adapted the code from
http://stackoverflow.com/questions/16128142/how-to-use-pqxxstateless-cursor-class-from-libpqxx:

try {
    work W(*Conn);
    pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
        cursor(W, sql[sqlLoad], "mycursor", false);
    /* Assume you know the total number of records returned */
    for (size_t idx = 0; idx < countRecords; idx += 100000) {
        /* Fetch 100,000 records at a time */
        result r = cursor.retrieve(idx, idx + 100000);
        for (int rownum = 0; rownum < r.size(); ++rownum) {
            const result::tuple row = r[rownum];
            vid1 = row[0].as<int>();
            vid2 = row[1].as<int>();
            vid3 = row[2].as<int>();
            .............
        }
    }
} catch (const std::exception &e) {
    std::cerr << e.what() << std::endl;
}


#7 alexandros_e
alexandros.ef@gmail.com
In reply to: Tom Lane (#5)
Re: C++, Postgres, libpqxx huge query

To answer my own question, I adapted "How to use pqxx::stateless_cursor
class from libpqxx?":

try {
    work W(*Conn);
    pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
        cursor(W, sql[sqlLoad], "mycursor", false);
    /* Assume you know the total number of records returned */
    for (size_t idx = 0; idx < countRecords; idx += 100000) {
        /* Fetch 100,000 records at a time */
        result r = cursor.retrieve(idx, idx + 100000);
        for (int rownum = 0; rownum < r.size(); ++rownum) {
            const result::tuple row = r[rownum];
            vid1 = row[0].as<int>();
            vid2 = row[1].as<int>();
            vid3 = row[2].as<int>();
            .............
        }
    }
} catch (const std::exception &e) {
    std::cerr << e.what() << std::endl;
}

--
View this message in context: http://postgresql.1045698.n5.nabble.com/C-Postgres-libpqxx-huge-query-tp5802330p5802392.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
