large table problem
I'm trying to work my way around a large query problem.
In my system, I've created a number of large materialized views that are the
output of some computationally expensive stored procedures on other large
tables in my system. They are intended to serve as staging tables for the
next phase of computation, such that I can call (the presumably efficient):
select * from <my_mv_table>;
and then have at the data on the other side of a Java application server.
These tables range anywhere from 400,000 to >32,000,000 rows, though
individual records are only 5 integers wide.
Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries,
which are "unnamed native queries" in Java parlance. Work-around attempt 1
was to call directly to the JDBC driver, but the cursor doesn't dispose of
the memory in the ResultSet once I've passed it by (OutOfMemoryError) and
the documentation suggests that cursor behavior is a little buggy for the
current postgres driver. (The docs suggest implementing a custom stored
procedure to provide iteration.) Attempt number 2, now underway, is to pass
LIMIT and OFFSET values to the query which Postgres handles quite
effectively as long as the OFFSET value is less than the total number of
rows in the table. When the value is greater than <num_rows>, the query
hangs for minutes.
So my question is, does Postgres keep any metadata around about un-indexed
table sizes? select count(*) from <my_table> itself can take a minute to
process. If I had ready access to that information, I could kluge up my code
with something like:
num_rows = getNumRows();
while( offset < num_rows ){
    processData( select( offset, window_size ) );
    offset += window_size;
}
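That windowing idea, sketched in Java (getNumRows(), select(), and processData() are hypothetical helpers standing in for the real JDBC call and consumer; the pure window arithmetic is factored out so it can be checked on its own):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowedScan {
    // Walks a table of numRows rows in LIMIT/OFFSET windows and returns
    // each (offset, limit) pair that would be sent to Postgres as:
    //   SELECT * FROM <my_mv_table> LIMIT <limit> OFFSET <offset>
    static List<long[]> windows(long numRows, long window) {
        List<long[]> out = new ArrayList<>();
        for (long offset = 0; offset < numRows; offset += window) {
            out.add(new long[] { offset, window });
        }
        return out;
    }
}
```

For 10 rows and a window of 4 this yields offsets 0, 4, 8 — each row is fetched exactly once and the loop stops before offset exceeds the row count.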
At the moment the best option I have is to write a stored procedure to
populate a table <mv_sizes>, but not only is this a pain in the patoot, it
just seems sick and wrong.
Am I missing something painfully obvious?
Jason
--
========================================================
Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute
========================================================
611 C.E. Young Drive East | Tel: (310) 206-3907
105 Boyer Hall, Box 951570 | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: jason@mbi.ucla.edu
========================================================
http://www.mbi.ucla.edu/~jason
========================================================
"Jason Nerothin" <jasonnerothin@gmail.com> writes:
Attempt number 2, now underway, is to pass
LIMIT and OFFSET values to the query which Postgres handles quite
effectively as long as the OFFSET value is less than the total number of
rows in the table. When the value is greater than <num_rows>, the query
hangs for minutes.
I don't actually believe the above; using successively larger offsets
should get slower and slower in a smooth manner, because the only thing
OFFSET does is throw away scanned rows just before they would have been
returned to the client. I think you've confused yourself somehow.
the documentation suggests that cursor behavior is a little buggy for the
current postgres driver.
How old a driver are you using? Because a cursor is definitely what you
want to use for retrieving millions of rows.
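For reference, cursor-based fetching with the PostgreSQL JDBC driver looks roughly like this (a sketch, not production code; the table name and fetch size are placeholders — pgjdbc streams via a server-side cursor only when autocommit is off, a positive fetch size is set, and the ResultSet is forward-only):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorFetch {
    // Rows fetched per round trip; tune for row width vs. client memory.
    static final int FETCH_SIZE = 10_000;

    // Builds the scan query; the table name is a placeholder for the MV.
    static String query(String table) {
        return "SELECT * FROM " + table;
    }

    // Streams the whole table without materializing it client-side.
    static void stream(Connection conn, String table) throws SQLException {
        conn.setAutoCommit(false);           // required for cursor mode
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(FETCH_SIZE);     // enables batched fetching
            try (ResultSet rs = st.executeQuery(query(table))) {
                while (rs.next()) {
                    // process one 5-integer-wide row at a time
                }
            }
        }
    }
}
```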
It strikes me that pgsql-jdbc might be a more suitable group of people
to ask about this than the -general list ...
regards, tom lane
On Fri, 20 Apr 2007, Jason Nerothin wrote:
I'm trying to work my way around a large query problem.
Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries
which are "unnamed native queries" in Java parlance. Work-around attempt 1
was to call directly to the JDBC driver, but the cursor doesn't dispose of
the memory in the ResultSet once I've passed it by (OutOfMemoryError) and
the documentation suggests that cursor behavior is a little buggy for the
current postgres driver. (The docs suggest implementing a custom stored
procedure to provide iteration.)
I'm not sure what documentation you're reading:
http://jdbc.postgresql.org/documentation/82/query.html#query-with-cursor
and it works as advertised.
Kris Jurka
Thanks for the redirect... After profiling my client memory usage and using
the built-in cursor functionality I discovered that another part of my
program was causing the memory overflow and that the ResultSet iteration was
doing exactly what it should have all along.
On 4/21/07, Kris Jurka <books@ejurka.com> wrote:
[quoted text trimmed]