Large Result and Memory Limit

Started by Mike Ginsburg · over 18 years ago · 13 messages · general
#1 Mike Ginsburg
mginsburg@collaborativefusion.com

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will
create a CSV from multiple tables. I have managed to keep the script
(PHP) under the memory limit when creating and inserting the CSV into
the database. The problem comes when I try to query for the data and
export it. Memory limit is a major concern, but the query for one row
returns a result set too large and PHP fails.

I've thought about storing the data in multiple rows and then querying
one-by-one and outputting, but was hoping there was a better way.

Thanks in advance for the help.

MG

Mike Ginsburg
Collaborative Fusion, Inc.
mginsburg@collaborativefusion.com
412-422-3463 x4015

#2 André Volpato
andre.volpato@ecomtecnologia.com.br
In reply to: Mike Ginsburg (#1)
Re: Large Result and Memory Limit

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will
create a CSV from multiple tables. I have managed to keep the script
(PHP) under the memory limit

okay... some info needed here.
1. memory on the DB server
2. memory_limit on php.ini

when creating and inserting the CSV into the database. The problem
comes when I try to query for the data and export it. Memory limit is
a major concern, but the query for one row returns a result set too
large and PHP fails.

a single row is enough to crash PHP?

I've thought about storing the data in multiple rows and then querying
one-by-one and outputting, but was hoping there was a better way.

if you can't raise memory_limit, I think it's the only way.

[]'s
ACV

#3 Mike Ginsburg
mginsburg@collaborativefusion.com
In reply to: André Volpato (#2)
Re: Large Result and Memory Limit

André Volpato wrote:

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will
create a CSV from multiple tables. I have managed to keep the script
(PHP) under the memory limit

okay... some info needed here.
1. memory on the DB server
2. memory_limit on php.ini

PHP memory_limit is 16M. We're running multiple installations on a
single webserver, so memory is a concern.
The DB server is separate from the webserver.

when creating and inserting the CSV into the database. The problem
comes when I try to query for the data and export it. Memory limit
is a major concern, but the query for one row returns a result set
too large and PHP fails.

a single row is enough to crash PHP?

Well the "data" field in the table (text) contains 50K lines. It's over
30M in size for the full export.

I've thought about storing the data in multiple rows and then
querying one-by-one and outputting, but was hoping there was a better
way.

if you can't raise memory_limit, I think it's the only way.

I was afraid that would be the answer.

[]'s
ACV

Mike Ginsburg
Collaborative Fusion, Inc.
mginsburg@collaborativefusion.com
412-422-3463 x4015

#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mike Ginsburg (#1)
Re: Large Result and Memory Limit

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will create a
CSV from multiple tables. I have managed to keep the script (PHP) under
the memory limit when creating and inserting the CSV into the database.
The problem comes when I try to query for the data and export it. Memory
limit is a major concern, but the query for one row returns a result set
too large and PHP fails.

One row? Wow, I didn't know PHP was that broken.

Try declaring a cursor and fetching a few rows at a time.
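
For illustration, a rough sketch of that from PHP with the pgsql
extension (the connection string and table/column names here are made
up, not the actual schema):

<?php
// Rough sketch: DECLARE a cursor and FETCH a few hundred rows per round
// trip, streaming each row out instead of holding the whole set in memory.
$conn = pg_connect("host=dbhost dbname=registry user=app");

pg_query($conn, "BEGIN");   // cursors only live inside a transaction
pg_query($conn, "DECLARE reg_cur CURSOR FOR
                 SELECT id, last_name, first_name
                 FROM registrants ORDER BY id");

while (true) {
    $res = pg_query($conn, "FETCH 500 FROM reg_cur");
    if (pg_num_rows($res) === 0) {
        break;
    }
    while ($row = pg_fetch_assoc($res)) {
        // Real code would quote/escape CSV fields; this only shows the flow.
        echo implode(',', $row), "\n";
    }
    pg_free_result($res);
}

pg_query($conn, "CLOSE reg_cur");
pg_query($conn, "COMMIT");
pg_close($conn);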

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"No es bueno caminar con un hombre muerto"

#5 André Volpato
andre.volpato@ecomtecnologia.com.br
In reply to: Mike Ginsburg (#3)
Re: Large Result and Memory Limit

Mike Ginsburg wrote:

André Volpato wrote:

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will
create a CSV from multiple tables. I have managed to keep the
script (PHP) under the memory limit

okay... some info needed here.
1. memory on the DB server
2. memory_limit on php.ini

PHP memory_limit is 16M. We're running multiple installations on a
single webserver, so memory is a concern.
The DB server is separate from the webserver.

when creating and inserting the CSV into the database. The problem
comes when I try to query for the data and export it. Memory limit
is a major concern, but the query for one row returns a result set
too large and PHP fails.

a single row is enough to crash PHP?

Well the "data" field in the table (text) contains 50K lines. It's
over 30M in size for the full export.

I've thought about storing the data in multiple rows and then
querying one-by-one and outputting, but was hoping there was a
better way.

if you can't raise memory_limit, I think it's the only way.

I was afraid that would be the answer.

Well, you could try to retrieve data with substr(), say 10k lines in 5
queries.
It's kinda ugly, but should work in this case...
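
For example, a rough sketch of that with the pgsql extension (the table,
column, and export id are made up, and it chunks by characters rather
than by lines):

<?php
// Rough sketch: pull the big text column out with substr() in fixed-size
// chunks so no single result exceeds memory_limit.
$conn  = pg_connect("host=dbhost dbname=registry user=app");
$chunk = 1024 * 1024;   // roughly 1MB per query
$pos   = 1;             // substr() in SQL is 1-based

while (true) {
    $res = pg_query_params($conn,
        'SELECT substr(data, $1::int, $2::int) AS piece FROM exports WHERE id = $3',
        array($pos, $chunk, 42));          // 42 = placeholder export row id
    $piece = pg_fetch_result($res, 0, 'piece');
    pg_free_result($res);
    if ($piece === false || $piece === null || $piece === '') {
        break;                             // past the end of the column
    }
    echo $piece;                           // stream the chunk straight out
    $pos += $chunk;
}
pg_close($conn);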

[]'s
ACV

#6 André Volpato
andre.volpato@ecomtecnologia.com.br
In reply to: Alvaro Herrera (#4)
Re: Large Result and Memory Limit

Alvaro Herrera wrote:

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will create a
CSV from multiple tables. I have managed to keep the script (PHP) under
the memory limit when creating and inserting the CSV into the database.
The problem comes when I try to query for the data and export it. Memory
limit is a major concern, but the query for one row returns a result set
too large and PHP fails.

One row? Wow, I didn't know PHP was that broken.

Try declaring a cursor and fetching a few rows at a time.

PHP is just respecting memory_limit when retrieving data.
In this case, a single row is about 30M, a lot more than the limit of 16M.
I think cursors wouldn't help anyway.

[]'s,
ACV

#7 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Alvaro Herrera (#4)
Re: Large Result and Memory Limit

On 10/4/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will create a
CSV from multiple tables. I have managed to keep the script (PHP) under
the memory limit when creating and inserting the CSV into the database.
The problem comes when I try to query for the data and export it. Memory
limit is a major concern, but the query for one row returns a result set
too large and PHP fails.

One row? Wow, I didn't know PHP was that broken.

No, it's not php, it's his approach that's broken.

He was saying that a single db row has a text column with 50,000 lines.

The fact that php throws an error and stops rather than running your
machine out of memory would hardly qualify as broken. It's got a
memory limit for a reason. Even Java I believe has one.

#8 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Scott Marlowe (#7)
Re: Large Result and Memory Limit

Scott Marlowe wrote:

On 10/4/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:

Mike Ginsburg wrote:

Hello,
I am working on a personnel registry that has upwards of 50,000
registrants. Currently I am working on an export module that will create a
CSV from multiple tables. I have managed to keep the script (PHP) under
the memory limit when creating and inserting the CSV into the database.
The problem comes when I try to query for the data and export it. Memory
limit is a major concern, but the query for one row returns a result set
too large and PHP fails.

One row? Wow, I didn't know PHP was that broken.

No, it's not php, it's his approach that's broken.

Oh, I didn't realize that's what he was saying.

Of course, you can get the thing out of the database using substring()
but really this is a matter of really poor design.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"No hay cielo posible sin hundir nuestras raíces
en la profundidad de la tierra" (Malucha Pinto)

#9 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Mike Ginsburg (#1)
Re: Large Result and Memory Limit

On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:

export it. Memory limit is a major concern, but the query for one row
returns a result set too large and PHP fails.

If this is true, and one single db row makes php exceed its memory
limit just by returning it, you've done something very, very wrong in your
design.

You'd probably be better served using either a plain text file system
to store these things, or large objects in postgresql.

But if you're stuffing ~8 megs worth of csv text data into a single
row* you're probably not using a very relational layout of your data.
And you're losing all the advantages (checking your data for
consistency and such) that a relational db could give you.

* Note that I'm assuming a few things up there. 1: php uses about 2:1
memory to store data it's holding, roughly. If you're set to 16 Meg
max, I'm assuming your return set is 8Meg or larger.

#10 Mike Ginsburg
mginsburg@collaborativefusion.com
In reply to: Scott Marlowe (#9)
Re: Large Result and Memory Limit

Scott Marlowe wrote:

On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:

export it. Memory limit is a major concern, but the query for one row
returns a result set too large and PHP fails.

If this is true, and one single db row makes php exceed its memory
limit just by returning it, you've done something very, very wrong in your
design.

This is for the export only. Since it is an export of ~50,000
registrants, it takes some time to process. We also have load balanced
web servers, so unless I want to create identical processes on all
webservers, or write some crazy script to scp it across the board,
storing it as a text file is not an option. I realize that my way of
doing it is flawed, which is the reason I came here for advice. The CSV
contains data from approximately 15 tables, several of which are
many-to-ones making joins a little tricky. My thought was to do all of
the processing in the background, store the results in the DB, and
allowing the requester to download it at their convenience.

Would it be a good idea to create a temporary table that stores all of
the export data, broken out by rows and columns, and when download
time comes, query from there?

You'd probably be better served using either a plain text file system
to store these things, or large objects in postgresql.

But if you're stuffing ~8 megs worth of csv text data into a single
row* you're probably not using a very relational layout of your data.
And you're losing all the advantages (checking your data for
consistency and such) that a relational db could give you.

* Note that I'm assuming a few things up there. 1: php uses about 2:1
memory to store data it's holding, roughly. If you're set to 16 Meg
max, I'm assuming your return set is 8Meg or larger.


Mike Ginsburg
Collaborative Fusion, Inc.
mginsburg@collaborativefusion.com
412-422-3463 x4015

#11 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Mike Ginsburg (#10)
Re: Large Result and Memory Limit

On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:

This is for the export only. Since it is an export of ~50,000 registrants,
it takes some time to process. We also have load balanced web servers, so
unless I want to create identical processes on all webservers, or write some
crazy script to scp it across the board, storing it as a text file is not an
option. I realize that my way of doing it is flawed, which is the reason I
came here for advice. The CSV contains data from approximately 15 tables,
several of which are many-to-ones making joins a little tricky. My thought
was to do all of the processing in the background, store the results in the
DB, and allowing the requester to download it at their convenience.

Would it be a good idea to create a temporary table that stores all of the
export data, broken out by rows and columns, and when download time
comes, query from there?

Yeah, I tend to think that would be better. Then you could use a
cursor to retrieve them and serve them one line at a time and not have
to worry about overloading your php server.
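
Roughly, the download side of that could look like this (the staging
table, column names, and export id are hypothetical, using the pgsql
extension):

<?php
// Rough sketch: the background job writes one CSV line per row into a
// work table such as
//   CREATE TABLE export_lines (
//       export_id integer,
//       line_no   integer,
//       line      text,
//       PRIMARY KEY (export_id, line_no)
//   );
// and the download script streams it back through a cursor.
$conn     = pg_connect("host=dbhost dbname=registry user=app");
$exportId = 42;   // placeholder for the requested export

pg_query($conn, "BEGIN");
pg_query($conn, "DECLARE line_cur CURSOR FOR
                 SELECT line FROM export_lines
                 WHERE export_id = " . intval($exportId) . "
                 ORDER BY line_no");

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="export.csv"');

while (true) {
    $res = pg_query($conn, "FETCH 1000 FROM line_cur");
    if (pg_num_rows($res) === 0) {
        break;
    }
    while ($row = pg_fetch_row($res)) {
        echo $row[0], "\n";   // one CSV line at a time, never the whole file
    }
    pg_free_result($res);
}

pg_query($conn, "CLOSE line_cur");
pg_query($conn, "COMMIT");
pg_close($conn);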

#12 Bill Moran
wmoran@potentialtech.com
In reply to: Scott Marlowe (#9)
Re: Large Result and Memory Limit

In response to "Scott Marlowe" <scott.marlowe@gmail.com>:

On 10/4/07, Mike Ginsburg <mginsburg@collaborativefusion.com> wrote:

export it. Memory limit is a major concern, but the query for one row
returns a result set too large and PHP fails.

If this is true, and one single db row makes php exceed its memory
limit just by returning it, you've done something very, very wrong in your
design.

I work with Mike, and I'm not sure there's something "very wrong" in
the design.

The problem we're looking at is an export. This big TEXT field is
nothing more than a staging area. The export itself can take several
hours to run, so we background the process, and store the result in
a big TEXT field (it ends up being a CSV). Later, the user can log
in and download the exported file via the web interface. (But it's
unlikely that anyone is going to click "export", then wait 2 hours
for their browser to refresh :)

The process is, web interface -> request export -> get an estimate
on completion time -> come back later and check the status -> download
if complete.

It's difficult (maybe impossible) to materialize the data on a schedule,
since the user has a number of options for how to export the data
(filters, etc.), so we chose to do it on-demand.

You'd probably be better served using either a plain text file system
to store these things, or large objects in postgresql.

We have multiple web servers with a load balancer, so saving the
result to a file doesn't really work. We could put shared storage
in place, but I'm still not convinced that's the best fix for this.

Large objects are one option we were considering. The problem is
they don't work with Slony, and our redundancy relies on Slony. Granted,
we might want to just make this a non-replicated table, since it's
only a staging area anyway, but we're looking for a better solution.

My (specific) question is whether or not anyone has experimented with
putting something like a CSV file in a table with one tuple for each
row, and compared performance, etc., to putting it in a large object?

But if you're stuffing ~8 megs worth of csv text data into a single
row* you're probably not using a very relational layout of your data.
And you're losing all the advantages (checking your data for
consistency and such) that a relational db could give you.

Actually, we _are_ using lots of tables with lots of relations and
foreign keys and triggers and stored procedures and all sorts of other
stuff. That's why it takes multiple hours to "flatten" everything (is
there a better term to describe the process of turning relational data
into a single flat file?)

* Note that I'm assuming a few things up there. 1: php uses about 2:1
memory to store data it's holding, roughly. If you're set to 16 Meg
max, I'm assuming your return set is 8Meg or larger.

I'm not sure, but that's probably correct.

--
Bill Moran
http://www.potentialtech.com

#13 Joshua D. Drake
jd@commandprompt.com
In reply to: Bill Moran (#12)
Re: Large Result and Memory Limit

Bill Moran wrote:

In response to "Scott Marlowe" <scott.marlowe@gmail.com>:

The process is, web interface -> request export -> get an estimate
on completion time -> come back later and check the status -> download
if complete.

It's difficult (maybe impossible) to materialize the data on a schedule,
since the user has a number of options for how to export the data
(filters, etc.), so we chose to do it on-demand.

Export to large object, not text.
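
For what it's worth, a rough sketch of that with PHP's pg_lo_* functions
(the connection string, the sample CSV pieces, and the bookkeeping for
the OID are placeholders; large object calls have to run inside a
transaction):

<?php
// Export side (background job): write the CSV into a large object.
$conn = pg_connect("host=dbhost dbname=registry user=app");

pg_query($conn, "BEGIN");
$oid = pg_lo_create($conn);          // store this OID with the export record
$lo  = pg_lo_open($conn, $oid, "w");
$csvPieces = array("id,name\n", "1,Example\n");   // placeholder CSV chunks
foreach ($csvPieces as $piece) {
    pg_lo_write($lo, $piece);
}
pg_lo_close($lo);
pg_query($conn, "COMMIT");

// Download side (normally a separate request): stream it back in small reads.
pg_query($conn, "BEGIN");
$lo = pg_lo_open($conn, $oid, "r");
header('Content-Type: text/csv');
while (($buf = pg_lo_read($lo, 8192)) !== false && $buf !== '') {
    echo $buf;
}
pg_lo_close($lo);
pg_query($conn, "COMMIT");
pg_close($conn);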

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997 http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
