Query memory usage
Hi Everyone,
I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:
COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
Executing this query causes our server to consume all available swap
and crash. Can anyone help me figure out what needs to be done to
allow this query to execute? How long it takes doesn't really matter
as long as it can be performed reliably. The database currently lives
on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
PostgreSQL 8.3.7. Possibly relevant lines from postgresql.conf:
shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB
Tom
Tom Duffey wrote:
Hi Everyone,
I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:
COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
Executing this query causes our server to consume all available swap
and crash. Can anyone help me figure out what needs to be done to
allow this query to execute? How long it takes doesn't really matter
as long as it can be performed reliably. The database currently lives
on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
PostgreSQL 8.3.7. Possibly relevant lines from postgresql.conf:
shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB
Tom
Is there an index on the timestamp? If not, create one - it will make it
possible to run the select without reading the entire table (only the
index and the matching rows)
-- Karl
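The suggestion above could be sketched as follows, using the table name from the quoted query (the index name is my own invention):

```sql
-- Hypothetical index on the timestamp column of the quoted query's table.
-- With it, the range predicate timestamp > '2009-01-01' can be satisfied
-- by an index scan instead of a full sequential scan of the table.
CREATE INDEX history_timestamp_idx ON history ("timestamp");
```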
Tom Duffey <tduffey@trillitech.com> writes:
I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:
COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
Executing this query causes our server to consume all available swap
and crash.
What's being done on the client side with the data? AFAIK that
operation really shouldn't consume a lot of memory on the server side.
It would help if you'd be more specific about which process is consuming
swap space.
regards, tom lane
On May 15, 2010, at 4:51 PM, Tom Lane wrote:
Tom Duffey <tduffey@trillitech.com> writes:
I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:
COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;
Executing this query causes our server to consume all available swap
and crash.
What's being done on the client side with the data? AFAIK that
operation really shouldn't consume a lot of memory on the server side.
It would help if you'd be more specific about which process is
consuming swap space.
I am executing the query in psql at the command line and piping the
result to a file, e.g.,
psql < get_data.sql > data.sql
Tom
Tom Duffey <tduffey@trillitech.com> writes:
On May 15, 2010, at 4:51 PM, Tom Lane wrote:
What's being done on the client side with the data?
I am executing the query in psql at the command line and piping the
result to a file, e.g.,
psql < get_data.sql > data.sql
Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.
Could we see the full schema (eg via psql \dt) for the table being
copied?
regards, tom lane
On May 15, 2010, at 7:28 PM, Tom Lane wrote:
Tom Duffey <tduffey@trillitech.com> writes:
On May 15, 2010, at 4:51 PM, Tom Lane wrote:
What's being done on the client side with the data?
I am executing the query in psql at the command line and piping the
result to a file, e.g.,
psql < get_data.sql > data.sql
Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.
Could we see the full schema (eg via psql \dt) for the table being
copied?
I hope you are right! The actual query is different because I was not
aware until right before I posted this question that you can have a
WHERE clause with COPY. Here is the actual query I ran:
SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';
And here is the table schema:
prod=> \dt point_history
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+----------------
public | point_history | table | prod
(1 row)
prod=> \d point_history
Table "public.point_history"
Column | Type | Modifiers
-----------+-----------------------------+-----------
point_id | integer | not null
value | real | not null
status | integer | not null
timestamp | timestamp without time zone | not null
Indexes:
"point_history_pkey" PRIMARY KEY, btree (point_id, "timestamp")
Foreign-key constraints:
"$1" FOREIGN KEY (point_id) REFERENCES point(id)
Tom
Tom Duffey <tduffey@trillitech.com> writes:
On May 15, 2010, at 7:28 PM, Tom Lane wrote:
Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.
I hope you are right! The actual query is different because I was not
aware until right before I posted this question that you can have a
WHERE clause with COPY. Here is the actual query I ran:
SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';
Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
STDOUT? The former case will cause psql to eat memory, because it tries
to buffer the whole result of an ordinary query. In the latter case
psql will just stream the data through to the output file.
regards, tom lane
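A hedged sketch of the COPY wrapping described above, using the table and column names from the quoted query, and letting COPY's text format supply the tab separators instead of the manual || E'\t' concatenation:

```sql
-- Wrap the SELECT in COPY ... TO STDOUT so psql streams each row to the
-- output file instead of buffering the entire result set in client memory.
-- Table and column names are taken from the query quoted in this thread.
COPY (
    SELECT point_id, status, value, "timestamp"
    FROM point_history
    WHERE "timestamp" > NOW() - interval '18 months'
) TO STDOUT;
```

This can be run the same way as before (psql < get_data.sql > data.sql). If COPY is not an option for some reason, psql's \set FETCH_COUNT 1000 makes even a plain SELECT fetch rows in batches via a cursor rather than buffering the whole result.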
On May 15, 2010, at 8:00 PM, Tom Lane wrote:
Tom Duffey <tduffey@trillitech.com> writes:
On May 15, 2010, at 7:28 PM, Tom Lane wrote:
Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.
I hope you are right! The actual query is different because I was not
aware until right before I posted this question that you can have a
WHERE clause with COPY. Here is the actual query I ran:
SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';
Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
STDOUT? The former case will cause psql to eat memory, because it
tries to buffer the whole result of an ordinary query. In the latter
case psql will just stream the data through to the output file.
That's the whole query. If I understand your reply correctly it
sounds like psql was the culprit and that I should try again using
COPY (...) TO STDOUT, no?
Tom