Query memory usage

Started by Tom Duffeyalmost 16 years ago8 messagesgeneral
Jump to latest
#1Tom Duffey
tduffey@trillitech.com

Hi Everyone,

I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

Executing this query causes our server to consume all available swap
and crash. Can anyone help me figure out what needs to be done to
allow this query to execute? How long it takes doesn't really matter
as long as it can be performed reliably. The database currently lives
on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
PostgreSQL 8.3.7. Possibly relevant lines from postgresql.conf:

shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB

Tom

#2Karl Denninger
karl@denninger.net
In reply to: Tom Duffey (#1)
Re: Query memory usage

Tom Duffey wrote:

Hi Everyone,

I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

Executing this query causes our server to consume all available swap
and crash. Can anyone help me figure out what needs to be done to
allow this query to execute? How long it takes doesn't really matter
as long as it can be performed reliably. The database currently lives
on a Red Hat EL 5.3 server with 16GB RAM and 4GB swap running
PostgreSQL 8.3.7. Possibly relevant lines from postgresql.conf:

shared_buffers = 4GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB

Tom

Is there an index on the time stamp? If not, create one - it will make
possible the select without having to read the entire table (only the index)

-- Karl

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Duffey (#1)
Re: Query memory usage

Tom Duffey <tduffey@trillitech.com> writes:

I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk no
problem. However, I would like to retrieve a large subset of data
from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO STDOUT;

Executing this query causes our server to consume all available swap
and crash.

What's being done on the client side with the data? AFAIK that
operation really shouldn't consume a lot of memory on the server side.
It would help if you'd be more specific about which process is consuming
swap space.

regards, tom lane

#4Tom Duffey
tduffey@trillitech.com
In reply to: Tom Lane (#3)
Re: Query memory usage

On May 15, 2010, at 4:51 PM, Tom Lane wrote:

Tom Duffey <tduffey@trillitech.com> writes:

I have a table with several hundred million rows of timestamped
values. Using pg_dump we are able to dump the entire table to disk
no
problem. However, I would like to retrieve a large subset of data
from this table using something like:

COPY (SELECT * FROM history WHERE timestamp > '2009-01-01') TO
STDOUT;

Executing this query causes our server to consume all available swap
and crash.

What's being done on the client side with the data? AFAIK that
operation really shouldn't consume a lot of memory on the server side.
It would help if you'd be more specific about which process is
consuming
swap space.

I am executing the query in psql at the command line and piping the
result to a file, e.g.,

psql < get_data.sql > data.sql

Tom

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Duffey (#4)
Re: Query memory usage

Tom Duffey <tduffey@trillitech.com> writes:

On May 15, 2010, at 4:51 PM, Tom Lane wrote:

What's being done on the client side with the data?

I am executing the query in psql at the command line and piping the
result to a file, e.g.,
psql < get_data.sql > data.sql

Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.

Could we see the full schema (eg via psql \dt) for the table being
copied?

regards, tom lane

#6Tom Duffey
tduffey@trillitech.com
In reply to: Tom Lane (#5)
Re: Query memory usage

On May 15, 2010, at 7:28 PM, Tom Lane wrote:

Tom Duffey <tduffey@trillitech.com> writes:

On May 15, 2010, at 4:51 PM, Tom Lane wrote:

What's being done on the client side with the data?

I am executing the query in psql at the command line and piping the
result to a file, e.g.,
psql < get_data.sql > data.sql

Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.

Could we see the full schema (eg via psql \dt) for the table being
copied?

I hope you are right! The actual query is different because I was not
aware until right before I posted this question that you can have a
WHERE clause with COPY. Here is the actual query I ran:

SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';

And here is the table schema:

prod=> \dt point_history
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+----------------
public | point_history | table | prod
(1 row)

prod=> \d point_history
Table "public.point_history"
Column | Type | Modifiers
-----------+-----------------------------+-----------
point_id | integer | not null
value | real | not null
status | integer | not null
timestamp | timestamp without time zone | not null
Indexes:
"point_history_pkey" PRIMARY KEY, btree (point_id, "timestamp")
Foreign-key constraints:
"$1" FOREIGN KEY (point_id) REFERENCES point(id)

Tom

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Duffey (#6)
Re: Query memory usage

Tom Duffey <tduffey@trillitech.com> writes:

On May 15, 2010, at 7:28 PM, Tom Lane wrote:

Well, I tried executing a large "copy (select ...)" query and couldn't
see any memory bloat at all in either the backend or psql. So there's
something relevant that you haven't told us.

I hope you are right! The actual query is different because I was not
aware until right before I posted this question that you can have a
WHERE clause with COPY. Here is the actual query I ran:

SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';

Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
STDOUT? The former case will cause psql to eat memory, because it tries
to buffer the whole result of an ordinary query. In the latter case
psql will just stream the data through to the output file.

regards, tom lane

#8Tom Duffey
tduffey@trillitech.com
In reply to: Tom Lane (#7)
Re: Query memory usage

On May 15, 2010, at 8:00 PM, Tom Lane wrote:

Tom Duffey <tduffey@trillitech.com> writes:

On May 15, 2010, at 7:28 PM, Tom Lane wrote:

Well, I tried executing a large "copy (select ...)" query and
couldn't
see any memory bloat at all in either the backend or psql. So
there's
something relevant that you haven't told us.

I hope you are right! The actual query is different because I was
not
aware until right before I posted this question that you can have a
WHERE clause with COPY. Here is the actual query I ran:

SELECT point_id || E'\t' || status || E'\t' || value || E'\t' ||
timestamp
FROM point_history
WHERE timestamp > NOW() - interval '18 months';

Ermm ... is that the whole query, or did you wrap it in COPY (...) TO
STDOUT? The former case will cause psql to eat memory, because it
tries
to buffer the whole result of an ordinary query. In the latter case
psql will just stream the data through to the output file.

That's the whole query. If I understand your reply correctly it
sounds like psql was the culprit and that I should try again using
COPY (...) TO STDOUT, no?

Tom