COPY FROM : out of memory

Started by Arnaud Lesauvage over 19 years ago | 14 messages | general
#1 Arnaud Lesauvage
thewild@freesurf.fr

Hi list !

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

The table has no index, no trigger, ... :

CREATE TABLE tmp
(
c1 bigint,
c2 character varying,
c3 character varying
)
WITHOUT OIDS;
ALTER TABLE tmp OWNER TO postgres;

The COPY command is very basic :
SET client_encoding TO UTF8;
COPY tmp FROM
'E:\\Production\\Temp\\detailrechercheutf8.csv' CSV;

PostgreSQL version is :
"PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC
gcc.exe (GCC) 3.4.2 (mingw-special)"

I have ~1.5GB of RAM available, and ~4GB of free pagefile space.

Something wrong in my postgresql.conf ? I didn't do much
tweaking though...

Regards
--
Arnaud

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arnaud Lesauvage (#1)
Re: COPY FROM : out of memory

Arnaud Lesauvage <thewild@freesurf.fr> writes:

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

Can you put together a self-contained example? The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...

regards, tom lane

#3 Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#2)
Re: COPY FROM : out of memory

On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:

Arnaud Lesauvage <thewild@freesurf.fr> writes:

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

Can you put together a self-contained example? The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...

Maybe it's a line termination problem?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


From each according to his ability. To each according to his ability to litigate.

#4 Terry Fielder
terry@ashtonwoodshomes.com
In reply to: Tom Lane (#2)
Re: COPY FROM : out of memory

Tom Lane wrote:

Arnaud Lesauvage <thewild@freesurf.fr> writes:

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

Can you put together a self-contained example? The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...

Unless it's not seeing the end of the first record AS the end of the
first record, and hence seeing the whole file as 1 record.

Terry


regards, tom lane


#5 Arnaud Lesauvage
thewild@freesurf.fr
In reply to: Martijn van Oosterhout (#3)
Re: COPY FROM : out of memory

Martijn van Oosterhout wrote:

On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:

Arnaud Lesauvage <thewild@freesurf.fr> writes:

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

Can you put together a self-contained example? The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...

Maybe it's a line termination problem?

Have a nice day,

I think you are right !
Trying to see the first line with sed outputs the whole file!
All I did was export the file in UNICODE from MSSQL, then convert
it with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.csv.

I guess I still don't have the right encoding... :(
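As an aside, the conversion step can be sanity-checked before running COPY. A minimal shell sketch (the sample file and its contents are illustrative, not the original export; MSSQL "UNICODE" output is typically UTF-16LE rather than UCS-4):

```shell
# Fabricate a small stand-in for an MSSQL UNICODE export:
# UTF-16LE content with CRLF line endings.
printf '1,foo,bar\r\n2,baz,qux\r\n' | iconv -f UTF-8 -t UTF-16LE > sample_utf16.csv

# Convert to UTF-8. If the source were really some other encoding,
# this would produce garbage, so the line count below doubles as a check.
iconv -f UTF-16LE -t UTF-8 sample_utf16.csv > sample_utf8.csv

# A 2-row file should report 2 lines; 0 lines would mean the newlines
# did not survive (wrong source encoding or CR-only terminators).
wc -l < sample_utf8.csv
```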

#6 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Arnaud Lesauvage (#5)
Re: COPY FROM : out of memory

Arnaud Lesauvage wrote:

Martijn van Oosterhout wrote:

On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:

Arnaud Lesauvage <thewild@freesurf.fr> writes:

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

Can you put together a self-contained example? The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...

Maybe it's a line termination problem?

I think you are right !
Trying to see the first line with sed outputs the whole file!
All I did was export the file in UNICODE from MSSQL, then convert
it with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.csv.

I guess I still don't have the right encoding... :(

Did you set the encoding with \encoding? I think it's critical for
determining line and field separators. If you only do SET
client_encoding, the backend will work but psql may not.

Or you mean that the first line of the text file is the whole file? In
that case I'd guess that the iconv procedure is borked somehow, or maybe
the input file is OK for everything except the linefeed(*)

(*) is "linefeed" plural or do you need to add an "s"? Is the singular
"linefood"???

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#7 Gopal
gopal@getmapping.com
In reply to: Terry Fielder (#4)
Postgres scalability and performance on windows

Hi all,

I have a postgres installation that's running under 70-80% CPU usage while
an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.

Here's the scenario,

300 queries/second

Server: Postgres 8.1.4 on win2k server

CPU: Dual Xeon 3.6 Ghz,

Memory: 4GB RAM

Disks: 3 x 36gb , 15K RPM SCSI

C# based web application calling postgres functions using npgsql 0.7.

It's an almost completely read-only db apart from fortnightly updates.

Table 1 - About 300,000 rows with simple rectangles

Table 2 - 1 million rows

Total size: 300MB

Functions : Simple coordinate reprojection and intersection query +
inner join of table1 and table2.

I think I have all the right indexes defined and indeed the performance
for queries under low loads is fast.

==================================================================================

postgresql.conf has following settings

max_connections = 150

shared_buffers = 20000 # min 16 or max_connections*2, 8KB each

temp_buffers = 2000 # min 100, 8KB each

max_prepared_transactions = 25 # can be 0 or more

# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).

work_mem = 512 # min 64, size in KB

#maintenance_work_mem = 16384 # min 1024, size in KB

max_stack_depth = 2048

effective_cache_size = 82728 # typically 8KB each

random_page_cost = 4 # units are one sequential page fetch

==================================================================================

SQL server caches all the data in memory which is making it faster (uses
about 1.2GB memory - which is fine).

But postgres has everything spread across 10-15 processes, with each
process using about 10-30MB, not nearly enough to cache all the data and
ends up doing a lot of disk reads.

I've read that postgres depends on OS to cache the files, I wonder if
this is not happening on windows.

In any case I cannot believe that having 15-20 processes running on
windows helps. Why not spawn threads instead of processes, which might
be far less expensive and more efficient. Is there any way of doing
this?

My question is, should I just accept the performance I am getting as the
limit on windows or should I be looking at some other params that I
might have missed?

Thanks,

Gopal

#8 Thomas H.
me@alternize.com
In reply to: Gopal (#7)
Re: Postgres scalability and performance on windows

i'm seeing the opposite here on our win32 pgsql instance. while mssql server used ~50% cpu constantly (AND consuming lots of memory), pgsql runs at a low 1-5% and gives better performance even under relatively high load.

is the high cpu load coming from one particular postgres.exe process or is it distributed among all the processes?

try raising work_mem. we have set it to 30MB

- thomas

#9 Bill Moran
wmoran@collaborativefusion.com
In reply to: Gopal (#7)
Re: Postgres scalability and performance on windows

On Thu, 23 Nov 2006 22:31:40 -0000
"Gopal" <gopal@getmapping.com> wrote:

Hi all,

I have a postgres installation that's running under 70-80% CPU usage while
an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.

Here's the scenario,

300 queries/second

Server: Postgres 8.1.4 on win2k server

CPU: Dual Xeon 3.6 Ghz,

Memory: 4GB RAM

Disks: 3 x 36gb , 15K RPM SCSI

C# based web application calling postgres functions using npgsql 0.7.

It's an almost completely read-only db apart from fortnightly updates.

Table 1 - About 300,000 rows with simple rectangles

Table 2 - 1 million rows

Total size: 300MB

Functions : Simple coordinate reprojection and intersection query +
inner join of table1 and table2.

I think I have all the right indexes defined and indeed the performance
for queries under low loads is fast.

==================================================================================

postgresql.conf has following settings

max_connections = 150

shared_buffers = 20000 # min 16 or max_connections*2, 8KB each

Considering you have 4G of RAM, you might want to allocate more than 160M
to shared buffers.

temp_buffers = 2000 # min 100, 8KB each

max_prepared_transactions = 25 # can be 0 or more

# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).

work_mem = 512 # min 64, size in KB

Again, with 4G of RAM, you may get some benefit from more than 1/2MB of
work space.

SQL server caches all the data in memory which is making it faster (uses
about 1.2GB memory - which is fine).

But postgres has everything spread across 10-15 processes, with each
process using about 10-30MB, not nearly enough to cache all the data and
ends up doing a lot of disk reads.

Allocate more shared buffers and PG will use it.

I've read that postgres depends on OS to cache the files, I wonder if
this is not happening on windows.

Yes, but it can access data even faster if it's in the shared buffer
space. There are numerous write-ups on the Internet about this sort
of tuning.

In any case I cannot believe that having 15-20 processes running on
windows helps. Why not spawn threads instead of processes, which might
be far less expensive and more efficient. Is there any way of doing
this?

Because every other OS (Linux, BSD, Solaris, etc) does very well with
multiple spawned processes. I expect that future versions of PG will
have some improvements to allow better performance on Windows, but you'll
be surprised how well it runs under a POSIX OS.

My question is, should I just accept the performance I am getting as the
limit on windows or should I be looking at some other params that I
might have missed?

I have a feeling that some tuning would improve things for you.
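Bill's suggestions amount to a few postgresql.conf changes. A hedged sketch for a 4GB machine (the exact figures are illustrative starting points, not values from this thread; note that 8.1-era Windows builds were often happier with a comparatively modest shared_buffers, so benchmark before settling on anything):

```
shared_buffers = 50000            # ~400MB at 8KB/page, up from the 160MB above
work_mem = 16384                  # 16MB per sort/hash step, up from 512KB
effective_cache_size = 262144     # ~2GB, hinting the planner at the OS file cache
```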

#10 Merlin Moncure
mmoncure@gmail.com
In reply to: Gopal (#7)
Re: Postgres scalability and performance on windows

On 11/23/06, Gopal <gopal@getmapping.com> wrote:

I have a postgres installation thats running under 70-80% CPU usage while

an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.

i somehow doubt ms sql server is 35x faster than postgresql in
production environments, even on windows.

work_mem = 512 # min 64,

this is probably too low.

SQL server caches all the data in memory which is making it faster(uses
about 1.2GB memory- which is fine).

But postgres has everything spread across 10-15 processes, with each process
using about 10-30MB, not nearly enough to cache all the data and ends up
doing a lot of disk reads.

this is a misleading and unfortunate shortcoming of the windows
process manager. postgresql uses a lot of shared memory, and if you
have shared memory set to 10 mb, each process in the task manager can
report up to 10 mb (at the same time) even though only 10mb is really
in use.

I've read that postgres depends on OS to cache the files, I wonder if this
is not happenning on windows.

Are you suggesting postgresql somehow turned off file caching in windows?

In any case I cannot believe that having 15-20 processes running on windows
helps. Why not spawn threads instead of processes, which might

this was an important argument in oh, say, 1992 :-). Seriously, even
though processes are slower in windows than threads for certain
things, it's not as much as you'd expect and certainly not causing any
performance issues you are suffering.

My question is, should I just accept the performance I am getting as the
limit on windows or should I be looking at some other params that I might
have missed?

i'd start by logging queries with execution times and looking for
queries that are running the slowest.

merlin
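Merlin's suggestion of logging slow queries maps onto postgresql.conf settings that exist in 8.1 (the 100 ms threshold is an illustrative choice, not a value from this thread):

```
log_min_duration_statement = 100   # log any statement running longer than 100ms
#log_duration = off                # leave off to avoid logging every statement
```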

#11 Arnaud Lesauvage
thewild@freesurf.fr
In reply to: Alvaro Herrera (#6)
Re: COPY FROM : out of memory

Alvaro Herrera wrote:

Arnaud Lesauvage wrote:

Martijn van Oosterhout wrote:

On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:

Arnaud Lesauvage <thewild@freesurf.fr> writes:

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

Can you put together a self-contained example? The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...

Maybe it's a line termination problem?

I think you are right !
Trying to see the first line with sed outputs the whole file!
All I did was export the file in UNICODE from MSSQL, then convert
it with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.csv.

I guess I still don't have the right encoding... :(

Did you set the encoding with \encoding? I think it's critical for
determining line and field separators. If you only do SET
client_encoding, the backend will work but psql may not.

Or you mean that the first line of the text file is the whole file? In
that case I'd guess that the iconv procedure is borked somehow, or maybe
the input file is OK for everything except the linefeed(*)

No, I used "SET client_encoding".
But I checked the file with sed, and sed agrees with
PostgreSQL : there is just one line in the file.
I have a last idea. I'll give it a try today, if it doesn't
work I'll forget about this COPY stuff and work through ODBC.

--
Arnaud
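Arnaud's "sed agrees there is just one line" symptom can be reproduced with standard tools. A small sketch (the CR-terminated sample is fabricated for illustration): `wc -l` counts only LF bytes, so a file with bare-CR line endings looks like a single record to every line-oriented tool, and `tr` can repair the simple cases:

```shell
# A file terminated with bare CR contains no \n bytes at all,
# so sed, COPY, and wc -l each see one giant line.
printf '1,foo\r2,bar\r' > sample_cr.txt

wc -l < sample_cr.txt               # reports 0 lines
od -c sample_cr.txt | head -n 2     # shows \r where \n was expected

# Translate CR to LF; after repair the two records are visible again.
tr '\r' '\n' < sample_cr.txt | wc -l
```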

#12 Mark Cave-Ayland
mark.cave-ayland@ilande.co.uk
In reply to: Gopal (#7)
Re: Postgres scalability and performance on windows


Hi Gopal,

It sounds as if you are using PostGIS to store your geometries, and yes it
sounds as if something is not performing as it should. Please post your
configuration (along with information about the versions of PostGIS you
are using) to the postgis-users list at http://postgis.refractions.net.
You will also need to supply the output of EXPLAIN ANALYZE for some of
your queries in order to help determine exactly where the bottleneck is in
your application.

Kind regards,

Mark.

#13 Gopal
gopal@getmapping.com
In reply to: Mark Cave-Ayland (#12)
Re: Postgres scalability and performance on windows

Hi,

Thanks for your suggestions. Here's an output of the explain analyse.
I'll change the shared_buffers and look at the behaviour again.

"Limit (cost=59.53..59.53 rows=1 width=28) (actual time=15.681..15.681
rows=1 loops=1)"
" -> Sort (cost=59.53..59.53 rows=1 width=28) (actual
time=15.678..15.678 rows=1 loops=1)"
" Sort Key: ceil(subq.percentcover), (1::double precision /
ds.maxresolution)"
" -> Hash Join (cost=58.19..59.52 rows=1 width=28) (actual
time=15.630..15.663 rows=2 loops=1)"
" Hash Cond: ("outer".datasetid = "inner".did)"
" -> Seq Scan on tbl_metadata_dataset ds (cost=0.00..1.21
rows=21 width=24) (actual time=0.006..0.021 rows=21 loops=1)"
" -> Hash (cost=58.18..58.18 rows=1 width=12) (actual
time=15.591..15.591 rows=2 loops=1)"
" -> Sort (cost=58.17..58.17 rows=1 width=117)
(actual time=15.585..15.586 rows=2 loops=1)"
" Sort Key: tbl_metadata_chunks.datasetid"
" -> HashAggregate (cost=58.13..58.16 rows=1
width=117) (actual time=15.572..15.573 rows=2 loops=1)"
" -> Hash IN Join (cost=3.34..58.10
rows=7 width=117) (actual time=0.261..0.544 rows=50 loops=1)"
" Hash Cond: ("outer".datasetid =
"inner".datasetid)"
" -> Bitmap Heap Scan on
tbl_metadata_chunks (cost=2.05..56.67 rows=14 width=117) (actual
time=0.204..0.384 rows=60 loops=1)"
" Filter: (chunkgeometry &&
'0103000020E6100000010000000500000058631EDF87ECC1BF608F3D1911694940A0958
A8763C9C1BF535069BA846C494026B5F1284FABB8BFAB1577356E6C494094E1170D33F3B
8BF7700CC99FA68494058631EDF87ECC1BF608F3D1 (..)"
" -> Bitmap Index Scan on
tbl_metadata_chunks_idx3 (cost=0.00..2.05 rows=14 width=0) (actual
time=0.192..0.192 rows=60 loops=1)"
" Index Cond:
(chunkgeometry &&
'0103000020E6100000010000000500000058631EDF87ECC1BF608F3D1911694940A0958
A8763C9C1BF535069BA846C494026B5F1284FABB8BFAB1577356E6C494094E1170D33F3B
8BF7700CC99FA68494058631EDF87ECC (..)"
" -> Hash (cost=1.26..1.26
rows=10 width=4) (actual time=0.037..0.037 rows=10 loops=1)"
" -> Seq Scan on
tbl_metadata_dataset (cost=0.00..1.26 rows=10 width=4) (actual
time=0.005..0.024 rows=10 loops=1)"
" Filter: (typeofdataid
= 1)"
"Total runtime: 15.871 ms"

Gopal

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gopal (#13)
Re: Postgres scalability and performance on windows

"Gopal" <gopal@getmapping.com> writes:

Thanks for your suggestions. Here's an output of the explain analyse.

What's the query exactly, and what are the schemas of the tables it
uses (psql \d descriptions would do)?

The actual runtime seems to be almost all spent in the hash aggregation
step:

-> HashAggregate (cost=58.13..58.16 rows=1 width=117) (actual time=15.572..15.573 rows=2 loops=1)
-> Hash IN Join (cost=3.34..58.10 rows=7 width=117) (actual time=0.261..0.544 rows=50 loops=1)

15 msec seems like a long time to aggregate only 50 rows, so I'm
wondering what aggregates are being calculated and over what
datatypes...

regards, tom lane