psql is slow and it does not take much resources

Started by Javier de la Torre almost 20 years ago. 19 messages. List: general
#1Javier de la Torre
jatorre@gmail.com

Hi all,

I've been searching around for an answer to this, but I couldn't find
anything. So here we go.

I am running PostgreSQL 8.1.3 on Red Hat on an Intel server with 2GB
of RAM and a lot of free HD space.

I have a very large dump file, more than 4GB, to recreate a database.
When I run:

psql -Uuser -q -dmydatabase <mydump.sql &

It works, but it is slow: it inserts about 1,000 records every 10
seconds. The curious thing is that if I check the resources of the
server, it seems that psql is not using them much. "top" says that
postgres is using around 2% of the processor and psql around 1%, so
the server is only 3% busy. "top" also says it is more or less 48%
idle and 50% waiting...

On the other hand, the memory usage seems to be very high...

What could be the problem? Is it that the server is just slow reading
from the file? It has a 300GB SCSI drive.

Is there any way to make this work faster? I would like to recreate
the db at least once a month, but this takes too much time...

Thanks in advance for any advice.

Javier.

#2Larry Rosenman
ler@lerctr.org
In reply to: Javier de la Torre (#1)
Re: psql is slow and it does not take much resources


How was mydump.sql created? Is it INSERTs, or COPYs?

--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE. INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry.Rosenman@pervasive.com
Web: www.pervasive.com

#3Javier de la Torre
jatorre@gmail.com
In reply to: Larry Rosenman (#2)
Re: psql is slow and it does not take much resources

It is inserts.

I create the inserts myself with a Python program I have created to
migrate MySQL databases to PostgreSQL (by the way, if someone wants
it...)

Thanks.

Javier.


#4Larry Rosenman
ler@lerctr.org
In reply to: Javier de la Torre (#3)
Re: psql is slow and it does not take much resources

Ok, that makes *EACH* insert a transaction, with all the overhead.

You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
yet, set it up as a COPY.


#5Leif B. Kristensen
leif@solumslekt.org
In reply to: Larry Rosenman (#4)
Re: psql is slow and it does not take much resources

On Wednesday 03 May 2006 16:12, Larry Rosenman wrote:

Ok, that makes *EACH* insert a transaction, with all the overhead.

You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
yet, set it up as a COPY.

I'm using essentially the same approach for my custom backup/restore
procedure. I also found it a very slow process. But when I wrapped
each table script (i.e. 20-30k of INSERTs) in a single transaction,
the time it took to populate the entire database went down from about
half an hour to 50 seconds. Very impressive ;-)

However, I'm wondering if there's a practical limit to how many rows you
can insert within one transaction?
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE

#6Javier de la Torre
jatorre@gmail.com
In reply to: Leif B. Kristensen (#5)
Re: psql is slow and it does not take much resources

Yes,

Thanks. I am doing this now...

It is definitely faster, but I will also discover now whether there is
a limit on transaction size... I am going to try to insert 60 million
records into a table in one single transaction.

In any case, I still don't understand why PostgreSQL was not taking
resources before, without the transaction. If it has to create a
transaction per insert I understand it will have to do more things,
but why is it not taking all the resources of the machine? I mean, why
is it only taking 3% of them?

Javier.


#7Larry Rosenman
ler@lerctr.org
In reply to: Javier de la Torre (#6)
Re: psql is slow and it does not take much resources


I'll bet your WAL disk is mostly WAIT-I/O, waiting for the WAL log
flushes at end of transaction.

LER
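To put rough numbers on that (the figures below are illustrative, not
measured anywhere in this thread): if every one-row transaction ends
with a WAL flush, and the disk sustains on the order of 100 flushes
per second, then the commit rate, not the CPU, bounds the load speed.
A back-of-the-envelope sketch:

```python
def load_time_seconds(rows, rows_per_txn, flushes_per_sec=100.0):
    """Rough lower bound on load time if each COMMIT costs one WAL flush."""
    commits = -(-rows // rows_per_txn)  # ceiling division
    return commits / flushes_per_sec

# 60 million rows: one-row transactions vs. 1000-row batches.
single = load_time_seconds(60 * 10**6, 1)      # ~7 days of pure WAL flushing
batched = load_time_seconds(60 * 10**6, 1000)  # ~10 minutes
```

Which would explain the near-idle CPU: the backend spends its time
waiting on the disk, not computing.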


#8Martijn van Oosterhout
kleptog@svana.org
In reply to: Leif B. Kristensen (#5)
Re: psql is slow and it does not take much resources

On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:

However, I'm wondering if there's a practical limit to how many rows you
can insert within one transaction?

There's a limit of (I think) 2-4 billion commands per transaction. Each
command can insert any number of tuples.

So if you're doing one tuple per command, that limits you to a few
billion inserts per transaction. Of course, COPY is always faster
still...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


From each according to his ability. To each according to his ability to litigate.

#9Javier de la Torre
jatorre@gmail.com
In reply to: Martijn van Oosterhout (#8)
Re: psql is slow and it does not take much resources

Great! Then there will be no problems.

I would use COPY but I think I cannot. While moving from MySQL to
PostgreSQL I am also transforming a pair of fields, latitude and
longitude, into a geometry field, POINT, that is understood by
PostGIS. I thought I would not be able to use COPY when inserting data
with functions.

Thanks again all.

Javier.


#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#8)
Re: psql is slow and it does not take much resources

Martijn van Oosterhout <kleptog@svana.org> writes:

However, I'm wondering if there's a practical limit to how many rows you
can insert within one transaction?

There's a limit of (I think) 2-4 billion commands per transaction. Each
command can insert any number of tuples.

So if you're doing one tuple per command that limits you to a few
billion inserts per transaction. Of course, COPY is always faster
still...

If you have any deferred triggers (including foreign keys) on the table
then the practical limit is likely to be a lot less, say in the few
millions. However, in a bulk data load situation you probably don't
need to have such triggers; it's better to establish the constraint
after you load the data.

BTW, has the OP read
http://developer.postgresql.org/docs/postgres/populate.html
? Much of this thread seems to be rehashing that page ...

regards, tom lane

#11Joe Healy
joe@omc-international.com.au
In reply to: Javier de la Torre (#9)
Re: psql is slow and it does not take much resources


I definitely recommend using COPY if you are inserting a large amount
of data into PostGIS. We use something like the following Python code
to read from a CSV file and insert into pgsql. I can't remember the
rate it works at, but it was much quicker than anything else we tried.

import os

def insertData(header, delimiter, filename, table, SRID, dbname, user, host):
    f = open(filename, 'r')

    # Open a new process to enter data (~10x faster than psycopg)
    process = os.popen('psql %s %s -c "COPY %s (geom, elevation) from stdin;" -h %s'
                       % (dbname, user, table, host), "w")

    for a in f:
        easting, northing, elevation = a[:-1].split(delimiter)
        process.write("SRID=%s;POINT(%s %s)\t%s\n"
                      % (SRID, easting, northing, elevation))

    f.close()
    process.close()

Hope that helps,

Joe

#12Alban Hertroys
alban@magproductions.nl
In reply to: Javier de la Torre (#9)
Re: psql is slow and it does not take much resources


I think you'd get better results COPYing into a (temporary) table (say
it's called 'fish') and then inserting your data into your table using
INSERT (...) SELECT ... FROM fish; with the necessary conversions.

I think it'll still be faster than a couple million separate INSERTs,
and it gives you the opportunity to fix mistakes if you issue the
above command inside a transaction.
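A sketch of that approach (added here for illustration: the staging
table 'fish', the lat/lon column names, and the target table are
placeholders; GeomFromText is the old PostGIS spelling, newer releases
call it ST_GeomFromText):

```python
def staging_load_sql(target, staging, srid=4326):
    """COPY into a staging table, then convert lat/lon into a POINT
    geometry with a single INSERT ... SELECT."""
    return "\n".join([
        "BEGIN;",
        "CREATE TEMP TABLE %s (lat float8, lon float8, elevation float8);" % staging,
        "COPY %s FROM stdin;" % staging,
        # the conversion runs once, in bulk, instead of per-INSERT
        "INSERT INTO %s (geom, elevation)"
        " SELECT GeomFromText('POINT(' || lon || ' ' || lat || ')', %d), elevation"
        " FROM %s;" % (target, srid, staging),
        "COMMIT;",
    ])
```

Running it all inside one transaction means a mistake in the
conversion simply rolls back with the rest.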

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

#13Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Leif B. Kristensen (#5)
Re: psql is slow and it does not take much resources

On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:

However, I'm wondering if there's a practical limit to how many rows you
can insert within one transaction?

I believe transactions are limited to 4B commands, so the answer would
be 4B rows.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Javier de la Torre (#6)
Re: psql is slow and it does not take much resources


Because a server has more than just CPU as a resource. In this case you
were undoubtedly limited by the drives that pg_xlog is on.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#15Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Javier de la Torre (#3)
Re: psql is slow and it does not take much resources


Have you looked at http://pgfoundry.org/projects/my2postgres/ ?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#16Leif B. Kristensen
leif@solumslekt.org
In reply to: Jim Nasby (#13)
Re: psql is slow and it does not take much resources

On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:

On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:

However, I'm wondering if there's a practical limit to how many rows
you can insert within one transaction?

I believe transactions are limited to 4B commands, so the answer would
be 4B rows.

That is definitely not the case. I routinely do around 36000 inserts
wrapped up in one transaction.

I know that there is one hard-wired limit due to the OID wrap-around
problem, at 2^31 commands in one transaction. But the practical limit
due to hardware resources is probably much lower.
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE

#17Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Leif B. Kristensen (#16)
Re: psql is slow and it does not take much resources

On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:

On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:

On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:

However, I'm wondering if there's a practical limit to how many rows
you can insert within one transaction?

I believe transactions are limited to 4B commands, so the answer would
be 4B rows.

That is definitely not the case. I routinely do around 36000 inserts
wrapped up in one transaction.

Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.
And as Tom mentioned, if you have foreign keys or triggers each insert
will burn through multiple CIDs.

I know that there is one hard-wired limit due to the OID wrap-around
problem, at 2^31 commands in one transaction. But the practical limit
due to hardware resources is probably much lower.

This has nothing to do with OIDs, and in fact I don't believe there's
any intrinsic reason why you couldn't insert more than 2B records in a
table with OIDs so long as you don't have a unique index defined on it.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#18Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Jim Nasby (#17)
Re: psql is slow and it does not take much resources

On Thu, 2006-05-04 at 16:06, Jim C. Nasby wrote:

On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:

I know that there is one hard-wired limit due to the OID wrap-around
problem, at 2^31 commands in one transaction. But the practical limit
due to hardware resources is probably much lower.

This has nothing to do with OIDs, and in fact I don't believe there's
any intrinsic reason why you couldn't insert more than 2B records in a
table with OIDs so long as you don't have a unique index defined on it.

But there is a truth there. Most people are far more likely to be
limited by their machine's hardware than by the database at this point.

Of course, SOMEONE will find a way to hit the limit.

#19Leif B. Kristensen
leif@solumslekt.org
In reply to: Jim Nasby (#17)
Re: psql is slow and it does not take much resources

On Thursday 04 May 2006 23:06, Jim C. Nasby wrote:

On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:

On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:

I believe transactions are limited to 4B commands, so the answer
would be 4B rows.

That is definitely not the case. I routinely do around 36000 inserts
wrapped up in one transaction.

Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.

I've wiped my glasses now :-)

I'm not used to reading B for Billion, and I thought I saw an 8. (Here
in Scandinavia, we adhere to the French convention of 1000 millions = 1
milliard. 1000 milliards = 1 billion.) But it looks like we do agree on
the order of magnitude after all.

Another interpretation leads to reading 4B as 4 bytes, and given that a
byte equals 8 bits, 4B would be the width of a 32-bit integer.
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE