Slowdown problem when writing 1.7million records
I am very new to PostgreSQL and have installed v7.0.3 on a Red Hat Linux
Server (v6.2). I am accessing the files using JDBC from a Windows 2000 PC.
I have created a small file as follows:
CREATE TABLE expafh (
postcode CHAR(8) NOT NULL,
postcode_record_no INT,
street_name CHAR(30),
town CHAR(31),
PRIMARY KEY(postcode) )
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
Is this sort of degradation normal using a PostgreSQL database?
I have never experienced this sort of degradation with any other database
and I have done exactly the same test (using the same hardware) on the
following databases:
DB2 v7 in total took 10hours 6mins
Oracle 8i in total took 3hours 20mins
Interbase v6 in total took 1hr 41min
MySQL v3.23 in total took 54mins
Any help or advice would be appreciated.
Thanks
Stephen Livesey
Stephen Livesey wrote:
I am very new to PostgreSQL and have installed v7.0.3 on a Red Hat Linux
Server (v6.2). I am accessing the files using JDBC from a Windows 2000 PC.
I have created a small file as follows:
CREATE TABLE expafh (
postcode CHAR(8) NOT NULL,
postcode_record_no INT,
street_name CHAR(30),
town CHAR(31),
PRIMARY KEY(postcode) )
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
Is this sort of degradation normal using a PostgreSQL database?
AFAICT, no.
I have never experienced this sort of degradation with any other database
and I have done exactly the same test (using the same hardware) on the
following databases:
DB2 v7 in total took 10hours 6mins
Oracle 8i in total took 3hours 20mins
Interbase v6 in total took 1hr 41min
MySQL v3.23 in total took 54mins
Any help or advice would be appreciated.
Did you "vacuum analyse" your DB? This seems to be essential to PG
performance, for various reasons.
Do you have a unique index on your primary key ?
HTH,
Emmanuel Charpentier
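A minimal JDBC sketch of the "vacuum analyse" suggestion above. The connection
URL, user name and password are placeholders, and a JDK/driver recent enough
for try-with-resources and driver auto-registration is assumed; VACUUM cannot
run inside a transaction block, so autocommit is left on.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VacuumExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- adjust for your own server.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        conn.setAutoCommit(true);   // VACUUM cannot run inside a transaction block
        try (Statement stmt = conn.createStatement()) {
            // Refresh the planner statistics for the freshly loaded table.
            stmt.execute("VACUUM ANALYZE expafh");
        }
        conn.close();
    }
}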
I am very new to PostgreSQL and have installed v7.0.3 on a Red Hat Linux
Server (v6.2). I am accessing the files using JDBC from a Windows 2000 PC.
I don't use Microsoft software, nor Java, but a few general suggestions
for you:
- you may want to create the INDEX at the end (using CREATE INDEX)
in one operation
- you may want to run your requests in transactions: e.g. a transaction
which is committed every N insertions.
- in some rare cases you may want to disable the fsync() of the
PostgreSQL backend
- you may want to use a more efficient interface, such as the
COPY command.
Example (using transactions)
#! /usr/bin/perl -wI /home/schaefer/perl-libs
# $Id$
use DBI;
use test_db;
use strict;
my $debug = 0;
my $result = 0; # success
my $reason = "of an unknown error";
$| = 1;
my $dbh = &open_database();
if (defined($dbh)) {
my $amount_entries = 4000000;
my $commit_every = 10000;
# Sometimes, large data entries are done better with a COPY.
$dbh->{AutoCommit} = 0; # Use transactions.
foreach (('sol_f', 'sol_i', 'sol_d')) {
# Using transactions should make insertion faster, since fsync()
# calls are probably not required. However, when changes are very big,
# it might actually make it slower or use much more space; this
# is why we have this $commit_every above and below.
# I was expecting a BEGIN WORK; to be needed here, but that
# seems to be implicit.
eval {
my $i;
my $failed_reason = "unknown db error";
print "Populating " . $_ . " ...";
for ($i = 0; ($i < $amount_entries) && ($result == 0); $i++) {
my @titles = ('id', 'ref', 'sentpos', 'wordpos');
if (!&do_query($dbh,
"INSERT INTO " . $_ . "("
. join(", ", @titles)
. ") VALUES ("
. join(", ", ('?') x @titles)
. ")",
\@titles,
undef,
[ int(rand(32768)),
'truc',
int(rand(32768)),
'temp'
],
undef,
\$failed_reason)) {
$result = 1;
$reason = "can't insert " . $i . ": " . $failed_reason;
}
else {
if (($i % $commit_every) == 0) {
if ($dbh->commit) {
print "C ";
}
else {
$result = 1;
$reason = "can't commit: " . $dbh->errstr;
}
}
}
}
if ($result == 0) {
print " POPULATED.\n";
}
else {
print " FAILED.\n";
}
}; # DB is not set to die mode, so we will catch only our bugs.
if ($@) {
print;
$result = 1;
$reason = "transaction failed: " . $@; # Not always right.
$dbh->rollback; # res. ign. (in failure mode anyway)
}
elsif ($result) {
$dbh->rollback; # res. ign. (in failure mode anyway)
}
else {
if (!$dbh->commit) {   # commit() returns true on success
$result = 1;
$reason = "can't commit: " . $dbh->errstr;
}
else {
print "COMMITTED.\n";
}
}
}
$dbh->{AutoCommit} = 1; # No transactions
if (!$dbh->disconnect) {
$result = 1;
$reason = "disconnect error: " . $dbh->errstr;
}
undef $dbh;
}
else {
# Obviously, can't use $dbh->errstr here.
$reason = "can't database connect: " . $DBI::errstr;
$result = 1;
}
if ($result) {
print $0 . ": failed " . $result . " because " . $reason . "\n";
}
else {
print "SUCCESSFUL.\n";
}
exit $result;
sub create_table {
my($dbh, $name, $val) = @_;
my $result = 0;
my $sth = $dbh->prepare("CREATE TABLE $name ($val)");
if (defined($sth)) {
my $rv = $sth->execute;
if (defined($rv)) {
if ($debug) {
print "$name: succeeded.\n";
}
$result = 1;
}
$sth->finish;
undef $sth;
}
return $result;
}
And using COPY:
#! /usr/bin/perl -wI /home/schaefer/perl-libs
# USAGE
# ./copy.pl | psql test_db
# $Id$
my $amount_entries = 4000000;
my $tell_every = 100000;
print "COPY sol_f FROM stdin;\n";
my $i;
for ($i = 1; $i <= $amount_entries; $i++) {
print int(rand(32768)) . "\t" . 'truc' . "\t" . int(rand(32768)) . "\t"
. 'temp' . "\n";
if (($i % $tell_every) == 0) {
print STDERR $i . "\n";
}
}
print "\\.\n";   # end-of-data marker expected by COPY
I am very new to PostgreSQL and have installed v7.0.3 on a Red Hat Linux
Server (v6.2). I am accessing the files using JDBC from a Windows 2000 PC.
I have created a small file as follows:
CREATE TABLE expafh (
postcode CHAR(8) NOT NULL,
postcode_record_no INT,
street_name CHAR(30),
town CHAR(31),
PRIMARY KEY(postcode) )
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
Is this sort of degradation normal using a PostgreSQL database?
AFAICT, no.
Funny you should say that, but I seem to be experiencing a similar
thing when doing pg_dump/restore. When restoring a table with around
40M records, everything starts nice and fast. However, by the time
it's half done, inserts take forever. It took about 16 hours to
restore all 40M records (3 GB or so + indices). Obviously, since it
was a "restore", there was no vacuuming during the operation, but it
still seemed to get unusually slow toward the end... judging by the
rate of increase in file sizes, anyway, which I accept isn't the most
reliable way to judge this matter by...
But, I guess this is nothing comparable to your problem. I was
inserting 40M records, and it took 16 hours.
You are inserting 1.7M records, and it's taking 43 hours.
Are you using autocommit? Are you using fsync? You should probably
periodically commit the data every few thousand inserts if you don't
have autocommit enabled.
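A minimal sketch of the "commit every few thousand inserts" idea in JDBC,
assuming a plain Statement loop like Stephen's; the connection details and the
generated values are placeholders, not taken from the thread.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BatchedCommits {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        conn.setAutoCommit(false);          // one transaction per chunk, not per INSERT
        int commitEvery = 5000;
        try (Statement stmt = conn.createStatement()) {
            for (int i = 0; i < 1700000; i++) {
                stmt.executeUpdate("INSERT INTO expafh VALUES ('P" + i + "', " + i
                        + ", 'Some Street', 'Some Town')");
                if (i % commitEvery == 0) {
                    conn.commit();          // flush this chunk of rows in one transaction
                }
            }
            conn.commit();                  // commit the final partial chunk
        }
        conn.close();
    }
}

With autocommit enabled instead, every INSERT pays the cost of its own
transaction (and, with fsync on, its own disk flush), which is usually the
first thing to rule out when bulk loads are slow.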
Are you running out of memory? What amount of shared memory do you
have enabled? How much is Postgres allowed to use? I know this is
fairly obvious stuff, but it is, as such, quite easy to overlook...
I have never experienced this sort of degradation with any other database
and I have done exactly the same test (using the same hardware) on the
following databases:
DB2 v7 in total took 10hours 6mins
Oracle 8i in total took 3hours 20mins
Interbase v6 in total took 1hr 41min
MySQL v3.23 in total took 54mins
Any help or advice would be appreciated.
The only thing that comes to mind is that if you're doing a bulk
insert, you should probably drop all indices that aren't unique or for
primary keys, and re-create them once you've inserted all your data...
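As an illustration of dropping a secondary index around the bulk load: the
index name below is hypothetical (Stephen's table only has its primary key),
and the connection details are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ReindexAfterLoad {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        try (Statement stmt = conn.createStatement()) {
            // Hypothetical non-unique index on the town column.
            stmt.executeUpdate("DROP INDEX expafh_town_idx");

            // ... bulk-load the rows here (INSERTs or COPY) ...

            // Rebuild the index once, after all the data is in place.
            stmt.executeUpdate("CREATE INDEX expafh_town_idx ON expafh (town)");
        }
        conn.close();
    }
}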
Regards.
Gordan
43 hours? Ouch, that is quite a long time! I'm no expert by any means, but
here are a few tips that I've picked up on this list that might help out:
1. The COPY command is blazing fast for importing, if you are certain your
input is clean, this is the way to go. Read more about that here:
http://www.postgresql.org/users-lounge/docs/7.0/postgres/sql-copy.htm
2. If you have any indexes, drop them, then do your import, then re-add them
after that is complete. Indexes slow inserts down quite a bit.
3. If you are using INSERTs (I can't see a COPY taking more than a few
minutes), make sure that you are wrapping them in an explicit transaction.
Otherwise, each INSERT becomes its own transaction with all that overhead.
4. If you *are* using transactions, break the transactions up into chunks.
Trying to maintain a single transaction over 1.7 million INSERTs will slow
things down. Personally I'd probably go with about 500 at a time.
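A rough sketch of the COPY route from tip 1: write the rows out as a
tab-separated file from the client, then load it in one COPY. The file path,
the generated values, and the assumption that the file ends up somewhere the
server (or psql, via \copy) can read it are all illustrative.

import java.io.BufferedWriter;
import java.io.FileWriter;

public class MakeCopyFile {
    public static void main(String[] args) throws Exception {
        // Write tab-separated rows matching expafh's column order, then load with
        //   COPY expafh FROM '/tmp/expafh.dat';     (file readable by the server)
        // or psql's \copy if the file lives on the client machine.
        try (BufferedWriter out = new BufferedWriter(new FileWriter("/tmp/expafh.dat"))) {
            for (int i = 0; i < 1700000; i++) {
                out.write("P" + i + "\t" + i + "\tSome Street\tSome Town\n");
            }
        }
    }
}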
I'm sure someone else will have another suggestion or three...
Greg
----- Original Message -----
From: "Stephen Livesey" <ste@exact3ex.co.uk>
To: <pgsql-general@postgresql.org>
Sent: Tuesday, February 27, 2001 4:44 AM
Subject: Slowdown problem when writing 1.7million records
I am very new to PostgreSQL and have installed v7.0.3 on a Red Hat Linux
Server (v6.2). I am accessing the files using JDBC from a Windows 2000 PC.
I have created a small file as follows:
CREATE TABLE expafh (
postcode CHAR(8) NOT NULL,
postcode_record_no INT,
street_name CHAR(30),
town CHAR(31),
PRIMARY KEY(postcode) )
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
Is this sort of degradation normal using a PostgreSQL database?
I have never experienced this sort of degradation with any other database
and I have done exactly the same test (using the same hardware) on the
following databases:
DB2 v7 in total took 10hours 6mins
Oracle 8i in total took 3hours 20mins
Interbase v6 in total took 1hr 41min
MySQL v3.23 in total took 54mins
Any help or advice would be appreciated.
Thanks
Stephen Livesey
"Stephen Livesey" <ste@exact3ex.co.uk> writes:
I have created a small file as follows:
CREATE TABLE expafh (
postcode CHAR(8) NOT NULL,
postcode_record_no INT,
street_name CHAR(30),
town CHAR(31),
PRIMARY KEY(postcode) )
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
Is this sort of degradation normal using a PostgreSQL database?
No, it's not. Do you have any triggers or rules on this table that
you haven't shown us? How about other tables referencing this one
as foreign keys? (Probably not, if you're running an identical test
on MySQL, but I just want to be sure that I'm not missing something.)
How exactly are you writing the records?
I have a suspicion that the slowdown must be on the client side (perhaps
some inefficiency in the JDBC code?) but that's only a guess at this
point.
regards, tom lane
"Gordan Bobic" <gordan@freeuk.com> writes:
Are you using autocommit? Are you using fsync? You should probably
periodically commit the data every few thousand inserts if you don't
have autocommit enabled.
[ various other speedup suggestions from other people ]
The thing that interests me here is the apparent slowdown as more data
is inserted. I don't see a good reason for that, and none of the
suggested remedies seem likely to eliminate it if we don't know what's
causing it.
INSERT per se should be an essentially constant-time operation, since
it's just slapping another record at the end of the table. Unless
you're using advanced features like triggers/rules/foreign keys,
it seems like any progressive slowdown would have to be blamed on
updates of indexes. But that's not normally a performance problem
except in pathological cases (zillions of equal keys for example ---
but Stephen's only index is a PRIMARY KEY, therefore UNIQUE, therefore
no equal keys). So I'm confused, and think this deserves more careful
examination as to *why* the slowdown, rather than just looking for
a quick&dirty workaround.
regards, tom lane
Tom Lane wrote:
"Stephen Livesey" <ste@exact3ex.co.uk> writes:
I have created a small file as follows:
CREATE TABLE expafh (
postcode CHAR(8) NOT NULL,
postcode_record_no INT,
street_name CHAR(30),
town CHAR(31),
PRIMARY KEY(postcode) )
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
Is this sort of degradation normal using a PostgreSQL database?
No, it's not. Do you have any triggers or rules on this table that
you haven't shown us? How about other tables referencing this one
as foreign keys? (Probably not, if you're running an identical test
on MySQL, but I just want to be sure that I'm not missing something.)
How exactly are you writing the records?
I have a suspicion that the slowdown must be on the client side (perhaps
some inefficiency in the JDBC code?) but that's only a guess at this
point.
regards, tom lane
Are the inserts all part of one enormous transaction? If so, would that
mean that the cumulative changes would be put 'somewhere' (technical
term) temporarily before the commit, and that 'somewhere' being really full
would result in lots of disk caching?
MS SQL Server has a utility called bcp which has a setting for the number
of records to insert at a time for just this reason. If you tried to bulk
copy 1.7 million records into a table, even without triggers, rules or
constraints, your WinNT server would puke on your feet unless you put a
reasonable setting (like 100,000) in the command to tell it to commit after
each 100,000 records.
Ian
At 02:25 PM 27-02-2001 -0500, Tom Lane wrote:
Is this sort of degradation normal using a PostgreSQL database?
No, it's not. Do you have any triggers or rules on this table that
I find slowdowns with inserts with rollbacks and updates with commits (no
triggers or rules) :(.
But no noticeable slowdown for inserts with commits (even with a unique index);
Here are some tests I did:
select version();
PostgreSQL 7.1beta4 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
drop table test;
create table test (a text unique);
--- dbtest7
cat dbtest7
#!/usr/bin/perl -wT
use DBI;
my $c=0;
my $dbh = DBI->connect('DBI:Pg(AutoCommit => 0):dbname=lylyeoh', 'lylyeoh', 'hoho')
  or die("Error connecting to database!", $DBI::errstr);
my $SQL=<<"EOT";
insert into test (a) values (?)
EOT
my $sth = $dbh->prepare($SQL);
while ($c++<2000) {
$sth->execute($c) or die("Error executing query!",
$DBI::errstr);
}
$dbh->rollback;
$dbh->disconnect;
--- results:
time ./dbtest7
0.24user 0.08system 0:01.60elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (383major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.30user 0.06system 0:01.56elapsed 22%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.23user 0.08system 0:01.57elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.26user 0.09system 0:01.57elapsed 22%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.25user 0.10system 0:01.59elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.27user 0.05system 0:01.60elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.31user 0.07system 0:01.62elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.25user 0.09system 0:01.63elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.24user 0.16system 0:01.62elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.23user 0.05system 0:01.63elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.26user 0.09system 0:01.64elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.27user 0.06system 0:01.67elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.31user 0.06system 0:01.68elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.29user 0.07system 0:01.69elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.27user 0.06system 0:01.69elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.23user 0.17system 0:01.70elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.25user 0.08system 0:01.72elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.31user 0.09system 0:01.74elapsed 22%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.32user 0.08system 0:01.76elapsed 22%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.28user 0.07system 0:01.75elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.21user 0.07system 0:01.75elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.25user 0.10system 0:01.78elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.31user 0.04system 0:01.76elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.20user 0.06system 0:01.79elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.29user 0.10system 0:01.82elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest7
0.30user 0.14system 0:01.80elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
Eventually it went to 2 seconds, and probably will continue increasing.
It is even more dramatic for updates:
drop table test;
create table test ( t text,id int);
insert into test (t,id) values ('',1);
--- dbtest3
#!/usr/bin/perl -wT
use DBI;
my $c=0;
my $dbh = DBI->connect('DBI:Pg(AutoCommit => 0):dbname=lylyeoh', 'lylyeoh', 'haha')
  or die("Error connecting to database!", $DBI::errstr);
my $SQL=<<"EOT";
update test set t=? where id=1
EOT
my $sth = $dbh->prepare($SQL);
my $rv='';
while ($c++<1000) {
$rv = $sth->execute("a$c") or die("Error executing query!",
$DBI::errstr);
}
$dbh->commit;
$dbh->disconnect;
time ./dbtest3
0.20user 0.09system 0:00.99elapsed 29%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest3
0.13user 0.08system 0:01.30elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest3
0.20user 0.05system 0:01.62elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest3
0.25user 0.02system 0:01.98elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest3
0.22user 0.06system 0:02.47elapsed 11%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
[lylyeoh@nimbus perl]$ time ./dbtest3
0.23user 0.02system 0:02.88elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (382major+231minor)pagefaults 0swaps
Any way to stop this degradation short of a "vacuum test"?
The "update with commit" slowdown affects my hits per second for my webapp.
Cheerio,
Link.
No, it's not. Do you have any triggers or rules on this table that
you haven't shown us? How about other tables referencing this one
as foreign keys? (Probably not, if you're running an identical test
on MySQL, but I just want to be sure that I'm not missing something.)
I have no triggers or rules. I have only created 1 table and there are no
foreign keys.
How exactly are you writing the records?
First I read the data from a 'Powerflex' file and hold this in a record set.
pfxstmt = pfxconn.createStatement();
pfxrs = pfxstmt.executeQuery("SELECT * from expafh");
I then perform a loop which writes the data to my 'Postgresql' file as
follows:
stmt = conn.createStatement();
while (pfxrs.next()) {
    cmd = "INSERT INTO expafh VALUES ";
    cmd = cmd + "('" + pfxrs.getString(2) + "'," + pfxrs.getString(3) + ",'"
              + pfxrs.getString(4) + "','" + pfxrs.getString(5) + "')";
    stmt.executeUpdate(cmd);
}
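For comparison, a sketch of the same loop using a parameterised JDBC
PreparedStatement rather than splicing the values into the SQL string by hand;
the wrapper method and the use of getInt for the numeric column are
illustrative, not from the original post.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ExpafhLoader {
    // Copies rows from the Powerflex result set into expafh, letting the
    // driver handle quoting and escaping of the values.
    static void copyRows(ResultSet pfxrs, Connection conn) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO expafh VALUES (?, ?, ?, ?)");
        while (pfxrs.next()) {
            ps.setString(1, pfxrs.getString(2));   // postcode
            ps.setInt(2, pfxrs.getInt(3));         // postcode_record_no (assumes the driver returns an int)
            ps.setString(3, pfxrs.getString(4));   // street_name
            ps.setString(4, pfxrs.getString(5));   // town
            ps.executeUpdate();
        }
        ps.close();
    }
}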
I have a suspicion that the slowdown must be on the client side (perhaps
some inefficiency in the JDBC code?) but that's only a guess at this
point.
I have used identical code for all of my testing, the only changes being
which drivers I use to access the data.
Thanks
Stephen Livesey
Further question --- is there any particular pattern to the order in
which you are inserting the records? For example, are they in order
by postcode, or approximately so?
regards, tom lane
"Stephen Livesey" <ste@exact3ex.co.uk> writes:
Further question --- is there any particular pattern to the order in
which you are inserting the records? For example, are they in order
by postcode, or approximately so?
Yes, they would be inserted in postcode order.
Ah. What must be happening is that the index on postcode is getting
badly misbalanced --- instead of a roughly symmetrical b-tree, all the
branches link to the right, causing index insertions and lookups to scan
the whole index instead of only an O(log N) portion of it. You'd get
better results if you inserted the data in random order, or dropped the
index while inserting the data and then recreated it after the bulk
loading is done.
I am not sure how difficult this behavior might be to change, but in any
case I'm not going to risk twiddling the btree code at this late stage
of the 7.1 release cycle. Bruce, would you add a TODO item?
* Be smarter about insertion of already-ordered data into btree index
regards, tom lane
"Stephen Livesey" <ste@exact3ex.co.uk> writes:
Further question --- is there any particular pattern to the order in
which you are inserting the records? For example, are they in order
by postcode, or approximately so?
Yes, they would be inserted in postcode order.
Ah. What must be happening is that the index on postcode is getting
badly misbalanced --- instead of a roughly symmetrical b-tree, all the
branches link to the right, causing index insertions and lookups to scan
the whole index instead of only an O(log N) portion of it. You'd get
better results if you inserted the data in random order, or dropped the
index while inserting the data and then recreated it after the bulk
loading is done.
I am not sure how difficult this behavior might be to change, but in any
case I'm not going to risk twiddling the btree code at this late stage
of the 7.1 release cycle. Bruce, would you add a TODO item?
* Be smarter about insertion of already-ordered data into btree index
Added. I thought our btree code was self-balancing.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
* Be smarter about insertion of already-ordered data into btree index
Added. I thought our btree code was self-balancing.
I did too, but evidently it's got a problem in this scenario ...
regards, tom lane
On Wed, 28 Feb 2001, Bruce Momjian wrote:
* Be smarter about insertion of already-ordered data into btree index
Added. I thought our btree code was self-balancing.
My understanding of most "self-balancing" algorithms is that they would
have significantly greater overhead when adding sequential
data. Likely greater, even, than the cost of doing some
randomization of the data first.
I just joined this list, so pardon if this has been suggested.
Have you tried 'COPY expafh FROM stdin', rather than inserting each record?
I'm managing a 2.5 million record import, creating a btree index on two
columns, and then vacuuming the db in 36 minutes (on an Ultra 5 - similar to
an AMD K6-2 500). The data is being read from a 600Mb file.
I'm also using 7.1beta5 with the -F flag on the backend (prevents db server
from flushing after each transaction - can be dangerous, but the server is
faster). I've attached a Perl script I use - the key being the putline
command.
Note that when using COPY, default values, sequences, etc. are not used. If
you have a SERIAL field, you have to put in the incrementing values
yourself, and then use 'setval' to get things correct again.
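A sketch of that setval step, using a hypothetical table "widgets" whose SERIAL
column "id" is backed by a sequence named widgets_id_seq (the names and the
connection details are not from the thread, and the table is assumed to be
non-empty after the COPY).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FixSequenceAfterCopy {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        try (Statement stmt = conn.createStatement()) {
            // Move the sequence past the highest id loaded by COPY so that
            // later INSERTs relying on the default don't reuse existing ids.
            stmt.execute("SELECT setval('widgets_id_seq', (SELECT max(id) FROM widgets))");
        }
        conn.close();
    }
}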
I apologize for the Perl script - it's not commented. If you have trouble
understanding it, let me know and I'll spruce it up.
Robert Creager
Senior Software Engineer
Client Server Library
303.673.2365 V
303.661.5379 F
888.912.4458 P
StorageTek
INFORMATION made POWERFUL
-----Original Message-----
From: Stephen Livesey [mailto:ste@exact3ex.co.uk]
Sent: Wednesday, February 28, 2001 2:20 AM
To: Tom Lane
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] Slowdown problem when writing 1.7million records
This is a followup to a problem report Stephen Livesey made back in
February, to the effect that successive insertions got slower and
slower. At the time we speculated that btree indexes had a problem
with becoming out-of-balance when fed steadily increasing data values.
I have now tried to reproduce that effect --- and failed, in both
current sources and 7.0.2.
I did
create table foo (f1 serial primary key);
and then ran a process that just did
insert into foo default values;
over and over (each insert as a separate transaction). This will result
in inserting a continually increasing sequence of key values into the
pkey index.
I ran this out to about 3.4million records (twice the number of records
Stephen used) on both 7.0.2 and current sources. I do not see any real
slowdown in the insertion rate, and certainly not the drastic slowdown
that Stephen observed: he said
I am now writing 1.7million records to this file.
The first 100,000 records took 15mins.
The next 100,000 records took 30mins
The last 100,000 records took 4hours.
In total, it took 43 hours to write 1.7million records.
It took me about 140 minutes to write 3.4million records, on a not
particularly fast machine; the insertion rate held pretty steady at
around 400 records/sec (fsync off).
So I no longer think that the problem was with the btree index.
Other possibilities are:
* If Stephen had fsync on, and his kernel was using a particularly
stupid fsync algorithm, the time might all have been going into
kernel searches of its buffer lists --- the files being fsync'd
would've been growing and growing, and some fsync routines take
time proportional to the number of buffered blocks.
* If Stephen had any relevant foreign-key checks (though he said not)
and was inserting all the records in one transaction, then the known
O(N^2) behavior in 7.0.*'s handling of deferred triggers could've
been the culprit. If so, this is fixed in 7.1.
* The problem might have been on the client side. I'm not sure
whether JDBC on Windows might suffer from memory leaks or anything
like that.
Anyway: Stephen, if you haven't lost interest, we need to take another
look at your problem and figure out where the issue really is.
Bruce, please remove the TODO item
* Be smarter about insertion of already-ordered data into btree index
It seems to have been a false alarm.
regards, tom lane
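For anyone who wants to repeat this kind of check from a JDBC client, a rough
harness along the lines of the test described above might look like the sketch
below; the connection details are placeholders, autocommit is left on so each
INSERT is its own transaction, and client-side driver overhead will of course
colour the absolute numbers.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SteadyInsertRate {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE foo (f1 serial primary key)");
            long start = System.currentTimeMillis();
            for (int i = 1; i <= 3400000; i++) {
                // Autocommit is on, so each INSERT runs as a separate transaction.
                stmt.executeUpdate("INSERT INTO foo DEFAULT VALUES");
                if (i % 100000 == 0) {
                    long secs = Math.max((System.currentTimeMillis() - start) / 1000, 1);
                    System.out.println(i + " rows, about " + (i / secs) + " rows/sec overall");
                    // A steadily falling rate here would reproduce the reported slowdown.
                }
            }
        }
        conn.close();
    }
}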
After over 3 months we are finally preparing for the release of our new
PostgreSQL book.
The book PostgreSQL: The elephant never forgets is going to be the most
comprehensive book available for PostgreSQL. The title will retail for
44.95 but you can pre-order it now for 34.95.
The book is currently over 450 pages and we expect it to hit 600. We
expect the book to ship within 6 weeks. You can read more at
http://www.opendocspublishing.com.
We invite you to comment on these chapters to help us make a better text.
Please remember that these chapters are not finished and we are now
actively seeking feedback.
J
Hi,
In the Advanced Features section.
In the section about Inheritance, there is the following command:
INSERT INTO dist_author(a_id, lastname, email, award)
VALUES (110, 'Seuss','seuss@childprodigy.com',
'National Book Awards);
There is a quote (') missing, it should read:
INSERT INTO dist_author(a_id, lastname, email, award)
VALUES (110, 'Seuss','seuss@childprodigy.com',
'National Book Awards');
Call me picky, but I always seem to fall for either a missing quote, or
using double quotes instead of single quotes.
Colin
-----Original Message-----
From: pgsql-general@commandprompt.com
[mailto:pgsql-general@commandprompt.com]
Sent: Thursday, 31 May 2001 21:13
To: pgsql-general@postgresql.org
Subject: [GENERAL] PostgreSQL: The elephant never forgets UPDATE