inserting, index and no index - speed
I just noticed that inserting 10000 tuples in an indexed table took exactly
the same amount of time as inserting 10000 tuples in a non-indexed table
(194 seconds). Why is this? The difference in MySQL is about 50% longer in
an indexed table.
Thanks
Daniel Akerud.
zilch@home.se writes:
I just noticed that inserting 10000 tuples in an indexed table took exactly
the same amount of time as inserting 10000 tuples in a non-indexed table
(194 seconds). Why is this? The difference in MySQL is about 50% longer in
an indexed table.
Surprises me too. Which PG version, and what are the test conditions
exactly? (Table and index declarations; is table empty initially;
how is backend being driven, and what commands are issued exactly?
How many shared buffers, platform, etc)
Under PG 7.1, it's possible that your test caused no actual I/O except
to the WAL log ... but I'd still think that the volume of WAL I/O
would be greater when writing an index.
regards, tom lane
The test script that set up the tables is the following:
---
/* Cleanup */
DROP SEQUENCE index_with_id_seq;
DROP SEQUENCE index_without_id_seq;
DROP INDEX name_index;
DROP TABLE index_with;
DROP TABLE index_without;
/* Create a table with an index */
CREATE TABLE index_with (
id SERIAL,
name TEXT
);
CREATE INDEX name_index ON index_with(name);
/* Create a table without an index */
CREATE TABLE index_without (
id SERIAL,
name TEXT
);
---
This is run just before each test.
Then I have a little C++ program that inserts N rows into the tables and
measures how long it takes.
A DELETE FROM (on both tables) followed by a VACUUM is also run
before each test run (which consists of regular INSERT statements).
Do I do anything wrong?
The postmaster (7.1.2) is run with the current Debian testing/unstable
standard options.
Daniel Akerud
Yes, actually...
forgot to say that... VACUUM & VACUUM ANALYZE before each test run...
Thanks
Daniel Akerud
Did you VACUUM ANALYZE as well, after you created the tables/indexes?
---
Daniel Åkerud, zilch@home.se
I just reran the application to confirm that it was really like that. So,
using the test environment previously described I got the following output:
Database vacuumed
pg: Trying 1000 inserts with indexing on...
Time taken: 24 seconds
pg: Trying 1000 inserts with indexing off...
Time taken: 22 seconds
Database vacuumed
pg: Trying 10000 inserts with indexing on...
Time taken: 220 seconds
pg: Trying 10000 inserts with indexing off...
Time taken: 220 seconds
Daniel Akerud
zilch@home.se writes:
then I have this little C++ program that inserts N rows into the tables,
meassures how long it takes.
Which is done how, exactly?
regards, tom lane
---
// delete from index_with
// vacuum
// vacuum analyze
timeAnalyzer.startClock();
for (int i = 0; i < nrIterations; i++) {
    sprintf(sqlStatement, "INSERT INTO index_with (name) VALUES ('%s')", data[i]);
    try {
        db->runCommand(sqlStatement);
    }
    catch (Exception e) {
        cout << "Exception caught: " << e.getException() << endl;
    }
}
timeAnalyzer.stopClock();
// output
---
data[i] is data created with 'pwgen 8 100000 > data'
Then another loop does the same, but against index_without instead.
Could it be that the first 10000 rows in one table slow the database down
enough that the next 10000 inserts, into a different table in the same
database, also get slower?
Daniel Akerud
It wasn't.
Now the database is cleared, VACUUMed, and VACUUM ANALYZEd before both
tests (the table with and the table without an index). Still the same
results: 195/194 seconds.
Daniel Akerud
zilch@home.se writes:
CREATE TABLE index_with (
id SERIAL,
name TEXT
);
CREATE INDEX name_index ON index_with(name);
CREATE TABLE index_without (
id SERIAL,
name TEXT
);
Actually, what you are comparing here is a table with two indexes to a
table with one index. Moreover, both of them incur a sequence nextval()
operation for each insert. So it's not two files updated versus one,
it's four versus three.
Also, given the small size of these tables, it's likely that most of the
updates occur in in-memory disk buffers. If you are running with fsync
on, nearly all the actual I/O per insert will be the write and fsync of
the WAL log. The time required for that is not going to be very
sensitive to the amount of data written, as long as it's much less than
one disk block per transaction, which will be true in both these cases.
You end up writing one block to the log per transaction anyway.
You might try running the ten thousand inserts as a single transaction
(do "begin" and "end" around them). It'd also be educational to try it
with fsync disabled, or with id declared as plain int not serial.
regards, tom lane
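Tom's batching suggestion can be sketched as follows. This is a stand-in using Python's built-in sqlite3 module rather than the PostgreSQL 7.1 client library from the thread, and the row data is made up, but the point is the same: issue one BEGIN/COMMIT around the whole batch instead of paying commit overhead per INSERT.

```python
import sqlite3
import time

# Self-contained stand-in for the thread's PostgreSQL test.
# isolation_level=None puts the connection in autocommit mode, so each
# INSERT below runs as its own transaction unless we BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE index_with (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX name_index ON index_with(name)")

rows = [("row%05d" % i,) for i in range(10000)]

# One transaction per INSERT -- what the original test effectively did.
t0 = time.time()
for (name,) in rows:
    conn.execute("INSERT INTO index_with (name) VALUES (?)", (name,))
per_statement = time.time() - t0

# One transaction around the whole batch, as Tom suggests.
t0 = time.time()
conn.execute("BEGIN")
for (name,) in rows:
    conn.execute("INSERT INTO index_with (name) VALUES (?)", (name,))
conn.execute("COMMIT")
batched = time.time() - t0

count = conn.execute("SELECT count(*) FROM index_with").fetchone()[0]
print(count)  # 20000 -- both runs' rows are present
```

With a real PostgreSQL server and fsync on, the gap between the two timings is far larger than SQLite shows here, because every per-statement commit forces a WAL write and fsync.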
Thanks Tom,
really appreciate it!
Daniel Akerud
You might try running the ten thousand inserts as a single transaction
(do "begin" and "end" around them).
A HUGE difference (I also removed the id SERIAL field entirely, keeping
only name):
Database vacuumed
pg: Trying 25000 inserts on index_with...
Time taken: 12 seconds
Database vacuumed
pg: Trying 25000 inserts on index_without...
Time taken: 12 seconds <--- MIGHT BE BECAUSE OF FSYNC!? (fsyncing also for the previous run)
Database vacuumed
pg: Trying 30000 inserts on index_with...
Time taken: 15 seconds
Database vacuumed
pg: Trying 30000 inserts on index_without...
Time taken: 12 seconds
Database vacuumed
pg: Trying 35000 inserts on index_with...
Time taken: 21 seconds
Database vacuumed
pg: Trying 35000 inserts on index_without...
Time taken: 14 seconds
I can't believe what a difference that made. How can it make it faster by
putting it in a transaction? I thought that would make it slower. Like only
a 100th of the time.
Daniel Akerud
zilch@home.se writes:
I can't believe what a difference that made. How can it make it faster by
putting it in a transaction? I thought that would make it slower. Like only
a 100th of the time.
Everything is always a transaction in Postgres. If you don't say
begin/end, then there's an implicit begin and end around each individual
query. So your first set of tests were paying transaction commit
overhead for each insert.
regards, tom lane
"TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
TL> Everything is always a transaction in Postgres. If you don't say
TL> begin/end, then there's an implicit begin and end around each individual
TL> query. So your first set of tests were paying transaction commit
TL> overhead for each insert.
This doesn't seem to hold exactly for INSERTs involving sequences as
default values. Even if the insert fails for some other constraint,
the sequence is incremented.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: khera@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On 10 Jun 2001, Vivek Khera wrote:
"TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
TL> Everything is always a transaction in Postgres. If you don't say
TL> begin/end, then there's an implicit begin and end around each individual
TL> query. So your first set of tests were paying transaction commit
TL> overhead for each insert.This doesn't seem to hold exactly for INSERTs involving sequences as
default values. Even if the insert fails for some other constraint,
the sequence is incremented.
No, that's exactly how it is supposed to work, to guarantee that you will
never get the same value from two separate calls to nextval.
-alex
"AP" == Alex Pilosov <alex@pilosoft.com> writes:
AP> No, that's exactly how it is supposed to work, to guarantee that you will
AP> never get same value from two separate calls to nextval.
Even if your transaction fails? That seems to counter the definition
of a transaction that aborts; the state of the database is different
than before.
Or am I really thinking wrongly about what an aborted transaction
should leave behind?
On Sun, 10 Jun 2001, Vivek Khera wrote:
Even if your transaction fails? That seems to counter the definition
of a transaction that aborts; the state of the database is different
than before.
Yes, except for the sequences.
Consider this example, transactions A, B, C, sequence S.
in A S.nextval = 1
in B S.nextval = 2
in C S.nextval = 3
transaction B then aborts, A and C succeed. Then, in your logic, nextval
of S should be 2, but really, to keep this kind of state, you need a table
listing 'currently unused values'. That, when your sequence gets to
millions, is a colossal waste of space.
If you want "maximum id that's not currently used in my table" use
max(id), if you want "give me a non-repeating number", use sequence.
There are also concurrency implications when you use max(id): only one
transaction at a time can do it without risking repeated IDs.
-alex
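Alex's A/B/C example can be modeled outside the database: a sequence is just a counter that only ever moves forward, even for transactions that later abort. The sketch below is a plain-Python illustration of that behavior, not PostgreSQL code; `nextval` here is an illustrative stand-in built on itertools.count.

```python
import itertools

# A toy model of a PostgreSQL sequence: nextval() only ever advances,
# regardless of whether the "transaction" that called it commits.
seq = itertools.count(1)

def nextval():
    return next(seq)

committed_ids = []

# Transactions A, B, C each take a value; B then aborts.
a = nextval()  # 1
b = nextval()  # 2 -- consumed by B, whose insert is rolled back
c = nextval()  # 3
committed_ids.extend([a, c])  # B's row never appears in the table

print(committed_ids)  # [1, 3] -- a gap at 2, as in Alex's example
print(nextval())      # 4 -- the value 2 is never handed out again
```

This is why "give me a non-repeating number" (a sequence) and "the maximum id currently in my table" (max(id)) are different questions, as the message above points out.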