inserting, index and no index - speed

Started by Daniel Åkerud · almost 25 years ago · 16 messages · general
#1Daniel Åkerud
zilch@home.se

I just noticed that inserting 10000 tuples into an indexed table took exactly
the same amount of time as inserting 10000 tuples into a non-indexed table
(194 seconds). Why is this? In MySQL, by contrast, inserting into an indexed
table takes about 50% longer.

Thanks

Daniel Akerud.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Åkerud (#1)
Re: inserting, index and no index - speed

zilch@home.se writes:

I just noticed that inserting 10000 tuples into an indexed table took exactly
the same amount of time as inserting 10000 tuples into a non-indexed table
(194 seconds). Why is this? In MySQL, by contrast, inserting into an indexed
table takes about 50% longer.

Surprises me too. Which PG version, and what are the test conditions
exactly? (Table and index declarations; is table empty initially;
how is backend being driven, and what commands are issued exactly?
How many shared buffers, platform, etc)

Under PG 7.1, it's possible that your test caused no actual I/O except
to the WAL log ... but I'd still think that the volume of WAL I/O
would be greater when writing an index.

regards, tom lane

#3Daniel Åkerud
zilch@home.se
In reply to: Tom Lane (#2)
Re: inserting, index and no index - speed

The test script that set up the tables is the following:

---

/* Cleanup */

DROP SEQUENCE index_with_id_seq;
DROP SEQUENCE index_without_id_seq;

DROP INDEX name_index;

DROP TABLE index_with;
DROP TABLE index_without;

/* Create a table with an index */

CREATE TABLE index_with (

id SERIAL,
name TEXT

);

CREATE INDEX name_index ON index_with(name);

/* Create a table without an index */

CREATE TABLE index_without (

id SERIAL,
name TEXT

);

---

This is run just before the test; then I have a little C++ program that
inserts N rows into the tables and measures how long it takes.

A DELETE FROM table (on both tables) followed by a VACUUM is also run
before each test run (which consists of regular INSERT statements).

Do I do anything wrong?

The postmaster (7.1.2) is run with the then-current Debian testing/unstable
standard options.

Daniel Akerud


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#4Daniel Åkerud
zilch@home.se
In reply to: Daniel Åkerud (#1)
Re: inserting, index and no index - speed

Yes, actually...
forgot to say that... VACUUM & VACUUM ANALYZE before each test run...

Thanks

Daniel Akerud

Did you VACUUM ANALYZE as well, after you created the tables/indexes?

----- Original Message -----
From: <zilch@home.se>
To: <pgsql-general@postgresql.org>
Sent: Sunday, June 10, 2001 2:15 PM
Subject: Re: [GENERAL] inserting, index and no index - speed


---
Daniel Åkerud, zilch@home.se

#5Daniel Åkerud
zilch@home.se
In reply to: Daniel Åkerud (#4)
Re: inserting, index and no index - speed

I just reran the application to confirm that this is really the case. Using
the test environment previously described, I got the following output:

Database vacuumed
pg: Trying 1000 inserts with indexing on...
Time taken: 24 seconds

pg: Trying 1000 inserts with indexing off...
Time taken: 22 seconds

Database vacuumed
pg: Trying 10000 inserts with indexing on...
Time taken: 220 seconds

pg: Trying 10000 inserts with indexing off...
Time taken: 220 seconds

Daniel Akerud


---
Daniel Åkerud, zilch@home.se

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Åkerud (#3)
Re: inserting, index and no index - speed

zilch@home.se writes:

then I have this little C++ program that inserts N rows into the tables,
and measures how long it takes.

Which is done how, exactly?

regards, tom lane

#7Daniel Åkerud
zilch@home.se
In reply to: Tom Lane (#6)
Re: inserting, index and no index - speed

zilch@home.se writes:

then I have this little C++ program that inserts N rows into the tables,
and measures how long it takes.

Which is done how, exactly?

regards, tom lane

---

// delete from index_with
// vacuum
// vacuum analyze

timeAnalyzer.startClock();

for (int i = 0; i < nrIterations; i++) {
    sprintf(sqlStatement, "INSERT INTO index_with (name) VALUES ('%s')", data[i]);
    try {
        db->runCommand(sqlStatement);
    }
    catch (Exception e) {
        cout << "Exception caught: " << e.getException() << endl;
    }
}

timeAnalyzer.stopClock();

// output

---

data[i] is data created with 'pwgen 8 100000 > data'

Then another one, but with index_without instead.

Could it be that the first 10000 rows in one table slow the database down so
much that the next 10000 inserts, into a different table in the same
database, become much slower too?

Daniel Akerud

#8Daniel Åkerud
zilch@home.se
In reply to: Daniel Åkerud (#7)
Re: inserting, index and no index - speed

Could it be that the first 10000 rows in one table slow the database down so
much that the next 10000 inserts, into a different table in the same
database, become much slower too?

Daniel Akerud

It wasn't.
Now the database is cleared, VACUUMed, and VACUUM ANALYZEd before both tests
(the tables WITH and WITHOUT an index). Still the same results: 195/194 seconds.

Daniel Akerud

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Åkerud (#3)
Re: inserting, index and no index - speed

zilch@home.se writes:

CREATE TABLE index_with (
id SERIAL,
name TEXT
);
CREATE INDEX name_index ON index_with(name);

CREATE TABLE index_without (
id SERIAL,
name TEXT
);

Actually, what you are comparing here is a table with two indexes to a
table with one index. Moreover, both of them incur a sequence nextval()
operation for each insert. So it's not two files updated versus one,
it's four versus three.

Also, given the small size of these tables, it's likely that most of the
updates occur in in-memory disk buffers. If you are running with fsync
on, nearly all the actual I/O per insert will be the write and fsync of
the WAL log. The time required for that is not going to be very
sensitive to the amount of data written, as long as it's much less than
one disk block per transaction, which will be true in both these cases.
You end up writing one block to the log per transaction anyway.

You might try running the ten thousand inserts as a single transaction
(do "begin" and "end" around them). It'd also be educational to try it
with fsync disabled, or with id declared as plain int not serial.

regards, tom lane
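Tom's suggestion of wrapping the inserts in a single transaction can be
sketched in plain SQL (a sketch only; the row values are placeholders, not
data from the thread):

---

BEGIN;
INSERT INTO index_with (name) VALUES ('aaaaaaaa');
INSERT INTO index_with (name) VALUES ('bbbbbbbb');
-- ... the remaining rows ...
END;  -- one commit, and so one WAL flush, for the whole batch

---

With fsync on, this pays the log-flush cost once instead of once per INSERT,
which is where Tom expects nearly all of the time to be going.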

#10Daniel Åkerud
zilch@home.se
In reply to: Tom Lane (#9)
Re: inserting, index and no index - speed

Thanks Tom,
really appreciate it!

Daniel Akerud


#11Daniel Åkerud
zilch@home.se
In reply to: Tom Lane (#9)
Re: inserting, index and no index - speed

You might try running the ten thousand inserts as a single transaction
(do "begin" and "end" around them).

A HUGE difference (I also removed the id field (SERIAL) entirely, keeping
only name):

Database vacuumed
pg: Trying 25000 inserts on index_with...
Time taken: 12 seconds

Database vacuumed
pg: Trying 25000 inserts on index_without...
Time taken: 12 seconds <--- MIGHT BE BECAUSE OF FSYNC!? (fsyncing also for the previous run)

Database vacuumed
pg: Trying 30000 inserts on index_with...
Time taken: 15 seconds

Database vacuumed
pg: Trying 30000 inserts on index_without...
Time taken: 12 seconds

Database vacuumed
pg: Trying 35000 inserts on index_with...
Time taken: 21 seconds

Database vacuumed
pg: Trying 35000 inserts on index_without...
Time taken: 14 seconds

I can't believe what a difference that made. How can putting it in a
transaction make it faster? I thought that would make it slower. It now
takes only about a 100th of the time.

Daniel Akerud

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Åkerud (#11)
Re: inserting, index and no index - speed

zilch@home.se writes:

I can't believe what a difference that made. How can putting it in a
transaction make it faster? I thought that would make it slower. It now
takes only about a 100th of the time.

Everything is always a transaction in Postgres. If you don't say
begin/end, then there's an implicit begin and end around each individual
query. So your first set of tests were paying transaction commit
overhead for each insert.

regards, tom lane
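Tom's point, spelled out as a sketch (using the test table from earlier in
the thread):

---

INSERT INTO index_with (name) VALUES ('abcdefgh');

-- without an explicit BEGIN/END, the above behaves as if you had written:

BEGIN;
INSERT INTO index_with (name) VALUES ('abcdefgh');
COMMIT;  -- so 10000 bare INSERTs pay 10000 commits

---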

#13Vivek Khera
khera@kcilink.com
In reply to: Tom Lane (#12)
Re: inserting, index and no index - speed

"TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> Everything is always a transaction in Postgres. If you don't say
TL> begin/end, then there's an implicit begin and end around each individual
TL> query. So your first set of tests were paying transaction commit
TL> overhead for each insert.

This doesn't seem to hold exactly for INSERTs involving sequences as
default values. Even if the insert fails for some other constraint,
the sequence is incremented.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: khera@kciLink.com Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
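Vivek's observation can be reproduced with a sketch like this (a
hypothetical table, not one from the thread):

---

CREATE TABLE t (id SERIAL, name TEXT NOT NULL);

INSERT INTO t (name) VALUES ('a');   -- succeeds, id = 1
INSERT INTO t (name) VALUES (NULL);  -- fails the NOT NULL constraint,
                                     -- but nextval has already fired
INSERT INTO t (name) VALUES ('b');   -- succeeds with id = 3, not 2

---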

#14Alex Pilosov
alex@pilosoft.com
In reply to: Vivek Khera (#13)
Re: Re: inserting, index and no index - speed

On 10 Jun 2001, Vivek Khera wrote:

"TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> Everything is always a transaction in Postgres. If you don't say
TL> begin/end, then there's an implicit begin and end around each individual
TL> query. So your first set of tests were paying transaction commit
TL> overhead for each insert.

This doesn't seem to hold exactly for INSERTs involving sequences as
default values. Even if the insert fails for some other constraint,
the sequence is incremented.

No, that's exactly how it is supposed to work, to guarantee that you will
never get same value from two separate calls to nextval.

-alex

#15Vivek Khera
khera@kcilink.com
In reply to: Alex Pilosov (#14)
Re: Re: inserting, index and no index - speed

"AP" == Alex Pilosov <alex@pilosoft.com> writes:

TL> Everything is always a transaction in Postgres. If you don't say
TL> begin/end, then there's an implicit begin and end around each individual

This doesn't seem to hold exactly for INSERTs involving sequences as
default values. Even if the insert fails for some other constraint,
the sequence is incremented.

AP> No, that's exactly how it is supposed to work, to guarantee that you will
AP> never get same value from two separate calls to nextval.

Even if your transaction fails? That seems to counter the definition
of a transaction that aborts; the state of the database is different
than before.

Or am I really thinking wrongly about what an aborted transaction
should leave behind?

#16Alex Pilosov
alex@pilosoft.com
In reply to: Vivek Khera (#15)
Re: Re: inserting, index and no index - speed

On Sun, 10 Jun 2001, Vivek Khera wrote:

Even if your transaction fails? That seems to counter the definition
of a transaction that aborts; the state of the database is different
than before.

Yes, except for the sequences.

Consider this example, transactions A, B, C, sequence S.

in A S.nextval = 1
in B S.nextval = 2
in C S.nextval = 3

Transaction B then aborts, while A and C succeed. By your logic, the next
value of S should then be 2; but to keep track of that kind of state you
would need a table listing 'currently unused values', which, once your
sequence gets into the millions, is a colossal waste of space.

If you want "maximum id that's not currently used in my table" use
max(id), if you want "give me a non-repeating number", use sequence.

There also are implications on concurrency when you use max(id), as only
one transaction can do it without danger of repeating IDs.

-alex
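Alex's example, written out as two concurrent sessions (the sequence name
is hypothetical):

---

CREATE SEQUENCE s;

-- session A:                      -- session B:
BEGIN;                             BEGIN;
SELECT nextval('s');  -- 1
                                   SELECT nextval('s');  -- 2
COMMIT;                            ROLLBACK;

-- B's rollback does not hand 2 back; the next caller simply gets 3:
SELECT nextval('s');  -- 3

---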