PostgreSQL Indexing versus MySQL

Started by Chris Fossenier · about 22 years ago · 9 messages · general
#1Chris Fossenier
chris@engenuit.com

I currently have a client with a database that must hold 125 million records
and, all tallied, about 250 fields.

The database has been normalized and indexed appropriately.

If any of you have worked with MySQL, you will have discovered that indexing
is very limited. You can only have one index file per table. The indexing
process actually creates a full copy of the original table, and once you get
above 2 indexes with 125 million records, it is extremely slow.

Should I even bother trying PostgreSQL to resolve this issue?

We can generate the same indexes in MS SQL and Oracle in a fraction of the
time it takes MySQL.

Thanks

Chris.

#2Adam Ruth
aruth@intercation.com
In reply to: Chris Fossenier (#1)
Re: PostgreSQL Indexing versus MySQL

I had this same problem not long ago, and it caused me to go with
PostgreSQL over MySQL on a project. The problem you're describing does
not exist in PostgreSQL: creating a new index on a table does not
require recreating the other indexes. MySQL also has the same problem
if you add or drop columns; all indexes need to be recreated. Again,
not a problem in PostgreSQL. You'll find times similar to Oracle and MS SQL
(I've never directly compared them, but they feel about the same).

Regards,

Adam Ruth

On Feb 11, 2004, at 1:58 PM, Chris Fossenier wrote:


#3Jan Wieck
JanWieck@Yahoo.com
In reply to: Chris Fossenier (#1)
Re: PostgreSQL Indexing versus MySQL

Chris Fossenier wrote:

I currently have a client with a database that must hold 125 million records
and all tallied about 250 fields.

The database has been normalized and indexed appropriately.

If any of you have worked with MySQL, you will have discovered that indexing
is very limited. You can only have one index file per table. The indexing
process actually creates a full copy of the original table, and once you get
above 2 indexes with 125 million records, it is extremely slow.

Should I even bother trying PostgreSQL to resolve this issue?

We can generate the same indexes in MS SQL and Oracle in a fraction of the
time it takes MySQL.

This is a known MySQL-specific problem. I know of no other database that
handles index creation in such an inefficient way.

Creating an index in PostgreSQL requires scanning the entire main table
and sorting the resulting key set ... I don't see how to do it better.
Every index is stored in its own (set of) file(s). During index
creation, an exclusive lock on the table is required by CREATE INDEX, so
reindexing your entire DB is not your preferred operation during peak
access times of your webserver. But I think professional DBAs don't
torture Oracle that way either.

Jan


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#4Chris Fossenier
chris@engenuit.com
In reply to: Jan Wieck (#3)
Re: PostgreSQL Indexing versus MySQL

Jan,

Thanks for the reply. Is there any way to control your index file locations?
Can I place my Postgres indexes on different mount points?

I haven't put Postgres to the full test yet for importing/indexing, but I was
concerned about the time it took to perform some sample queries that MySQL
seemed to handle more efficiently.

I would like view support, but let me know if you have any insight into my
first question.

Thanks for the reply.

Chris.

-----Original Message-----
From: Jan Wieck [mailto:JanWieck@Yahoo.com]
Sent: Sunday, February 15, 2004 4:23 PM
To: Chris Fossenier
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL Indexing versus MySQL


#5scott.marlowe
scott.marlowe@ihs.com
In reply to: Chris Fossenier (#4)
Re: PostgreSQL Indexing versus MySQL

On Sun, 15 Feb 2004, Chris Fossenier wrote:

Jan,

Thanks for the reply. Is there any way to control your index file locations?
Can I place my Postgres indexes on different mount points?

Yes, but there's no clean SQL command-line interface to do it. You have
to find the OID of the index, shut down the database, move the index file,
link to it with 'ln -s', and restart the database. Should you reindex, or
recreate the index, you'll have to remember to go back and do this all
over again for the new index's OID.
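The move-and-symlink procedure above can be sketched as follows. The
directories and the index OID (17001) are hypothetical stand-ins,
demonstrated here in scratch directories; in real life the file lives under
$PGDATA/base/<database-oid>, the OID comes from pg_class, and the postmaster
must be stopped first:

```shell
# Scratch directories stand in for $PGDATA/base/<dboid> and the faster
# mount point; 17001 is a hypothetical index OID looked up in pg_class.
DATA=$(mktemp -d)
FAST=$(mktemp -d)
echo "index data" > "$DATA/17001"   # pretend this is the index file
mv "$DATA/17001" "$FAST/17001"      # move it onto the other filesystem
ln -s "$FAST/17001" "$DATA/17001"   # leave a symlink where Postgres looks
cat "$DATA/17001"                   # still readable at the old path
```

The symlink means the backend keeps opening the same path it always did,
which is why a REINDEX (new OID, new file) silently undoes the trick.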

I haven't put Postgres to the full test yet for importing/indexing, but I was
concerned about the time it took to perform some sample queries that MySQL
seemed to handle more efficiently.

Five (possible) issues here.

1: PostgreSQL is built to handle massive parallel load. If you're
testing a single thread, MySQL will often win by a fair margin. Put a
real load on it, say 10% writes and 90% reads with 100 clients, and you'll
see a win in favor of PostgreSQL.

2: PostgreSQL has a really neat feature that allows users to define their
own types. Yeah, cool. But it means that the parser is not much
smarter about coercing an int4 to an int8 than it is about coercing a
custom type (hex, foobar, etc...) from one to another. What this means
to you, the user, is that:

create table test (id int8, info text);
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan. Why? Because the default integer type
is int4, and your id field is int8. Cast the value to int8, and watch it
use an index scan:

select * from test where id=cast(456 as int8);

3: You might not have a representative data set. Testing PostgreSQL with
1,000 rows when you're really gonna have 1,000,000 is not good either. It
behaves differently for differently sized data sets, because things that
are cheap on small data sets might be expensive on large ones, and vice
versa (though not usually the vice versa so much).

4: You haven't run ANALYZE and/or VACUUM FULL, and/or haven't been
running regular vacuums often enough, and/or don't have enough FSM slots
set up to handle the number of tuples you are turning over between
vacuums.

5: You haven't tuned your installation for the hardware it's on.
PostgreSQL tends to favor I/O over CPU, and believes it is being installed
on a P100 with 64 MB of RAM if not told otherwise. The reason is
obvious: if you install a database tuned for a small computer on a big
machine, it may not be real fast, but it will run. If you install a
database tuned for a monster machine on a small machine, the database may
not start.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

has some really good information on performance tuning.
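For reference, the kind of knobs that guide covers look like this in a
7.x-era postgresql.conf; the values below are illustrative assumptions
only, not recommendations, and should be sized to your actual RAM:

```
shared_buffers = 4096          # 8 kB pages, so ~32 MB; the shipped default is tiny
sort_mem = 8192                # kB available per sort (7.x name; later work_mem)
effective_cache_size = 65536   # ~512 MB; tells the planner how much the OS caches
max_fsm_pages = 100000         # free space map slots, relevant to point 4 above
```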

I would like view support but let me know if you have any insight into my
first question.

Views have been fully featured since around 7.1 or so, a couple of years
back:

http://www.postgresql.org/docs/7.3/static/sql-createview.html

#6Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: scott.marlowe (#5)
Re: PostgreSQL Indexing versus MySQL

At 04:14 PM 2/17/2004 -0700, scott.marlowe wrote:

custom type (hex, foobar, etc...) from one to another. What this means
to you, the user, is that:

create table test (id int8, info text);
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan. Why? Because the default integer type
is int4, and your id field is int8. Cast the value to int8, and watch it
use an index scan:

select * from test where id=cast(456 as int8);

Actually won't
select * from test where id='456'
use the index?

I'm curious if this works in all cases - e.g. does PostgreSQL figure out the
best cast from text to whatever, even for relevant custom types?

#7Jan Wieck
JanWieck@Yahoo.com
In reply to: Lincoln Yeoh (#6)
Re: PostgreSQL Indexing versus MySQL

Lincoln Yeoh wrote:

At 04:14 PM 2/17/2004 -0700, scott.marlowe wrote:

custom type (hex, foobar, etc...) from one to another. What this means
to you, the user, is that:

create table test (id int8, info text);
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan. Why? Because the default integer type
is int4, and your id field is int8. Cast the value to int8, and watch it
use an index scan:

select * from test where id=cast(456 as int8);

Actually won't
select * from test where id='456'
use the index?

I'm curious if this works in all cases - e.g. does PostgreSQL figure out the
best cast from text to whatever, even for relevant custom types?

'456' is not text; it is a quoted literal of (as yet) unknown type, and it
is kept that way long enough to know that it should be an int8, ideally.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#8scott.marlowe
scott.marlowe@ihs.com
In reply to: Lincoln Yeoh (#6)
Re: PostgreSQL Indexing versus MySQL

On Wed, 18 Feb 2004, Lincoln Yeoh wrote:

At 04:14 PM 2/17/2004 -0700, scott.marlowe wrote:

custom type (hex, foobar, etc...) from one to another. What this means
to you, the user, is that:

create table test (id int8, info text);
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan. Why? Because the default integer type
is int4, and your id field is int8. Cast the value to int8, and watch it
use an index scan:

select * from test where id=cast(456 as int8);

Actually won't
select * from test where id='456'
use the index?

I'm curious if this works in all cases - e.g. does PostgreSQL figure out the
best cast from text to whatever, even for relevant custom types?

It works; I just like writing things in the most self-documenting manner
possible, since one day somebody else may look at:

select * from test where id='456'

and go, "hey, that's just an int, no need for the quotes," and take them
out, not knowing what they do. cast(456 as int8) is pretty obvious; '456'
is much more subtle.

#9Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: scott.marlowe (#8)
Re: PostgreSQL Indexing versus MySQL

At 10:35 AM 2/18/2004 -0700, scott.marlowe wrote:

It works; I just like writing things in the most self-documenting manner
possible, since one day somebody else may look at:

select * from test where id='456'

and go, "hey, that's just an int, no need for the quotes," and take them
out, not knowing what they do. cast(456 as int8) is pretty obvious; '456'
is much more subtle.

Well, whenever I see:

select * from test where id=$var

I think "possible SQL injection".

I'd rather see bind variables (or something similar) used, and when you
use those, the quoting is automatic.

With your method, if the column is altered to int4 or something else, it
won't use the index (unless a PostgreSQL dev adds stuff).

'456'::int8 seems to work too, but I suppose that's a PostgreSQL-ism.
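As a sketch of what the bind-variable style looks like server-side (PREPARE
is available as of PostgreSQL 7.3; `test` and the name `find_by_id` are
taken from, and invented for, the examples earlier in the thread):

```sql
PREPARE find_by_id (int8) AS
    SELECT * FROM test WHERE id = $1;
EXECUTE find_by_id(456);  -- the value is coerced to the declared int8,
                          -- so the index on id stays usable
```

Because the parameter's type is declared once in the PREPARE, changing the
column type later only means changing the declaration, not hunting down
casts scattered through queries.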