Preliminary performance comparison: pgsql vs mysql
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.
These results are for a single process populating a table with 934k rows,
and then performing some selects. I also compared the effect of creating
indexes on some of the columns.
I have not yet done any testing of transactions, multiple concurrent
processes, etc.
I did not make any changes to the default config settings. I can do
so if someone has some suggestions.
My machine is a 3.0 GHz P4 with 1 GB ram, running FC 3.
I used pg 8.0.1 and mysql 5.0.2 alpha.
I compiled pg from source, but I downloaded a binary for mysql. If
someone thinks this is significant, and can point me to a good
binary for pg, I will give it a try.
All timings are as reported by the db.
I included the pg script below.
Finally, I don't have an axe to grind either way. I'm trying to be
fair, but I am the first to admit I am not an expert in db tuning.
I welcome constructive comments and advice.
**************** data and queries
The data comes from some experimental data I have been working with.
I load data into a table with relevant columns fid, rid, x.
The combination of fid,rid is unique.
x is a location, and is not unique.
I loaded the data using COPY (pg) and LOAD (mysql).
The queries were:
select count(*) from data where fid=2 and rid=6; count = 100
select count(*) from data where x > 5000 and x < 5500; count = 35986
select count(*) from data where x > 5000 and x < 5020; count = 1525
***************** preliminary conclusions
As suspected, MyISAM is very fast. In the tested case (only
one process, only one big load and some selects) MyISAM tables are
much faster than pg or InnoDB.
For queries, InnoDB and pg are roughly equivalent. In some cases
one or the other is a little faster, but they are mostly in the
same ballpark. The one exception seems to be that pg has an edge
in seq scans.
pg is slower loading data when it has to create an index. Also,
I found that it is critical to run "vacuum analyze" in pg. Running
"analyze" in mysql did not seem to make much difference. I'm guessing
that mysql builds statistics while it is loading data, and does not
actually run an analyze since the table has not changed.
******************* preliminary results ***************************
******************* all times in seconds **************************
note: input table has 934500 rows.
                                            mysql 5.0.2 alpha
                                  PG 8.0.1   MyISAM   InnoDB
NO INDEXES
Load file                            22.3      3.9     22.1
select count fid=?,rid=?              3.0      0.23     2.07
select count x > 5000, x < 5500       1.2      0.27     1.59
select count x > 5000, x < 5020       0.63     0.29     1.58

INDEXES on (fid,rid)
Load file                            36.      13.5     30.1
vacuum analyze                        3.6       -        -
select count fid=?,rid=?              0.0      0.00     0.02
select count x > 5000, x < 5500       0.702    0.29     2.07
select count x > 5000, x < 5020       0.713    0.28     1.59

INDEXES on (fid,rid) and (x)
Load file                           202.      24.     151.
vacuum analyze                       11.        -        -
select count fid=?,rid=?              0.002    0.00     0.02
select count x > 5000, x < 5500       0.9      0.06     0.75
select count x > 5000, x < 5020       0.048    0.01     0.01
********************* PG-SQL script ************************
\timing
--
-- Load table, no indexes
--
drop table data cascade;
create table data (
fid integer,
rid integer,
range real,
x real,
y real,
z real,
bs real,
snr real,
rvel real,
cfar smallint);
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM
'/home/rick/bed/data/data.dat';
select count(*) from data where fid=2 and rid=6;
select count(*) from data where x > 5000 and x < 5500;
select count(*) from data where x > 5000 and x < 5020;
--
-- Load table, index on (fid,rid)
--
drop table data cascade;
create table data (
fid integer,
rid integer,
range real,
x real,
y real,
z real,
bs real,
snr real,
rvel real,
cfar smallint);
create index fidrid_data on data (fid,rid);
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM
'/home/rick/bed/data/data.dat';
vacuum analyze data;
select count(*) from data where fid=2 and rid=6;
select count(*) from data where x > 5000 and x < 5500;
select count(*) from data where x > 5000 and x < 5020;
--
-- Load table, index on (fid,rid) and (x)
--
drop table data cascade;
create table data (
fid integer,
rid integer,
range real,
x real,
y real,
z real,
bs real,
snr real,
rvel real,
cfar smallint);
create index fidrid_data on data (fid,rid);
create index fidx on data (x);
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM
'/home/rick/bed/data/data.dat';
vacuum analyze data;
select count(*) from data where fid=2 and rid=6;
select count(*) from data where x > 5000 and x < 5500;
select count(*) from data where x > 5000 and x < 5020;
Hi Rick,
the work you are doing is important (at least I think so).
In my experience, PostgreSQL performance is also very slow when a query
involves several LEFT JOINs and varchar() fields. You can see
an example in the archives where my problem is described (Subject: "How to
read query plan"). There is a patch that partially solves this problem
(as Tom Lane mentioned), but it is not in the latest PostgreSQL release.
I will work on improving the design of the tables I have problems
with, to see if there are any significant speed improvements. Maybe
you could then use some of my results for some of your tests.
Miroslav
Rick Schumeyer wrote:
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql....
Rick Schumeyer wrote:
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.
These results are for a single process populating a table with 934k rows,
and then performing some selects. I also compared the effect of creating
indexes on some of the columns.
I have not yet done any testing of transactions, multiple concurrent
processes, etc.
I did not make any changes to the default config settings. I can do
so if someone has some suggestions.
My machine is a 3.0 GHz P4 with 1 GB ram, running FC 3.
Stop now. I've not looked at your test results, and frankly there is no
point. As it ships, PG should run fine on a small corner of an old
laptop. It will not perform well with any sort of serious workload on
any sort of serious hardware. You're wasting your time if you want to
get any sort of meaningful result.
Take 30 minutes to read through the article below. It covers the basics
of how to manage your configuration settings.
http://www.powerpostgresql.com/PerfList
Oh - make sure you are accounting for caching effects as well.
--
Richard Huxton
Archonet Ltd
On Mon, Mar 14, 2005 at 06:52:58AM -0500, Rick Schumeyer wrote:
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.
...
I have not yet done any testing of transactions, multiple concurrent
processes, etc.
I would say that doing the concurrency tests is probably the most
important factor in comparing other databases against MySQL, as
MySQL will almost always win in single-user tests.
E.g. here are some performance figures from tests I have done in the past.
This is with a 6GB database on a 4CPU Itanium system running a mixture of
read-only queries, but it is fairly typical of the behaviour I have seen.
The Oracle figures also scaled in a similar way to postgres.
Clients 1 2 3 4 6 8 12 16 32 64 128
-------------------------------------------------------------------------------
mysql-4.1.1 1.00 1.41 1.34 1.16 0.93 1.03 1.01 1.00 0.94 0.86 0.80
pg-7.4.1 0.65 1.27 1.90 2.48 2.45 2.50 2.48 2.51 2.49 2.39 2.38
-Mark
In article <010001c5288c$5e3b3c40$0200a8c0@dell8200>,
"Rick Schumeyer" <rschumeyer@ieee.org> writes:
These results are for a single process populating a table with 934k rows,
and then performing some selects. I also compared the effect of creating
indexes on some of the columns.
I have not yet done any testing of transactions, multiple concurrent
processes, etc.
Bad. That's where things begin to get interesting.
I did not make any changes to the default config settings.
Bad. On modern hardware MySQL performs quite well with its default
settings; PostgreSQL performs horribly without some tuning.
I used pg 8.0.1 and mysql 5.0.2 alpha.
Bad. As you noticed, MySQL 5.x is Alpha and not very stable. I'd
suggest using MySQL 4.1.10 instead.
I compiled pg from source, but I downloaded a binary for mysql.
Good. Since MySQL is multithreaded, it's much harder to compile than
PostgreSQL. The MySQL guys actually recommend using their binaries.
select count(*) from data where fid=2 and rid=6; count = 100
select count(*) from data where x > 5000 and x < 5500; count = 35986
select count(*) from data where x > 5000 and x < 5020; count = 1525
Bad. These queries are exactly the sore point of PostgreSQL and
MySQL/InnoDB, whereas MySQL/MyISAM really shines.
That site produces some sort of php error.
I don't suppose this information is available elsewhere?
Stop now. I've not looked at your test results, and frankly there is no
point. As it ships, PG should run fine on a small corner of an old
laptop. It will not perform well with any sort of serious workload on
any sort of serious hardware. You're wasting your time if you want to
get any sort of meaningful result.
Take 30 minutes to read through the article below. It covers the basics
of how to manage your configuration settings.
http://www.powerpostgresql.com/PerfList
Rick Schumeyer wrote:
That site produces some sort of php error.
Hmm - was working this morning. Perhaps some maintenance going on.
I don't suppose this information is available elsewhere?
Try some slightly older notes here:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
On Mon, 2005-03-14 at 05:52, Rick Schumeyer wrote:
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.
These results are for a single process populating a table with 934k rows,
and then performing some selects. I also compared the effect of creating
indexes on some of the columns.
I have not yet done any testing of transactions, multiple concurrent
processes, etc.
I did not make any changes to the default config settings. I can do
so if someone has some suggestions.
My machine is a 3.0 GHz P4 with 1 GB ram, running FC 3.
I used pg 8.0.1 and mysql 5.0.2 alpha.
Why are all the tests here select count(*) tests? Surely your
application does something more interesting than counting rows...
For a more interesting test, try setting up three or four streaming
writers that write information continuously into the database, and then
run the select count(*) queries against both and see what happens.
My guess is that the table level locking of myisam tables means the
MySQL database will slow to a crawl or throw error messages, while the
postgresql system will slow down somewhat but keep right on running.
Richard Huxton <dev@archonet.com> writes:
Rick Schumeyer wrote:
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.
Take 30 minutes to read through the article below. It covers the basics
of how to manage your configuration settings.
http://www.powerpostgresql.com/PerfList
I have been fooling with the sql-bench stuff that MySQL ships with their
database. Not because I take it seriously ;-) but because I thought it
would be useful to understand in detail why we look so spectacularly bad
on it. I'll write a more complete report when I'm done, but what's
relevant to Rick's testing is that I have found that a few simple
configuration adjustments make a huge difference. Specifically,
I've got
shared_buffers = 10000 # 10x the default
checkpoint_segments = 30 # 10x the default
work_mem = 100000 # ~100x the default
maintenance_work_mem = 100000 # ~6x the default
(The *work_mem numbers are probably excessive but I've not bothered to
fine-tune them.) A stock out-of-the-box PG 8.0.1 RPM is about 10x slower
overall than MySQL according to this benchmark, but these adjustments
bring it to something like 2x slower. Which is at least in the ballpark.
Most of the tables that this benchmark uses have about 300K
not-super-wide rows, so what this says is that you need numbers in this
vicinity to work on tables of that size.
Bottom line is that you *must* adjust at least these settings if you
want a high-performance PG server.
regards, tom lane
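As a back-of-envelope check on what those numbers mean in memory terms (in 8.0, shared_buffers is counted in 8 KB pages and the *work_mem settings in KB):

```python
# Settings from the postgresql.conf fragment above
shared_buffers = 10000         # 8 KB pages
work_mem = 100000              # KB
maintenance_work_mem = 100000  # KB

print(shared_buffers * 8 / 1024)  # 78.125 -> ~78 MB of shared buffer cache
print(work_mem / 1024)            # ~98 MB per sort/hash operation
```

So shared_buffers = 10000 is only about 78 MB - modest on a 1 GB machine, which helps explain why the stock setting (1000 pages, about 8 MB) performs so poorly.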
mrae@purplebat.com (Mark Rae) writes:
On Mon, Mar 14, 2005 at 06:52:58AM -0500, Rick Schumeyer wrote:
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.
...
I have not yet done any testing of transactions, multiple concurrent
processes, etc.
I would say that doing the concurrency tests is probably the most
important factor in comparing other databases against MySQL, as
MySQL will almost always win in single-user tests.
E.g. here are some performance figures from tests I have done in the past.
This is with a 6GB database on a 4CPU Itanium system running a mixture of
read-only queries, but it is fairly typical of the behaviour I have seen.
The Oracle figures also scaled in a similar way to postgres.
Clients      1    2    3    4    6    8   12   16   32   64  128
-------------------------------------------------------------------------------
mysql-4.1.1 1.00 1.41 1.34 1.16 0.93 1.03 1.01 1.00 0.94 0.86 0.80
pg-7.4.1    0.65 1.27 1.90 2.48 2.45 2.50 2.48 2.51 2.49 2.39 2.38
Could you elaborate on what the measures are here? I don't quite
follow what "0.8" means as compared to "2.38."
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.
On Mon, Mar 14, 2005 at 06:04:01PM -0500, Chris Browne wrote:
mrae@purplebat.com (Mark Rae) writes:
Clients 1 2 3 4 6 8 12 16 32 64 128
------------------------------------------------------------------------------
mysql-4.1.1 1.00 1.41 1.34 1.16 0.93 1.03 1.01 1.00 0.94 0.86 0.80
pg-7.4.1    0.65 1.27 1.90 2.48 2.45 2.50 2.48 2.51 2.49 2.39 2.38
Could you elaborate on what the measures are here? I don't quite
follow what "0.8" means as compared to "2.38."
The figures are generated from a representative set of ~500 queries which
each client issues in a random order. The clients all connect in parallel
and the elapsed time taken for them all to complete is recorded.
The numbers given above are these times converted into relative throughput
figures. The baseline being a single mysql client, a performance of
1.00 is equivalent to an average of 5.82 queries per second.
i.e.
with a single client postgres runs at 65% the speed of mysql
by the time 3 clients are running, postgres is getting through the
queries 1.90/1.34=1.42 times faster
and once we get to 32 clients, mysql is tripping over itself and is
actually running slower than if the clients ran one after another.
Looking at how the database scales,
i.e. all figures are relative to the speed of a single client connection.
Clients 1 2 3 4 6 8 12 16 32 64 128
------------------------------------------------------------------------------
Theoretical 1.00 2.00 3.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00 4.00
mysql-4.1.1 1.00 1.41 1.34 1.16 0.93 1.03 1.01 1.00 0.94 0.86 0.80
pg-7.4.1 1.00 1.94 2.91 3.79 3.75 3.82 3.80 3.84 3.82 3.66 3.64
The theoretical line is how much we should expect things to scale,
given that this is a 4-CPU machine.
-Mark
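For anyone wanting to reproduce the relative-throughput numbers, the arithmetic Mark describes can be sketched as follows (assuming ~500 queries per client and the 5.82 queries/sec single-client MySQL baseline he quotes):

```python
def relative_throughput(n_clients, elapsed_s,
                        queries_per_client=500, baseline_qps=5.82):
    """Aggregate queries/sec, normalised to the single-client MySQL run."""
    total_queries = n_clients * queries_per_client
    return (total_queries / elapsed_s) / baseline_qps

# A single client finishing 500 queries at the baseline rate scores 1.00
print(round(relative_throughput(1, 500 / 5.82), 2))
```

The "how the database scales" table is the same figures divided by each engine's own single-client number.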
On Tuesday 15 March 2005 at 12:26 +0000, Mark Rae wrote:
Clients 1 2 3 4 6 8 12 16 32 64 128
------------------------------------------------------------------------------
mysql-4.1.1 1.00 1.41 1.34 1.16 0.93 1.03 1.01 1.00 0.94 0.86 0.80
pg-7.4.1 0.65 1.27 1.90 2.48 2.45 2.50 2.48 2.51 2.49 2.39 2.38
by the time 3 clients are running, postgres is getting through the
queries 1.90/1.34=1.42 times faster
That is very interesting!!!
I have several webapps on my server; each one issues several queries to
the database from _each_ JSP - often more than three. So the hunch
that I had all along was right: PostgreSQL is a much better back end for
Tomcat/JSP than MySQL.
Cheers
Tony
Mark Rae wrote:
I would say that doing the concurrency tests is probably the most
important factor in comparing other databases against MySQL, as
MySQL will almost always win in single-user tests.
E.g. here are some performance figures from tests I have done in the past.
This is with a 6GB database on a 4CPU Itanium system running a mixture of
read-only queries, but it is fairly typical of the behaviour I have seen.
The Oracle figures also scaled in a similar way to postgres.
Clients      1    2    3    4    6    8   12   16   32   64  128
-------------------------------------------------------------------------------
mysql-4.1.1 1.00 1.41 1.34 1.16 0.93 1.03 1.01 1.00 0.94 0.86 0.80
pg-7.4.1    0.65 1.27 1.90 2.48 2.45 2.50 2.48 2.51 2.49 2.39 2.38
Would be interesting to know about the tuning of the MySQL instance. I guess
that the buffers for indexing and sorting are well set up, but what about
thread caching? Knowing that once in a while you will have a connection
burst, you can tell MySQL to cache threads so that it can save time the
next time it needs them.
--
Robin Ericsson
http://robin.vill.ha.kuddkrig.nu/
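For reference, the MySQL knobs being alluded to here live in my.cnf; a hedged sketch (variable names per the MySQL 4.1 manual, values purely illustrative):

```ini
[mysqld]
key_buffer_size   = 256M   # MyISAM index buffers
sort_buffer_size  = 4M     # per-connection sort buffer
thread_cache_size = 16     # threads kept alive to absorb connection bursts
```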
On Tue, 2005-03-15 at 14:07 +0100, tony wrote:
by the time 3 clients are running, postgres is getting through the
queries 1.90/1.34=1.42 times faster
That is very interesting!!!
I have several webapps on my server each one opens several queries to
the database from _each_ JSP - often more than three... So the hunch
that I had all along was right: PostgreSQL is a much better back end for
Tomcat/JSP than MySQL.
Be careful assuming that. DB benchmarks are hard to do in a general
sense. His results probably indicate a general trend, but you should
test your application yourself to get a real result. His pattern of SQL
queries might be very different from yours.
Regards,
Jeff Davis
On Mon, 2005-03-14 at 12:43 +0000, Richard Huxton wrote:
Take 30 minutes to read through the article below. It covers the basics
of how to manage your configuration settings.
http://www.powerpostgresql.com/PerfList
That's an informative article. I was hoping, however, that it would have
a few details about the effects of the statistics settings on
performance. Which statistics options affect the planner? Do they
potentially affect autovacuum?
Also, a 32-bit machine can only hold so much RAM. If I'm correct, there
are ways to address more memory than that on a 32 bit machine, but I
wonder at what cost? In other words, is it a good idea to address more
than 4GB on a 32 bit machine? If not, is it a reasonable choice to
invest in 64 bit if you want >4GB of RAM? Or are you better off just
spending the money on RAID and staying at 4GB?
Regards,
Jeff Davis
PS: A minor typo in "Disk and WAL" -> "checkpoint_segments":
s/Depening/Depending/
Also, a 32-bit machine can only hold so much RAM. If I'm correct, there
are ways to address more memory than that on a 32 bit machine, but I
wonder at what cost? In other words, is it a good idea to address more
than 4GB on a 32 bit machine? If not, is it a reasonable choice to
invest in 64 bit if you want >4GB of RAM? Or are you better off just
spending the money on RAID and staying at 4GB?
It entirely depends on the database, but note that the 32-bit limit of 4GB
is per CPU. So if you have 4 CPUs you can have 16GB of RAM.
However, you should be running Opterons anyway.
J
--
Command Prompt, Inc., Your PostgreSQL solutions company. 503-667-4564
Custom programming, 24x7 support, managed services, and hosting
Open Source Authors: plPHP, pgManage, Co-Authors: plPerlNG
Reliable replication, Mammoth Replicator - http://www.commandprompt.com/
On Tue, Mar 15, 2005 at 12:51:03PM -0800, Jeff Davis wrote:
Be careful assuming that. DB benchmarks are hard to do in a general
sense. His results probably indicate a general trend, but you should
test your application yourself to get a real result. His pattern of SQL
queries might be very different from yours.
Very true.
You may have noticed that I had a very low query rate of 5.8 queries
per second, because some of the queries have 12 tables to join and
take about 20s to run. This tends to work in postgres' favour.
If you have lots of simple queries, it will be better for mysql
and the break-even point will be higher.
Also, while on the subject of scaling: I had the opportunity
to try postgres on a 16CPU Altix and couldn't get it to scale
more than about 4x, whereas Oracle got up to about 12x faster.
I assume this is because of the NUMA architecture. I was also
told that Oracle had made no special optimizations to accommodate it.
My guess is that because postgres allocates all its shared
buffers as a contiguous chunk, it puts all the load on one
memory bank.
Oracle, on the other hand, seems to use lots of smaller regions
which would probably be spread throughout the physical memory.
Perhaps one of the developers could comment on how difficult
it would be to change the shared buffer handling to use multiple
segments, as I'd definitely be willing to give it a go.
-Mark
Mark Rae wrote:
On Tue, Mar 15, 2005 at 12:51:03PM -0800, Jeff Davis wrote:
Be careful assuming that. DB benchmarks are hard to do in a general
sense. His results probably indicate a general trend, but you should
test your application yourself to get a real result. His pattern of SQL
queries might be very different from yours.
Very true.
You may have noticed that I had a very low query rate of 5.8 queries
per second, because some of the queries have 12 tables to join and
take about 20s to run. This tends to work in postgres' favour.
If you have lots of simple queries, it will be better for mysql
and the break-even point will be higher.
Also, while on the subject of scaling: I had the opportunity
to try postgres on a 16CPU Altix and couldn't get it to scale
more than about 4x, whereas Oracle got up to about 12x faster.
I assume this is because of the NUMA architecture. I was also
told that Oracle had made no special optimizations to accommodate it.
My guess is that because postgres allocates all its shared
buffers as a contiguous chunk, it puts all the load on one
memory bank.
Oracle, on the other hand, seems to use lots of smaller regions
which would probably be spread throughout the physical memory.
Perhaps one of the developers could comment on how difficult
it would be to change the shared buffer handling to use multiple
segments, as I'd definitely be willing to give it a go.
We have had some major SMP improvements in current CVS. Were you
testing that or 8.0.X?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, Mar 15, 2005 at 06:46:50PM -0500, Bruce Momjian wrote:
Mark Rae wrote:
Also, while on the subject of scaling, I had the opportunity
to try postgres on a 16CPU Altix and couldn't get it to scale
more than about 4x, whereas Oracle got up to about 12x faster.
We have had some major SMP improvements in current CVS. Were you
testing that or 8.0.X?
I tried it with 8.0.0rc3, and had previously tried a 7.4 version.
-Mark
Mark Rae wrote:
On Tue, Mar 15, 2005 at 06:46:50PM -0500, Bruce Momjian wrote:
Mark Rae wrote:
Also, while on the subject of scaling, I had the opportunity
to try postgres on a 16CPU Altix and couldn't get it to scale
more than about 4x, whereas Oracle got up to about 12x faster.
We have had some major SMP improvements in current CVS. Were you
testing that or 8.0.X?
I tried it with 8.0.0rc3, and had previously tried a 7.4 version.
Oh, you have to try CVS HEAD or a nightly snapshot. Tom made a major
change that allows scaling in SMP environments.