Article on MySQL vs. Postgres

Started by Tim Perdue · almost 26 years ago · 50 messages · hackers
#1 Tim Perdue
tperdue@valinux.com

On Wednesday or Thursday, I'm going to be publishing my article on MySQL
vs. Postgres on PHPBuilder.com.

Before I do that I want to confirm the major problem I had w/postgres:
the 8K tuple limit. When trying to import some tables from MySQL,
postgres kept choking because MySQL has no such limit on the size of a
row in the database (text fields on MySQL can be multi-megabyte).

Is it even possible to import large text fields into postgres? If not,
how in the world can anyone use this to store message board posts,
resumes, etc? Do you have to use pgsql-specific large-object
import/export commands?

I actually intended the article to be a win for Postgres, as I've used
it and had good luck with it for such a long time, but if you look at
the results below, it seems very positive for MySQL.

Performance/Scalability:

MySQL was about 50-60% faster in real-world web serving, but it crumbles
under a real load. Postgres on the other hand scaled 3x higher than
MySQL before it started to crumble on the same machine. Unfortunately,
Postgres would probably still lose on a high-traffic website because
MySQL can crank out the pages so much faster, number of concurrent
connections is hard to compare. MySQL also seems to make better use of
multiple-processor machines like the quad-xeon I tested on. Postgres
never saturated all 4 processors as MySQL did.

Tools:
MySQL has some nice admin tools that allow you to watch individual
connections and queries as they progress and tools to recover from
corruption. I haven't seen any similar tools for postgres.

Long-term stability:
Postgres is undoubtedly the long-run winner in stability, whereas MySQL
will freak out or die when left running for more than a month at a time.
But if you ever do have a problem with postgres, you generally have to
nuke the database and recover from a backup, as there are no known tools
to fix index and database corruption. For a long-running postgres
database, you will occasionally have to drop indexes and re-create them,
causing downtime.

Usability:
Both databases use a similar command-line interface. Postgres uses
"slash commands" to help you view database structures. MySQL uses a more
memorable, uniform syntax like "Show Tables; Show Databases; Describe
table_x;" and has better support for altering/changing tables, columns,
and even databases.

Features:
Postgres is undoubtedly far, far more advanced than MySQL is. Postgres
now supports foreign keys, which can help with referential integrity.
Postgres supports subselects and has better support for creating tables
as the result of queries. The "transaction" support that MySQL lacks is
included in Postgres, although you'll never miss it on a website, unless
you're building something for a bank - and if you're doing that, you'll
use Oracle.

Tim

--
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723

#2 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Tim Perdue (#1)
RE: Article on MySQL vs. Postgres

Before I do that I want to confirm the major problem I had w/postgres:
the 8K tuple limit. When trying to import some tables from MySQL,
postgres kept choking because MySQL has no such limit on the size of a
row in the database (text fields on MySQL can be multi-megabyte).

Jan is working on TOAST for 7.1 - there will be no 8K limit any more...

Long-term stability:
Postgres is undoubtedly the long-run winner in stability, whereas MySQL
will freak out or die when left running for more than a month at a time.
But if you ever do have a problem with postgres, you generally have to
nuke the database and recover from a backup, as there are no known tools
to fix index and database corruption. For a long-running postgres
database, you will occasionally have to drop indexes and re-create them,
causing downtime.

I'm implementing WAL for 7.1 - there will be true after-crash recovery...

Vadim

#3 Egon Schmid (@work)
eschmid@php.net
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

Tim Perdue wrote:

On Wednesday or Thursday, I'm going to be publishing my article on MySQL
vs. Postgres on PHPBuilder.com.

Cool!

Features:
Postgres is undoubtedly far, far more advanced than MySQL is. Postgres
now supports foreign keys, which can help with referential integrity.
Postgres supports subselects and has better support for creating tables
as the result of queries. The "transaction" support that MySQL lacks is
included in Postgres, although you'll never miss it on a website, unless
you're building something for a bank - and if you're doing that, you'll
use Oracle.

Since MySQL version 3.23.16 it has supported transactions with
Sleepycat's DB3, and since version 3.23.19 it has been under the GPL.

-Egon

--
SIX Offene Systeme GmbH · Stuttgart - Berlin - New York
Sielminger Straße 63 · D-70771 Leinfelden-Echterdingen
Fon +49 711 9909164 · Fax +49 711 9909199 http://www.six.de
PHP booth at Europe's largest Linux fair: 'LinuxTag 2001'
more info @ http://www.dynamic-webpages.de/

#4 'JanWieck@t-online.de'
JanWieck@t-online.de
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

Tim Perdue wrote:

On Wednesday or Thursday, I'm going to be publishing my article on MySQL
vs. Postgres on PHPBuilder.com.

Before I do that I want to confirm the major problem I had w/postgres:
the 8K tuple limit. When trying to import some tables from MySQL,
postgres kept choking because MySQL has no such limit on the size of a
row in the database (text fields on MySQL can be multi-megabyte).

I just committed the first portion of TOAST, enabling lztext
fields to hold multiple megabytes too. But it's not the answer
for such big objects. I have plans to add Oracle-like
large object handling in a future version.

I actually intended the article to be a win for Postgres, as I've used
it and had good luck with it for such a long time, but if you look at
the results below, it seems very positive for MySQL.

It's never a good plan to start out with an intention about
which of the competitors should finally look good. It shows
between the lines.

Performance/Scalability:

MySQL was about 50-60% faster in real-world web serving, but it crumbles
under a real load. Postgres on the other hand scaled 3x higher than
MySQL before it started to crumble on the same machine. Unfortunately,
Postgres would probably still lose on a high-traffic website because
MySQL can crank out the pages so much faster, number of concurrent
connections is hard to compare. MySQL also seems to make better use of
multiple-processor machines like the quad-xeon I tested on. Postgres
never saturated all 4 processors as MySQL did.

The question in this case is "what is real-world web
serving"? To spit out static HTML pages loaded into a
database? To handle discussion forums like OpenACS with high
concurrency and the need for transactions?

Web applications differ in database usage as much as any
other type of application. From huge amounts of static, never
changing data to complex data structures with many
dependencies constantly in motion. There is no such one
"real world web scenario".

Tools:
MySQL has some nice admin tools that allow you to watch individual
connections and queries as they progress and tools to recover from
corruption. I haven't seen any similar tools for postgres.

Yep, we need a lot more nice tools.

Long-term stability:
Postgres is undoubtedly the long-run winner in stability, whereas MySQL
will freak out or die when left running for more than a month at a time.
But if you ever do have a problem with postgres, you generally have to
nuke the database and recover from a backup, as there are no known tools
to fix index and database corruption. For a long-running postgres
database, you will occasionally have to drop indexes and re-create them,
causing downtime.

Not true IMHO. We had some problems with indices in the past.
But you can drop/recreate them online, and someone running a
query concurrently might just use a sequential scan during
that time. All other corruption needs backup and recovery.
WAL is on its way.

Usability:
Both databases use a similar command-line interface. Postgres uses
"slash commands" to help you view database structures. MySQL uses a more
memorable, uniform syntax like "Show Tables; Show Databases; Describe
table_x;" and has better support for altering/changing tables, columns,
and even databases.

Since professional application development starts with a data
design, such "describe" commands and "alter" features are
unimportant. The more someone needs them, the more I know
that he isn't well educated.

Production installations don't need any "alter" command at
all. New features are developed in the development area,
tested with real-life data in the test environment, and moved
to the production server, including any required data
conversion step, during a downtime.

24/7 scenarios require hot standby: online synchronized
databases with hardware takeover. All of that is far outside
our scope for now.

Features:
Postgres is undoubtedly far, far more advanced than MySQL is. Postgres
now supports foreign keys, which can help with referential integrity.
Postgres supports subselects and has better support for creating tables
as the result of queries. The "transaction" support that MySQL lacks is
included in Postgres, although you'll never miss it on a website, unless
you're building something for a bank - and if you're doing that, you'll
use Oracle.

FOREIGN KEY doesn't just help with referential integrity, it
guarantees it. No application ever has to worry whether it will
find the customer for a problem report. It does a SELECT and
has it, or it would never have found the problem report in the
first place - period.

And for big, functionally expanding web sites, it does so even
if one of a dozen programmers forgot it once. If the
constraint says you cannot delete a customer who has paid until
the end of the year, the database won't let you, even if one of
the 7 CGI programs that can delete customers doesn't check.

Transactions are the basis for any data integrity, especially
in the web environment. Almost every web server I've seen has
some timeout for CGI, ADP, ASP or whatever they call it. As
soon as your page needs to update more than one table, you
run the risk of getting aborted right in between, leaving the
current activity half done - no matter whether the database
supports FOREIGN KEY. I could live without foreign keys, but
transactions are essential.

Fortunately the MySQL team has changed its point of view on
that detail and made noticeable progress in that area by
integrating BDB. The latest beta does support transactions
including rollback, as they announced. As far as I can see, the
integration of BDB only buys them transactions, at the cost
of performance and maintenance effort. So the need for it
cannot be as small as you think.

Final notes:

I hate these "MySQL" vs. "PostgreSQL" articles that want to
say "this one is the better one". Each has its advantages
and disadvantages. Both have a long TODO.

Your article might better analyze a couple of different
"real-world web services", telling what DB usage profile they
have and then suggesting which of the two databases is the
better choice in each case.

MySQL is a tool and PostgreSQL is a tool. But as with other
tools, a hammer doesn't help if you need a screwdriver.

Please don't set out to tell anyone that either of these
databases is "the best". You'd do both communities a
disservice. Help people to choose the right database for
their current needs, and tell them to reevaluate their choice
for the next project instead of blindly staying with the same
database. We'll end up with a lot of customers using both
databases in parallel for different needs.

At the bottom line, both teams share the same idea: open
source. Anyone who pays a license fee is a loss (loser?) for
all of us.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#5 Benjamin Adida
ben@mit.edu
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

on 7/4/00 3:42 PM, Tim Perdue at tperdue@valinux.com wrote:

Before I do that I want to confirm the major problem I had w/postgres:
the 8K tuple limit. When trying to import some tables from MySQL,
postgres kept choking because MySQL has no such limit on the size of a
row in the database (text fields on MySQL can be multi-megabyte).

It's possible in the current version to up your tuple limit to 16K before
compilation, and you can use lztext, the compressed text type, which should
give you up to 32K of storage. Netscape's textarea limit is 32K, so that's a
good basis for doing a number of web-based things. Anything that is
multi-megabyte is really not something I'd want to store in an RDBMS.

I actually intended the article to be a win for Postgres, as I've used
it and had good luck with it for such a long time, but if you look at
the results below, it seems very positive for MySQL.

Jan said that each tool has its value, and that's true. I recommend you
define your evaluation context before you write this. Is this for running a
serious mission-critical web site? Is it for logging web site hits with
tolerance for data loss and a need for doing simple reporting?

Performance/Scalability:

MySQL was about 50-60% faster in real-world web serving, but it crumbles
under a real load. Postgres on the other hand scaled 3x higher than
MySQL before it started to crumble on the same machine. Unfortunately,
Postgres would probably still lose on a high-traffic website because
MySQL can crank out the pages so much faster, number of concurrent
connections is hard to compare. MySQL also seems to make better use of
multiple-processor machines like the quad-xeon I tested on. Postgres
never saturated all 4 processors as MySQL did.

What kind of queries did you perform? Did you use connection pooling (a lot
of PHP apps don't, from what I've seen)? How does the performance get
affected when a query in Postgres with subselects has to be split into 4
different queries in MySQL? Postgres is process-based, each connection
resulting in one process. If you use connection pooling with at least as
many connections as you have processors, you should see it scale quite well.
In fact, for serious load-testing, you should have 10-15 pooled connections.
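The pooling Ben describes can be sketched in a few lines. This is a minimal illustration in Python, using sqlite3 (a shared-cache in-memory database) as a stand-in for a Postgres client library; the class name, method names, and sizes are invented for the example:

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal fixed-size connection pool (illustrative sketch)."""

    def __init__(self, dsn, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be handed
            # to whichever worker thread asks for it.
            conn = sqlite3.connect(dsn, uri=True, check_same_thread=False)
            self._idle.put(conn)

    def acquire(self):
        # Blocks until a connection is free, capping concurrent database
        # work at `size` - the effect Ben is describing.
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)

# A shared-cache in-memory database, so all pooled connections see the same data.
pool = ConnectionPool("file:pooldemo?mode=memory&cache=shared", size=4)

c1 = pool.acquire()
c1.execute("CREATE TABLE hits (n INTEGER)")
c1.execute("INSERT INTO hits VALUES (1)")
c1.commit()
pool.release(c1)

c2 = pool.acquire()  # may hand back a different connection than c1
count = c2.execute("SELECT count(*) FROM hits").fetchone()[0]
pool.release(c2)
```

With a process-per-connection server, keeping the pool near the processor count bounds the number of backend processes while letting every web worker share them.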

I *strongly* question your intuition on Postgres running web sites. MySQL's
write performance is very poor, which forces excessive caching (see sites
like Slashdot) to prevent updates from blocking entire web site serving.
Yes, the BDB addition might be useful. Let's see some performance tests
using BDB tables.

Postgres is undoubtedly the long-run winner in stability, whereas MySQL
will freak out or die when left running for more than a month at a time.
But if you ever do have a problem with postgres, you generally have to
nuke the database and recover from a backup, as there are no known tools
to fix index and database corruption. For a long-running postgres
database, you will occasionally have to drop indexes and re-create them,
causing downtime.

Dropping indexes and recreating them does not cause downtime. I've run a
couple of postgres-backed web sites for months on end with no issues. I've
survived a heavy slashdotting on my dual Pentium II-400, with Postgres
WRITES and READS on every Slashdot-referred hit, resulting in perfectly
respectable serving times (less than 3-4 seconds to serve > 20K of data on
each hit). No caching optimization of any kind on the app layer. And I'd
forgotten to vacuum my database for a few days.

Features:
Postgres is undoubtedly far, far more advanced than MySQL is. Postgres
now supports foreign keys, which can help with referential integrity.
Postgres supports subselects and has better support for creating tables
as the result of queries. The "transaction" support that MySQL lacks is
included in Postgres, although you'll never miss it on a website, unless
you're building something for a bank - and if you're doing that, you'll
use Oracle.

I'm just shocked at this. Where did this "transactions aren't necessary"
school of thinking originate? I've been developing database-backed web sites
for 5 years now, and I can't conceive of building a serious web site without
transactions. How do you guarantee that a record and its children records
are all stored together successfully? Do you run on a magic power grid that
never fails? Do you never have code-related error conditions that require
rolling back a series of database edits?

One quick point: while you may well be personally unbiased, VA Linux just
endorsed and funded MySQL. SourceForge uses MySQL. How do you expect to
convince readers that you're being objective in this comparison?

-Ben

#6 Peter Eisentraut
peter_e@gmx.net
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

Tim Perdue writes:

the 8K tuple limit.

BLCKSZ in src/include/config.h -- But it's being worked on these very
days.

Postgres never saturated all 4 processors as MySQL did.

Blame that on your operating system?

MySQL has some nice admin tools that allow you to watch individual
connections and queries as they progress

ps
tail -f <serverlog>

and tools to recover from corruption. I haven't seen any similar tools
for postgres.

I always like this one -- "tools to recover from corruption". If your
database is truly corrupted then there's nothing you can do about it, you
need a backup. If your database engine just creates garbage once in a
while then the solution is to fix the database engine, not to provide
external tools to clean up after it.

as there are no known tools to fix index

REINDEX

Both databases use a similar command-line interface. Postgres uses
"slash commands" to help you view database structures. MySQL uses a more
memorable, uniform syntax like "Show Tables; Show Databases; Describe
table_x;"

Yeah, but once you have memorized ours then it will be shorter to type. :)
And you get tab completion. And what's so non-uniform about ours?

The "transaction" support that MySQL lacks is included in Postgres,
although you'll never miss it on a website,

Think again. Transactions and multi-version concurrency control are
essential for any multi-user web site that expects any writes at all. I'll
reiterate the old Bugzilla bug: User A issues a search that "takes
forever". User B wants to update some information in the database, waits
for user A. Now *every* user in the system, reading or writing, is blocked
waiting for A (and B).

But you don't even have to go that far. What if you just update two
separate tables at once?

If your web site is truly read only, yes, you don't need transactions. But
then you don't need a database either. If your web site does writes, you
need transactions, or you're really not trying hard enough.
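Peter's two-table case is easy to demonstrate. Here is a minimal sketch in Python, with sqlite3 standing in for the database and invented table names (a forum message plus a per-forum counter); the point is only the atomicity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE forum_stats (forum_id INTEGER PRIMARY KEY, msg_count INTEGER);
    INSERT INTO forum_stats VALUES (1, 0);
""")

def post_message(body, fail_after_insert=False):
    """Write to both tables as one unit: either both changes commit
    or, on any error, both are rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
            conn.execute("UPDATE forum_stats SET msg_count = msg_count + 1 "
                         "WHERE forum_id = 1")
            if fail_after_insert:
                # Simulate the web-server timeout striking between updates.
                raise RuntimeError("request aborted")
    except RuntimeError:
        pass  # the rollback already undid the half-done page

post_message("first post")                         # both tables updated
post_message("lost post", fail_after_insert=True)  # neither table updated

messages = conn.execute("SELECT count(*) FROM messages").fetchone()[0]
counter = conn.execute("SELECT msg_count FROM forum_stats").fetchone()[0]
```

Without the transaction, the second call would leave an orphaned message with no matching counter bump - exactly the half-done state being argued about.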

--
Peter Eisentraut Sernanders väg 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#7 Tim Perdue
tperdue@valinux.com
In reply to: Benjamin Adida (#5)
Re: Article on MySQL vs. Postgres

Benjamin Adida wrote:

Jan said that each tool has its value, and that's true. I recommend you
define your evaluation context before you write this. Is this for running a
serious mission-critical web site? Is it for logging web site hits with
tolerance for data loss and a need for doing simple reporting?

This is for what most people do with PHP and databases - run
semi-critical, medium-traffic sites. Anyone running a mission-critical
site would have to look elsewhere for true robustness. I would not at
this time recommend that any serious, life-threatening app run on either
database.

Performance/Scalability:

MySQL was about 50-60% faster in real-world web serving, but it crumbles
under a real load. Postgres on the other hand scaled 3x higher than
MySQL before it started to crumble on the same machine. Unfortunately,
Postgres would probably still lose on a high-traffic website because
MySQL can crank out the pages so much faster, number of concurrent
connections is hard to compare. MySQL also seems to make better use of
multiple-processor machines like the quad-xeon I tested on. Postgres
never saturated all 4 processors as MySQL did.

What kind of queries did you perform?

I took a real-world page from our site
<http://sourceforge.net/forum/forum.php?forum_id=1> and made it portable
to both databases. Of course, I could not import the "body" of the
message into postgres because of the 8k limitation, so the body had to
be dropped from both databases.

The "nested" view of this page requires joins against three tables and
some recursion to show submessages.

The test was conducted with "ab" (apache benchmark software) using
varying numbers of concurrent connections and 1000 total page views.

The "10% inserts" test is most realistic, as about 10% of all page views
in a discussion forum involve posting to the database. I used a
random-number generator in the PHP script to insert a row into the table
10% of the time. If you look at the results, you'll see that MySQL was
actually harmed somewhat more by the writes than postgres was.
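The 10% write mix is easy to reproduce with a trivial driver. This is a Python sketch of the idea (the original driver was a PHP script; the table name and seed here are invented):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum_posts (body TEXT)")

def simulate_page_view(conn, rng):
    """One simulated page view: roughly 10% of views also write a post,
    matching the read/write mix described above."""
    wrote = rng.random() < 0.10
    if wrote:
        conn.execute("INSERT INTO forum_posts VALUES ('benchmark post')")
    # ...the read-only joins for the page body would run here...
    return wrote

rng = random.Random(1)  # fixed seed so a run is repeatable
writes = sum(simulate_page_view(conn, rng) for _ in range(1000))
conn.commit()
```

Over 1000 page views this lands near 100 writes, mirroring the "10% inserts" runs below.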

Here are the actual results I saw on my quad-xeon machine:

postgres:

concurrency w/pconnects:
10 cli - 10.27 pg/sec 333.69 kb/s
20 cli - 10.24 pg/sec 332.86 kb/s
30 cli - 10.25 pg/sec 333.01 kb/s
40 cli - 10.0 pg/sec 324.78 kb/s
50 cli - 10.0 pg/sec 324.84 kb/s
75 cli - 9.58 pg/sec 311.43 kb/s
90 cli - 9.48 pg/sec 307.95 kb/s
100 cli - 9.23 pg/sec 300.00 kb/s
110 cli - 9.09 pg/sec 295.20 kb/s
120 cli - 9.28 pg/sec 295.02 kb/s (2.2% failure)

concurrency w/10% inserts & pconnects:
30 cli - 9.97 pg/sec 324.11 kb/s
40 cli - 10.08 pg/sec 327.40 kb/s
75 cli - 9.51 pg/sec 309.13 kb/s

MySQL:

Concurrency Tests w/pconnects:
30 cli - 16.03 pg/sec 521.01 kb/s
40 cli - 15.64 pg/sec 507.18 kb/s *failures
50 cli - 15.43 pg/sec 497.88 kb/s *failures
75 cli - 14.70 pg/sec 468.64 kb/s *failures
90 - mysql dies
110 - mysql dies
120 - mysql dies

Concurrency Tests w/o pconnects:
10 cli - 16.55 pg/sec 537.63 kb/s
20 cli - 15.99 pg/sec 519.51 kb/s
30 cli - 15.55 pg/sec 505.19 kb/s
40 cli - 15.46 pg/sec 490.01 kb/s 4.7% failure
50 cli - 15.59 pg/sec 482.24 kb/s 8.2% failure
75 cli - 17.65 pg/sec 452.08 kb/s 36.3% failure
90 cli - mysql dies

concurrency w/10% inserts & pconnects:
20 cli - 16.37 pg/sec 531.79 kb/s
30 cli - 16.15 pg/sec 524.64 kb/s
40 cli - 22.04 pg/sec 453.82 kb/s 37.8% failure

Did you use connection pooling (a lot

I used persistent connections, yes. Without them, Postgres' showing was
far poorer, with MySQL showing about 2x the performance.

of PHP apps don't, from what I've seen)? How does the performance get
affected when a query in Postgres with subselects has to be split into 4
different queries in MySQL?

I'd really love to see a case where a real-world page view requires 4x
the queries on MySQL. If you are doing subselects like that on a website
in real-time you've got serious design problems and postgres would
fold up and quit under the load anyway.

Postgres is process-based, each connection
resulting in one process. If you use connection pooling with at least as
many connections as you have processors, you should see it scale quite well.
In fact, for serious load-testing, you should have 10-15 pooled connections.

I *strongly* question your intuition on Postgres running web sites. MySQL's

Specifically, what is the problem with my "intuition"? All I did in the
prior message was report my results and ask for feedback before I post
it.

write performance is very poor, which forces excessive caching (see sites
like Slashdot) to prevent updates from blocking entire web site serving.
Yes, the BDB addition might be useful. Let's see some performance tests
using BDB tables.

I wouldn't use BDB tables as MySQL 3.23.x isn't stable and I wouldn't
use it until it is.

Postgres is undoubtedly the long-run winner in stability, whereas MySQL
will freak out or die when left running for more than a month at a time.
But if you ever do have a problem with postgres, you generally have to
nuke the database and recover from a backup, as there are no known tools
to fix index and database corruption. For a long-running postgres
database, you will occasionally have to drop indexes and re-create them,
causing downtime.

Dropping indexes and recreating them does not cause downtime. I've run a
couple of postgres-backed web sites for months on end with no issues. I've
survived a heavy slashdotting on my dual Pentium II-400, with Postgres
WRITES and READS on every Slashdot-referred hit, resulting in perfectly
respectable serving times (less than 3-4 seconds to serve > 20K of data on
each hit). No caching optimization of any kind on the app layer. And I'd
forgotten to vacuum my database for a few days.

Not sure why you're arguing with this as this was a clear win for
postgres.

Do you run on a magic power grid that
never fails?

Reality is that postgres is as likely - or more likely - to wind up with
corrupted data than MySQL. I'm talking physical corruption, where I have
to destroy the database and recover from a dump. Just a couple of months
ago I sent a message about "Eternal Vacuuming", in which case I had to
destroy and recover a multi-gigabyte database.

Further, I have had situations where postgres actually had DUPLICATE ids
in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from that? Well,
I had to run a count(*) next to each ID, select out the rows where
there was more than one of each "unique" id, reinsert those rows,
and drop and rebuild the indexes and reset the sequences.

I've only been using MySQL for about a year (as compared to 2 years for
postgres), but I have never seen either of those problems with MySQL.

Do you never have code-related error conditions that require
rolling back a series of database edits?

Personally, I check every query in my PHP code. On the rare occasion
that it fails, I show an error and get out. Even with postgres, I have
always checked success or failure of a query and shown an appropriate
error. Never in two years of programming PHP/postgres have I ever used
commit/rollback, and I have written some extremely complex web apps
(SourceForge being a prime example). Geocrawler.com runs on postgres and,
again, I NEVER saw any need for any kind of rollback at all.

The statelessness of the web pretty much obviates the need for
locks/rollbacks, as each process is extremely quick and runs from start
to finish instantly. It's not like the old days, where you pull data down
into a local application, work on it, then upload it again.

Only now, with some extremely complex stuff that we're doing on
SourceForge would I like to see locks and rollbacks (hence my recent
interest in benchmarking and comparing the two). Your average web
programmer will almost never run into that in the short term.

One quick point: while you may well be personally unbiased, VA Linux just
endorsed and funded MySQL. SourceForge uses MySQL. How do you expect to
convince readers that you're being objective in this comparison?

Your own strong biases are shown in your message. I do this stuff
because I'm curious and want to find out for myself. Most readers will
find it interesting as I did. Few will switch from MySQL to postgres or
vice versa because of it.

Another clarification: PHPBuilder is owned by internet.com, a competitor
of VA Linux/Andover.

Tim

--
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723

#8 Mike Mascari
mascarm@mascari.com
In reply to: Benjamin Adida (#5)
Re: Article on MySQL vs. Postgres

Tim Perdue wrote:

This is for what most people do with PHP and databases - run
semi-critical medium-traffic sites. Anyone running a mission-critical
site would have to look elsewhere for true robustness. I would not at
this time recommend any serious, life-threatening app run On either
database.

I've seen problems with block read errors in large Oracle
databases which fail their alleged CRC check -- intermittent core
dumps which required a dump/restore of 25 years of insurance
claims data (40 gig - it was a lot at the time). After being down
for days and restoring on a new box, the same errors occurred.

I'd really love to see a case where a real-world page view requires 4x
the queries on MySQL. If you are doing subselects like that on a website
in real-time you've got serious design problems and postgres would
fold up and quit under the load anyway.

This can be true for Internet sites, of course. But with
corporate Intranet sites that dish-out and process ERP data, the
queries can become quite complex while concurrency is limited to
< 1000 users.

Further, I have had situations where postgres actually had DUPLICATE ids
in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from That? Well,
I had to run a count(*) next to each ID and select out the rows where
there was more than one of each "unique" id, then reinsert those rows
and drop and rebuild the indexes and reset the sequences.

Umm...

DELETE FROM foo WHERE EXISTS
    (SELECT f.key FROM foo f
     WHERE f.key = foo.key AND f.oid > foo.oid);

I believe there's even a purely SQL (non-oid) method of doing
this as well.
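Mike's query can be tried out directly. Here is a runnable sketch in Python with sqlite3, whose implicit rowid plays the role Postgres' oid plays in his query; the table and column names are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table whose supposedly-unique key has picked up duplicates,
# as in Tim's report.
conn.execute("CREATE TABLE foo (key INTEGER, payload TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (2, "b dup"), (3, "c"), (3, "c dup")])

# The oid trick, with rowid standing in for oid: delete every row that
# has a newer twin with the same key, so exactly one row per key survives.
conn.execute("""
    DELETE FROM foo WHERE EXISTS
        (SELECT 1 FROM foo f
         WHERE f.key = foo.key AND f.rowid > foo.rowid)
""")
conn.commit()
surviving_keys = [r[0] for r in conn.execute("SELECT key FROM foo ORDER BY key")]
```

The surviving row for each key is the most recently inserted one; flip the comparison to keep the oldest instead.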

Personally, I check every query in my PHP code. On the rare occasion
that it fails, I show an error and get out. Even with postgres, I have
always checked success or failure of a query and shown an appropriate
error. Never in two years of programming PHP/postgres have I ever used
commit/rollback, and I have written some extremely complex web apps
(sourceforge being a prime example). Geocrawler.com runs on postgres and
again, I NEVER saw any need for any kind of rollback at all.

This is the nature of the application. In the same example above,
how can I "charge" a cost center for the purchase of products in
an in-house distribution center and "deduct" the resulting
quantity from the distribution center's on-hand inventory sanely
without transactions?

Mike Mascari

#9 Chris Bitmead
chrisb@nimrod.itg.telstra.com.au
In reply to: Benjamin Adida (#5)
Re: Article on MySQL vs. Postgres

Tim Perdue wrote:

I'd really love to see a case where a real-world page view requires 4x
the queries on MySQL. If you are doing subselects like that on a website
in real-time you've got serious design problems and postgres would
fold up and quit under the load anyway.

Why? There are some subselect queries that have no problems running in
real-time. There are some non-subselect queries which one should never
attempt in real time. There is nothing fundamentally wrong with using
subselects for page views if it works for you. Nor is there anything
necessarily wrong with a design that requires subselects.

Do you run on a magic power grid that
never fails?

Reality is that postgres is as likely - or more likely - to wind up with
corrupted data than MySQL.

What do you base this statement on? With your sample size of one
corrupted postgres database? Also do you include inconsistent data in
your definition of corrupted data?

Never in two years of programming PHP/postgres have I ever used
commit/rollback, and I have written some extremely complex web apps
(sourceforge being a prime example).

I would humbly suggest that you are doing it wrong then.

Geocrawler.com runs on postgres and
again, I NEVER saw any need for any kind of rollback at all.

So what do you do when you get an error and "get out" as you put it?
Leave the half-done work in the database?

The statelessness of the web pretty much obviates the needs for
locks/rollbacks as each process is extremely quick and runs from start
to finish instantly. It's not like the old days where you pull data down
into a local application, work on it, then upload it again.

Even in the "old days" you should never keep a transaction open while
you "work on it". Transactions should *always* be short, and the web
changes nothing.

Really, REALLY there is nothing different about the web to traditional
applications as far as the db is concerned.

#10The Hermit Hacker
scrappy@hub.org
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

On Tue, 4 Jul 2000, Tim Perdue wrote:

Performace/Scalability:

MySQL was about 50-60% faster in real-world web serving, but it
crumbles under a real load. Postgres on the other hand scaled 3x
higher than MySQL before it started to crumble on the same machine.
Unfortunately, Postgres would probably still lose on a high-traffic
website because MySQL can crank out the pages so much faster

Actually, this one depends a lot on how the site is set
up/programmed. I did work with a friend several months ago using the
newest released versions of MySQL and PostgreSQL ... we loaded (with some
massaging) the exact same data/tables onto both on the *exact* same
machine, and the exact same operating system. When we ran their existing
web site, without modifications, on both MySQL and PgSQL, MySQL was
substantially faster ... when we spent a little bit of time looking at the
queries used, we found that due to MySQL's lack of sub-queries, each page
being loaded had to do multiple queries to get the same information that
we could get out of PgSQL using one. Once we optimized the queries, our
timings to load the page went to something like 3sec for MySQL and 1sec
for PgSQL ... (vs something like, if I recall correctly, 19sec for
PgSQL before the optimization) ...

Same with some recent work I did with UDMSearch ... by default,
UDMSearch does 2+n queries to the database to get the information it
requires ... by re-writing the 'n' queries that are performed as a single
IN query, I was able to cut searches down from taking ~1sec × n queries
to one ~3sec query ...
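
The 2+n-to-IN rewrite can be sketched like this (UDMSearch's real schema is not shown here; the table and ids below are assumed for illustration):

```python
# Sketch: collapse n per-id lookups into one IN query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc (id INTEGER PRIMARY KEY, url TEXT);
    INSERT INTO doc VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');
""")
matched_ids = [1, 3, 4]  # stand-in for the ids the search phase returned

# Naive: n round trips, one SELECT per matched id
naive = [conn.execute("SELECT url FROM doc WHERE id = ?", (i,)).fetchone()[0]
         for i in matched_ids]

# Rewritten: a single IN query fetching all matched rows at once
placeholders = ",".join("?" * len(matched_ids))
rows = conn.execute(
    f"SELECT id, url FROM doc WHERE id IN ({placeholders})", matched_ids).fetchall()
by_id = dict(rows)
batched = [by_id[i] for i in matched_ids]

assert naive == batched  # same result, one query instead of n
```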

The point being that if you do a 1:1 comparison, MySQL will be
faster ... if you use features in PgSQL that don't exist in MySQL, you can
knock that speed difference down considerably, if not surpass MySQL,
depending on the circumstance ...

#11The Hermit Hacker
scrappy@hub.org
In reply to: 'JanWieck@t-online.de' (#4)
Re: Article on MySQL vs. Postgres

Final notes:

I hate these "MySQL" vs. "PostgreSQL" articles that want to
say "this one is the better". Each one has it's advantages
and disadvantages. Both have a long TODO.

Also, none of the 'comparisons' take the time to deal with the fact that
one's "disadvantages" can generally be overcome using its
"advantages" (ie. speed issues with PostgreSQL can generally be overcome
by making use of its high-end features (ie. subselects)) ...

#12The Hermit Hacker
scrappy@hub.org
In reply to: Benjamin Adida (#5)
Re: Article on MySQL vs. Postgres

On Tue, 4 Jul 2000, Benjamin Adida wrote:

Dropping indexes and recreating them does not cause downtime. I've run a

Just got hit with a 'bits moved; recreate index' on the PostgreSQL search
engine ... dropped and re-created the index on the fly, no server shutdown ...

couple of postgres-backed web sites for months on end with no issues. I've
survived a heavy slashdotting on my dual Pentium II-400, with Postgres
WRITES and READS on every Slashdot-referred hit, resulting in perfectly
respectable serving times (less than 3-4 seconds to serve > 20K of data on
each hit). No caching optimization of any kind on the app layer. And I'd
forgotten to vacuum my database for a few days.

We had a *very* old version of PostgreSQL running on a Pentium acting as
an accounting/authentication backend to a RADIUS server for an ISP
... uptime for the server itself was *almost* 365 days (someone hit the
power switch by accident, meaning to power down a different machine
*sigh*) ... PostgreSQL server had been up for something like 6 months
without any problems, with the previous downtime being to upgrade the
server ...

Features:

Postgres is undoubtedly far, far more advanced than MySQL is. Postgres
now supports foreign keys, which can help with referential integrity.
Postgres supports subselects and better support for creating tables as
the result of queries. The "transaction" support that MySQL lacks is
included in Postgres, although you'll never miss it on a website, unless
you're building something for a bank, and if you're doing that, you'll
use oracle.

I'm just shocked at this. Where did this "transactions aren't necessary"
school of thinking originate?

Ummm, hate to disparage someone else, and I may actually be incorrect, but
I'm *almost* certain that MySQL docs, at one time, had this in it
... where they were explaining why they didn't have and never would have
transaction support. Obviously this mentality has changed since, with
the recent addition of transactions through a third-party database product
(re: Berkeley DB) ...

I've been developing database-backed web sites for 5 years now, and I
can't conceive of building a serious web site without transactions.
How do you guarantee that a record and its children records are all
stored together successfully? Do you run on a magic power grid that
never fails? Do you never have code-related error conditions that
require rolling back a series of database edits?

Actually, hate to admit it, but it wasn't until recently that I clued into
what transactions were for and how they were used :( I now use them for
just about everything I do, and couldn't imagine doing without them ...

#13The Hermit Hacker
scrappy@hub.org
In reply to: Peter Eisentraut (#6)
Re: Article on MySQL vs. Postgres

On Wed, 5 Jul 2000, Peter Eisentraut wrote:

If your web site is truly read only, yes, you don't need transactions. But
then you don't need a database either. If your web site does writes, you
need transactions, or you're really not trying hard enough.

... or not popular enough :)

#14The Hermit Hacker
scrappy@hub.org
In reply to: Tim Perdue (#7)
Re: Article on MySQL vs. Postgres

On Tue, 4 Jul 2000, Tim Perdue wrote:

Benjamin Adida wrote:

Jan said that each tool has its value, and that's true. I recommend you
define your evaluation context before you write this. Is this for running a
serious mission-critical web site? Is it for logging web site hits with
tolerance for data loss and a need for doing simple reporting?

This is for what most people do with PHP and databases - run
semi-critical medium-traffic sites. Anyone running a mission-critical
site would have to look elsewhere for true robustness. I would not at
this time recommend any serious, life-threatening app run on either
database.

Someone want to give me an example of something that would be
life-threatening that would run on a database? I can think of loads of
mission critical stuff, but life threatening? As for mission critical,
mission critical is in the eye of the end-user ... all my clients run
PostgreSQL for their backend needs, and I can guarantee you that each and
every one of them considers it a mission critical element to their sites
... then again, I have 3+ years of personal experience with PostgreSQL to
back me up ..

I took a real-world page from our site
<http://sourceforge.net/forum/forum.php?forum_id=1> and made it portable
to both databases. Of course, I could not import the "body" of the

did you take the time to optimize the queries to take advantage of
features that MySQL doesn't have, or just straight plug-n-play?

of PHP apps don't, from what I've seen)? How does the performance get
affected when a query in Postgres with subselects has to be split into 4
different queries in MySQL?

I'd really love to see a case where a real-world page view requires 4x
the queries on MySQL. If you are doing subselects like that on a website
in real-time you've got serious design problems and postgres would
fold-up and quit under the load anyway.

Odd, I'll have to let one of my clients know that their site has design
flaws ... wait, no, they had 3x the queries in MySQL as in PgSQL, so that
probably doesn't apply ...

Do you run on a magic power grid that
never fails?

Reality is that postgres is as likely - or more likely - to wind up with
corrupted data than MySQL. I'm talking physical corruption where I have
to destroy the database and recover from a dump.

Odd, in my 3+ years of PostgreSQL development, I've yet to have a
project/database corrupt such that I had to restore from backups *knock on
wood* ... INDEX corruption, yup ... 'DROP INDEX/CREATE INDEX' fixes that
though. Physical database corruption, nope ...

Further, I have had situations where postgres actually had DUPLICATE
ids in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from That?
Well, I had to run a count(*) next to each ID and select out the rows
where there was more than one of each "unique" id, then reinsert those
rows and drop and rebuild the indexes and reset the sequences.

Odd, were you using transactions here, or transactionless?
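
As an aside, the count(*)-per-id recovery Tim describes can be sketched with a single GROUP BY/HAVING query; the toy table below is assumed, not his actual schema:

```python
# Sketch: find "unique" ids that occur more than once after index corruption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER, body TEXT);  -- no PK: the index was corrupt
    INSERT INTO posts VALUES (1,'a'),(2,'b'),(2,'b dup'),(3,'c');
""")

dupes = conn.execute("""
    SELECT id, COUNT(*) AS n
    FROM posts
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()
# dupes lists each duplicated id with its occurrence count, ready for manual
# de-duplication before the index is rebuilt
```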

Do you never have code-related error conditions that require
rolling back a series of database edits?

Personally, I check every query in my PHP code. On the rare occasion
that it fails, I show an error and get out. Even with postgres, I have
always checked success or failure of a query and shown an appropriate
error. Never in two years of programming PHP/postgres have I ever used
commit/rollback, and I have written some extremely complex web apps
(sourceforge being a prime example). Geocrawler.com runs on postgres and
again, I NEVER saw any need for any kind of rollback at all.

Wait ... how does checking every query help if QUERY2 fails after QUERY1
is sent, and you aren't using transactions?
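
A minimal sketch of the point (schema invented for illustration): with autocommit and no transaction, QUERY1's work survives QUERY2's failure, even though every query was error-checked:

```python
# Sketch: per-query error checking without transactions leaves half-done work.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE order_lines (order_id INTEGER NOT NULL,
                              item TEXT NOT NULL CHECK (item <> ''));
""")

# QUERY1 succeeds and is committed immediately ...
conn.execute("INSERT INTO orders VALUES (1, 9.99)")
# ... QUERY2 fails its CHECK constraint, and the error handler "gets out":
try:
    conn.execute("INSERT INTO order_lines VALUES (1, '')")
except sqlite3.IntegrityError:
    pass  # the error was caught and reported, but order 1 is now an orphan

orphans = conn.execute("""
    SELECT COUNT(*) FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM order_lines l WHERE l.order_id = o.id)
""").fetchone()[0]
# the parent row persists with no children; a transaction would have
# rolled QUERY1 back along with the failed QUERY2
```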

Only now, with some extremely complex stuff that we're doing on
SourceForge would I like to see locks and rollbacks (hence my recent
interest in benchmarking and comparing the two). Your average web
programmer will almost never run into that in the short term.

Cool, at least I'm not considered average :) I *always* use transactions
in my scripts ... *shrug* then again, I'm heavily into 'the rules of
normalization', so tend to not crowd everything into one table.

#15Tim Perdue
tperdue@valinux.com
In reply to: The Hermit Hacker (#14)
Re: Article on MySQL vs. Postgres

The Hermit Hacker wrote:

Further, I have had situations where postgres actually had DUPLICATE
ids in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from That?
Well, I had to run a count(*) next to each ID and select out the rows
where there was more than one of each "unique" id, then reinsert those
rows and drop and rebuild the indexes and reset the sequences.

Odd, were you using transactions here, or transactionless?

Does it matter? I suppose it was my programming error that somehow I got
duplicate primary keys in a table in the database where that should be
totally impossible under any circumstance? Another stupid
transactionless program I'm sure.

At any rate, it appears that the main problem I had with postgres (the
8K tuple limit) is being fixed and I will mention that in my writeup.

Tim

--
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723

#16Michael Mayo
michael-a-mayo@worldnet.att.net
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

----- Original Message -----
From: "Tim Perdue" <tperdue@valinux.com>

Before I do that I want to confirm the major problem I had w/postgres:
the 8K tuple limit.

Just wanted to point out that this is not *exactly* true. While the
default limit is 8k, all that is required to change it to 32k is to change
one line of text in config.h (BLCKSZ, from 8k to 32k). This is pointed out
in the FAQ, so I would really consider the *default* to be 8k and the
*limit* to be 32k. IMHO 32k is good enough for 99% of tuples in a typical
bulletin-board-like application, and it is not unreasonable to reject posts >
32k in size. You might want to evaluate performance with the 32k
tuples, though; it might increase or decrease depending on the application.
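
For concreteness, the one-line edit Mike describes would look roughly like this in a 7.0-era source tree (the exact file path is recalled from memory and may differ in your checkout):

```
/* src/include/config.h -- PostgreSQL 7.0.x */
#define BLCKSZ 32768    /* default: 8192; the maximum tuple size tracks this */
```

A rebuild and initdb are required after changing the block size.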

-Mike

#17Constantin Teodorescu
In reply to: Tim Perdue (#1)
Re: Article on MySQL vs. Postgres

Tim Perdue wrote:

MySQL was about 50-60% faster in real-world web serving ...

Sorry if I missed it, but I searched all the messages in the thread
for information about the PostgreSQL version used in the test and
didn't find anything.

Tim, what version of PostgreSQL did you use? Hope it's 7.x.

Constantin Teodorescu
FLEX Consulting Braila, ROMANIA

#18Tim Perdue
tperdue@valinux.com
In reply to: Constantin Teodorescu (#17)
Re: Article on MySQL vs. Postgres

Constantin Teodorescu wrote:

Tim, what version of PostgreSQL did you use? Hope it's 7.x.

Yes, 7.0.2

Tim

--
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723

#19Hannu Krosing
hannu@tm.ee
In reply to: The Hermit Hacker (#14)
Re: Article on MySQL vs. Postgres

The Hermit Hacker wrote:

On Tue, 4 Jul 2000, Tim Perdue wrote:

Further, I have had situations where postgres actually had DUPLICATE
ids in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from That?
Well, I had to run a count(*) next to each ID and select out the rows
where there was more than one of each "unique" id, then reinsert those
rows and drop and rebuild the indexes and reset the sequences.

Odd, were you using transactions here, or transactionless?

Actually, I think I remember a recent bug report about some condition that
failed the uniqueness check when inside a transaction ;(

I think the report came with a fix ;)

------------
Hannu

#20Hannu Krosing
hannu@tm.ee
In reply to: Tim Perdue (#15)
Re: Article on MySQL vs. Postgres

Tim Perdue wrote:

The Hermit Hacker wrote:

Further, I have had situations where postgres actually had DUPLICATE
ids in a primary key field, probably due to some abort or other nasty
situation in the middle of a commit. How did I recover from That?
Well, I had to run a count(*) next to each ID and select out the rows
where there was more than one of each "unique" id, then reinsert those
rows and drop and rebuild the indexes and reset the sequences.

Odd, were you using transactions here, or transactionless?

Does it matter? I suppose it was my programming error that somehow I got
duplicate primary keys in a table in the database where that should be
totally impossible under any circumstance? Another stupid
transactionless program I'm sure.

At any rate, it appears that the main problem I had with postgres (the
8K tuple limit) is being fixed and I will mention that in my writeup.

Currently (as of 7.0.x) you could use BLCKSZ=32K plus the lztext datatype
and get text fields of about 64-128K, depending on the data, if you are
desperately after big text fields.

-----------
Hannu
