Why did Facebook use MySQL?
There was an interesting post today on highscalability -
http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
The discussion/comments touched upon why MySQL is a better idea for Facebook
than Postgres. Here's an interesting one:
One is that PG doesn't scale as well on multiple cores as MySQL does nowadays.
Another is in fundamental differences of storage architecture - all
MySQL/InnoDB data is either a clustered primary key, or a secondary key with
PK pointers - logical relationships between entries allow index-only
scans, which are a must for web-facing databases (good response times, no
variance).
One more reason is that in heavily indexed databases vacuum will have to do
full index passes, rather than working with an LRU.
As for sharding, etc. - there's no way to scale vertically infinitely - so
the "stupid people shard" point is very, very moot.
It is much cheaper to go the commodity hardware path.
or
In general Postgresql is faster at complex queries with a lot of joins and
such, while MySQL is faster at simple queries such as primary key look up.
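The index-only-scan point in the first comment is easy to demonstrate even with SQLite, whose EXPLAIN QUERY PLAN reports when a covering index can answer a query without touching the table at all. A minimal sketch (the table and index names here are made up for illustration):

```python
import sqlite3

# In-memory database: a toy "users" table plus a composite index that
# covers the query below, so the table itself never has to be read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_users_name_email ON users (name, email)")

# The SELECT only needs columns present in the index, so the planner
# can do an index-only scan ("COVERING INDEX" in SQLite's terms).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE name = 'bob'"
).fetchall()
print(plan[0][-1])  # detail column mentions "USING COVERING INDEX idx_users_name_email"
```

For what it's worth, true index-only scans did not arrive in PostgreSQL until 9.2, well after this thread, so the InnoDB clustered-index layout was a real advantage for this access pattern at the time.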
I wonder if anyone can comment on this - especially the part that PG doesn't
scale as well as MySQL on multiple cores?
regards
Sandeep
On Mon, Nov 8, 2010 at 8:24 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
I wonder if anyone can comment on this - especially the part that PG doesn't
scale as well as MySQL on multiple cores?
Sorry Sandeep, there may be some that love to re-re-re-hash these
subjects. I myself am losing interest.
The following link contains hundreds of comments that you may be
interested in, some that address issues that are much more interesting
and well established:
http://search.postgresql.org/search?q=mysql+performance&m=1&l=NULL&d=365&s=r&p=1
--
Regards,
Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
On Tue, Nov 9, 2010 at 10:31 AM, Richard Broersma <
richard.broersma@gmail.com> wrote:
The following link contains hundreds of comments that you may be
interested in, some that address issues that are much more interesting
and well established:
http://search.postgresql.org/search?q=mysql+performance&m=1&l=NULL&d=365&s=r&p=1
I did actually try to search for topics on multiple cores vs MySQL, but I
wasn't able to find anything of much use. Elsewhere (on Hacker News, for
example), I have indeed come across statements that PG scales better on
multiple cores, which are usually offset by claims that MySQL is better.
Google isn't of much use for this either - while MySQL has several resources
talking about benchmarks/tuning on multi-core servers (e.g.
http://dimitrik.free.fr/blog/archives/2010/09/mysql-performance-55-notes.html),
I can't find any such serious discussion on PostgreSQL.
However, what I did find (
http://www.pgcon.org/2008/schedule/events/72.en.html) was titled "*Problems
with PostgreSQL on Multi-core Systems with Multi-Terabyte Data*"
(interestingly, published by the PostgreSQL Performance Team @ Sun)
Ergo, my question still stands - maybe my google-fu was bad... which is why I
am asking for help.
regards
Sandeep
On Mon, Nov 8, 2010 at 10:47 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
I did actually try to search for topics on multiple cores vs MySQL, but I
wasn't able to find anything of much use. Elsewhere (on Hacker News, for
example), I have indeed come across statements that PG scales better on
multiple cores, which are usually offset by claims that MySQL is better.
Google isn't of much use for this either - while MySQL has several resources
talking about benchmarks/tuning on multi-core servers
(e.g. http://dimitrik.free.fr/blog/archives/2010/09/mysql-performance-55-notes.html),
I can't find any such serious discussion on PostgreSQL.
Part of that is that 48-core machines with memory busses fast enough
to use those cores are only now coming out in affordable packages
($10k or so for a machine with a handful of drives), so they're just
getting tested. I have 8-core and 12-core older-gen AMDs with DDR667
and DDR800 memory, and neither one scales PAST 8 cores, but
that limitation is due more to the slower HT bus on the older AMDs.
With the much faster HT busses on the 6xxx-series Magny Cours CPUs
they scale right out to 40+ cores or so, and give great numbers. The
taper as you go past 48 processes isn't too bad. With proper pooling
to keep the number of active connections at or below, say, 50, it should
run well for a pretty huge load. And in everyday operation they are
always responsive, even when things aren't going quite right
otherwise.
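Scott's pooling advice (keep active connections at or below roughly 50 and let everything else queue) can be sketched with a counting semaphore. This is only an illustration of the capping idea; a real pooler such as pgbouncer also reuses server connections, handles timeouts, and much more:

```python
import threading

class CappedPool:
    """Toy illustration of capping active DB connections: at most `limit`
    callers hold a "connection" at once; the rest block until one frees up."""

    def __init__(self, limit=50):
        self._sem = threading.BoundedSemaphore(limit)
        self._active = 0
        self._peak = 0          # highest concurrency actually observed
        self._lock = threading.Lock()

    def __enter__(self):
        self._sem.acquire()     # blocks when `limit` connections are in use
        with self._lock:
            self._active += 1
            self._peak = max(self._peak, self._active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        self._sem.release()

pool = CappedPool(limit=4)

def worker():
    with pool:                  # pretend: check out a connection, run a query
        pass

threads = [threading.Thread(target=worker) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(pool._peak)               # never exceeds 4
```

Even with 32 threads contending, peak concurrency never exceeds the cap, which is the whole point of keeping the database at a load level where it stays responsive.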
However, what I did find
(http://www.pgcon.org/2008/schedule/events/72.en.html) was titled "Problems
with PostgreSQL on Multi-core Systems with Multi-Terabyte Data"
(interestingly, published by the PostgreSQL Performance Team @ Sun)
We're not a company selling a product, we're enthusiasts racing our
databases on the weekends, so to speak, and if someone has ideas on
what's slow and how to make it faster, we talk about it. :) That
paper wasn't saying that PostgreSQL is problematic at large scale so
much as addressing the problems that arise when you do, and ways to
look forward to improving performance.
Ergo, my question still stands - maybe my google-fu was bad... which is why I
am asking for help.
To know if either is a good choice you really need to say what you're
planning on doing. If you're building a petabyte-sized data warehouse,
look at what Yahoo did with a custom hacked version of pgsql. If
you're gonna build another Facebook, look at what they did. They're
both very different applications of a "database".
So, your question needs more substance. What do you want to do with your db?
--
To understand recursion, one must first understand recursion.
On Mon, Nov 8, 2010 at 11:24 PM, Sandeep Srinivasa <sss@clearsenses.com> wrote:
There was an interesting post today on highscalability
- http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
The discussion/comments touched upon why MySQL is a better idea for Facebook
than Postgres. Here's an interesting one:
PostgreSQL might not be a good fit for this type of application, but
the reasoning given in the article is really suspicious. The true
answer was hinted at in the comments: "we chose it first, and there
was never a reason to change it". It really comes down to this: they
probably don't need much from the database other than a distributed
key-value store, and they built a big software layer on top of that to
manage it. Hm, I use Facebook and I've seen tons of inconsistent
answers, missing notifications and such. I wonder if there's a
connection there...
merlin
On Tue, Nov 9, 2010 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I agree with Merlin. There is a surprisingly big number of "good"
technology companies (including Google) out there using MySQL. For
some time I have been wondering why, and have come up with a few
(possibly wrong) theories. Such as: these companies are started by
application developers, not database experts; the cost (effort) of
changing to another database engine is substantial, given that there
are probably already so many inconsistencies in their current
data setup, coupled with a considerable amount of inconsistency cover-up
code in the application programs; and maybe the IT team is doubling up
as a fire-fighting department, constantly putting out the data-driven
fires. This is then compounded by the rapid increase in data.
Allan.
On Nov 9, 2010, at 7:04 AM, Allan Kamau wrote:
have come up with a few
(possibly wrong) theories.
They all sound reasonable. I think you missed an important one though: aggressive (and even sometimes outright false) promotion and sales by the company MySQL AB.
When I started looking at databases, you didn't have to look very hard to find PostgreSQL, but you did have to at least make a minimal effort.
Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
In those same times, mysql was also, um, other than rock solid. I
have somewhere a personal email from Monty describing how to
crash-recover corrupted MyISAM data files (I was customer number 13, I
believe... I wish I still had that support contract certificate as an
artifact)
Vick Khera <vivek@khera.org> writes:
On Tue, Nov 9, 2010 at 10:26 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
Also, my understanding is that if you go way back on the PostgreSQL timeline to versions 6 and earliest 7.x, it was a little shaky. (I started with 7.3 or 7.4, and it has been rock solid.)
In those same times, mysql was also, um, other than rock solid.
I don't have enough operational experience with mysql to speak to how
reliable it was back in the day. What it *did* have over postgres back
then was speed. It was a whole lot faster, particularly on the sort of
single-stream-of-simple-queries cases that people who don't know
databases are likely to set up as benchmarks. (mysql still beats us on
cases like that, though not by as much.) I think that drove quite a
few early adoption decisions, and now folks are locked in; the cost of
conversion outweighs the (perceived) benefits.
regards, tom lane
Hey all,
IMO they chose MySQL because they had no knowledge
of PostgreSQL or of valid database design.
Just a garbage pile of data for SELECTing, with minimal effort
on data integrity and database server programming (a la
a typical PHP project).
Sorry :-)
2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>
--
// Dmitriy.
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, November 09, 2010 10:55 AM
To: Vick Khera
Cc: Scott Ribe; Allan Kamau; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Why facebook used mysql ?
A different slant on this has to do with licensing and $$. Might Oracle decide some day to start charging for their newfound DB? They are a for-profit company that's beholden to their shareholders LONG before any open-software community. Consumers like Facebook and Google have deep pockets, something corporate executives really don't dismiss lightly.
Also there's the strange and mysterious valley group-think syndrome.
I've seen this with several products/technologies over the years.
I suspect it comes from the VCs, but I'm not sure. The latest example
is "you should be using EC2". There always follows a discussion where
I can present 50 concrete reasons based on hard experience why
the suggestion is a bad idea and the other person presents nothing
besides "everyone's doing it". I saw exactly the same thing with MySQL
a few years ago. Before that it was Oracle. It's often easier to go along
with the flow and get some work done vs. trying to argue.
kamauallan@gmail.com (Allan Kamau) writes:
I agree with Merlin. There is a surprisingly big number of "good"
technology companies (including Google) out there using MySQL. For
some time I have been wondering why, and have come up with a few
(possibly wrong) theories. Such as: these companies are started by
application developers, not database experts; the cost (effort) of
changing to another database engine is substantial, given that there
are probably already so many inconsistencies in their current
data setup, coupled with a considerable amount of inconsistency cover-up
code in the application programs; and maybe the IT team is doubling up
as a fire-fighting department, constantly putting out the data-driven
fires. This is then compounded by the rapid increase in data.
This wasn't a good explanation for what happened when Sabre announced
they were using MySQL:
http://www.mysql.com/news-and-events/generate-article.php?id=2003_33
I used to work at Sabre, and what I saw was *mostly* an Oracle shop, but
with significant bastions of IMS, DB2, Teradata, and Informix. Your
theory might fit with "dumb startups," but certainly not with Sabre,
which still has significant deployments of IMS! :-)
I actually am inclined to go with "less rational" explanations; a lot of
decisions get made for reasons that do not connect materially (if at
all) with the technical issues.
One such would be that the lawyers and marketing folk that tend to be at
the executive layer do *their* thing of making deals, and when they're
busy "making deals," the only people interfacing with them are:
- Salescritters from the Big O buying them lunch
- Other Political Animals that Made The Decision to go with MySQL (or
such) and are happy to explain, over golf, that "it went fine for us"
(even if it didn't go entirely so fine; they didn't hear about it)
Lunch and golf can have material effects.
--
"cbbrowne","@","acm.org"
Rules of the Evil Overlord #67. "No matter how many shorts we have in
the system, my guards will be instructed to treat every surveillance
camera malfunction as a full-scale emergency."
2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>:
Facebook have written "Flashcache [is] built primarily as a block
cache for InnoDB but is general purpose and can be used by other
applications as well."
https://github.com/facebook/flashcache/
A good tool, by the way. It is the only place where I like to see an SSD
disk (not at Facebook, but with 'volatile' data).
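The "block cache" Flashcache provides sits in the kernel between InnoDB and a slow disk, keeping hot blocks on SSD. The shape of the idea, and nothing more, fits in a few lines of user-space Python (the class and demo here are purely illustrative, not Flashcache's actual design):

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU block cache: keeps recently read blocks from a slow backing
    store in a small fast cache. (Flashcache itself is a kernel module
    doing this with SSD in front of spinning disk; this is just the idea.)"""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing            # block number -> block data
        self.cache = OrderedDict()        # LRU order: oldest entries first
        self.hits = self.misses = 0

    def read(self, block_no):
        if block_no in self.cache:        # fast path: served from the cache
            self.cache.move_to_end(block_no)
            self.hits += 1
            return self.cache[block_no]
        self.misses += 1                  # slow path: hit the backing store
        data = self.backing[block_no]
        self.cache[block_no] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return data

backing = {n: ("block-%d" % n).encode() for n in range(100)}
cache = BlockCache(capacity=8, backing=backing)
for n in [1, 2, 3, 1, 2, 3, 1, 2, 3]:
    cache.read(n)
print(cache.hits, cache.misses)   # 6 3
```

A working set that fits in the cache is served almost entirely from the fast layer, which is exactly why it pays off for 'volatile' hot data rather than the full dataset.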
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On Tue, Nov 9, 2010 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Postgres 7.2 brought non-blocking vacuum. Before that, you could
pretty much write off any 24x7-duty applications -- dealing with dead
tuples was just too much of a headache. The MySQL of the time, 3.23,
was fast but locky and utterly unsafe. It has been easier to run,
though, until recently (8.4 really changed things).
Postgres has been relatively disadvantaged in terms of administrative
overhead, which is a bigger deal than SQL features, replication,
performance, etc. for high-load website-type cases. The heap FSM, tunable
autovacuum, checkpoint management, a smarter/faster statistics
collector, and more backup options may not be as sexy as replication
etc. but are very appealing features if you are running 50 database
servers backing a monster web site. Dumping Sys V IPC for mmap would be a
hypothetical improvement in that vein :-) (AIUI, it is not possible,
though).
merlin
--- On Tue, 11/9/10, Gauthier, Dave <dave.gauthier@intel.com> wrote:
A different slant on this has to do with licensing and $$.
Might Oracle decide some day to start charging for their
newfound DB? They are a for-profit company that's
beholden to their shareholders LONG before any open-software
community. Consumers like Facebook and Google have
deep pockets, something corporate executives really don't
dismiss lightly.
This is just FUD.
MySQL is GPL'd, just like Linux is.
To say you should avoid MySQL because Oracle may someday start charging for it is like saying you should avoid Linux because Red Hat may someday start charging for it.
That makes no sense, especially since both Oracle and Red Hat are already charging for their products. Doesn't mean you can't keep using free Linux and MySQL.
Think upgrades
-----Original Message-----
From: Andy [mailto:angelflow@yahoo.com]
Sent: Tuesday, November 09, 2010 12:02 PM
To: pgsql-general@postgresql.org; Gauthier, Dave
Subject: Re: [GENERAL] Why facebook used mysql ?
On Tue, Nov 9, 2010 at 10:00 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
Postgres 7.2 brought non blocking vacuum. Before that, you could
pretty much write off any 24x7 duty applications -- dealing with dead
tuples was just too much of a headache.
Amen! I remember watching vacuum run alongside other queries and
getting all school-girl giggly over it. Seriously, it was a big, big
change for pgsql.
The mysql of the time, 3.23,
was fast but locky and utterly unsafe.
True, it was common to see mysql back then just stop, dead. You'd go to
bring it back up and have to repair tables.
Postgres has been relatively disadvantaged in terms of administrative
overhead which is a bigger deal than sql features, replication,
performance, etc for high load website type cases.
I would say it's a bigger problem for adoption than for high load
sites. If Joe User spends an hour a day keeping his database on his
workstation happy, he's probably not happy. If Joe Admin spends an
hour a day keeping his 100 machine db farm happy, he's probably REALLY
happy that it only takes so long.
On 09 Nov 2010, at 7:16 PM, Gauthier, Dave wrote:
Think upgrades
This is covered by the GPL license. Once you have released code under
the GPL, all derivative code - i.e. upgrades - also has to be released
in source form, under the GPL license.
Regards,
Graham
--
Any upgrades that are based on the MySQL source code will be legally required to be released under the GPL too.
That's the beauty of the GPL.
Software under an MIT or BSD license can be hijacked by private companies. Software under the GPL cannot.