Is PostgreSQL an easy choice for a large CMS?
Hello all,
I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com
PostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site. I'm not
really a fan of MySQL, but I'll consider it if it truly is the better
choice in this case. I just don't understand how it would be. I'm
thinking this is solely in reference to VACUUM. Even with autovacuum
suport, I tend to agree there is at least one handicap.
I could really use some enlightenment on just where PostgreSQL fits in
a single-server, highly-trafficked web site serving mostly text,
pictures and possibly streaming media.
Regards,
Anthony
On Sunday 30 April 2006 12:01, Tony Lausin wrote:
Hello all,
I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.comPostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site. I'm not
really a fan of MySQL, but I'll consider it if it truly is the better
choice in this case. I just don't understand how it would be. I'm
thinking this is solely in reference to VACUUM. Even with autovacuum
suport, I tend to agree there is at least one handicap.I could really use some enlightenment on just where PostgreSQL fits in
a single-server, highly-trafficked web site serving mostly text,
pictures and possibly streaming media.Regards,
Anthony
Very odd. I had always heard that MySql (at least originally) was a
"quick and dirty" database, easy to use, not fully standards compliant,
and not enterprise grade. Postgresql on the other hand was always
the heavyweight, standards compliant, enterprise db, which was more
difficult to use and set up but much more resilient. Postgresql has been
getting more UI support (often seen as a user friendly bonus) and
things like autovacuum support so that it is easier to use out of the box,
and MySql has been gaining standards compliance and resilience.
Funny how perceptions can differ.
David
On 4/30/06, Tony Lausin <tonylausin@gmail.com> wrote:
Hello all,
I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.comPostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site. I'm not
really a fan of MySQL, but I'll consider it if it truly is the better
choice in this case. I just don't understand how it would be. I'm
thinking this is solely in reference to VACUUM. Even with autovacuum
suport, I tend to agree there is at least one handicap.I could really use some enlightenment on just where PostgreSQL fits in
a single-server, highly-trafficked web site serving mostly text,
pictures and possibly streaming media.
http://www.postgresql.org/about/casestudies/
http://www.postgresql.org/about/users
are all good places to start.
TBH it depends a lot on your data and how you structure it. I wrote a
small tute on how to get rid of left-join type queries and use
triggers to keep count(*) type queries to a minimum..
http://www.designmagick.com/article/36/Forum-Project/Database-Design-Issues
It's not always possible, but there are ways to minimize count(*),
min(field), max(field) type queries where postgresql isn't able to
optimize fully due to mvcc issues.
--
Postgresql & php tutorials
http://www.designmagick.com/
On 4/30/06, Tony Lausin <tonylausin@gmail.com> wrote:
Hello all,
I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com
In my opinion, postgresql is not the way to go when building a cMS
simply because of the way it handles strings.
A CMS should be language/region agnostic i.e. supporting any chosen
locale subset, rather then just one locale, as postgresql does. You
can throw UTF-8 at the problem and it will enable data
storage/retrieval, but you'll still be stuck with incorrect text
ordering/sorting in any locale but the default.
Truth be told, I don't know of a single RDBMS which handles this issue
gracefully, but with postgresql, your basically stuck with a single
language/locale.
On the other hand, if you don't need multilang support, you'll find
that postgresql is generally great to program and use.
Tomislav
"Tony Lausin" <tonylausin@gmail.com> writes:
I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com
PostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site.
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.
regards, tom lane
Very odd. I had always heard that MySql (at least originally) was a
"quick and dirty" database, easy to use, not fully standards compliant,
and not enterprise grade. Postgresql on the other hand was always
the heavyweight, standards compliant, enterprise db, which was more
difficult to use and set up but much more resilient. Postgresql has been
getting more UI support (often seen as a user friendly bonus) and
things like autovacuum support so that it is easier to use out of the box,
and MySql has been gaining standards compliance and resilience.Funny how perceptions can differ.
Hi David,
I think it's very odd too, cause that's the exact same perception of
MySQL I have. My experience with MySQL is really limited to my
Wordpress blog, and that means I've never actually designed a database
with it. I wonder how much of this is just marketing and hype - either
on the side of MySQL or the side of Oracle.
In my opinion, postgresql is not the way to go when building a cMS
simply because of the way it handles strings.
A CMS should be language/region agnostic i.e. supporting any chosen
locale subset, rather then just one locale, as postgresql does. You
can throw UTF-8 at the problem and it will enable data
storage/retrieval, but you'll still be stuck with incorrect text
ordering/sorting in any locale but the default.
Truth be told, I don't know of a single RDBMS which handles this issue
gracefully, but with postgresql, your basically stuck with a single
language/locale.
On the other hand, if you don't need multilang support, you'll find
that postgresql is generally great to program and use.Tomislav
Hello Tomislav,
In my case, UTF-8 is a must. The site contextually will be directed to
North Americans, although I do anticipate that many users will from
time to time insert text in their own native languages. I myself
certainly may use non-latin characters from time to time (cyrillic),
so I find UTF-8 to be the best common bridge.
Regards,
Anthony
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.regards, tom lane
Hi Tom,
That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?
Regards,
Anthony
Tony Lausin wrote:
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?
Yes Slashdot runs MySQL, however what Tom said was that MySQL will fall
over under any heavy *concurrent-write* scenario. Concurrent-write is
the operative word in that sentence. Slashdot by it's very nature reads
from the database far far more than it writes. The only writes to the
database are things like a new story and user submitted comments, both
of with are small in comparison to the number of reads from the database.
Matt
On Apr 30, 2006, at 12:23 PM, Tony Lausin wrote:
On the other hand, if you don't need multilang support, you'll find
that postgresql is generally great to program and use.Tomislav
Hello Tomislav,
In my case, UTF-8 is a must. The site contextually will be directed to
North Americans, although I do anticipate that many users will from
time to time insert text in their own native languages. I myself
certainly may use non-latin characters from time to time (cyrillic),
so I find UTF-8 to be the best common bridge.Regards,
Anthony
Agreed, UTF-8 works fairly well. We just launched a site that has
translated content for ~20 languages using a programming language
that doesn't natively support unicode (Ruby... but jcode helps us
out) and it works great. This is all running on PostgreSQL and we
haven't seen any hiccups or complaints from people about strings not
displaying right.
I don't see why PostgreSQL wouldn't work for what you're working on...
-Robby
Robby Russell
Founder & Executive Director
PLANET ARGON, LLC
Ruby on Rails Development, Consulting & Hosting
www.planetargon.com
www.robbyonrails.com
+1 503 445 2457
+1 877 55 ARGON [toll free]
+1 815 642 4968 [fax]
Ahh. I see the point more clearly now. Perhaps the best strategy for
me is to press on with Postgres until the project is at a profitable
enough stage to merit a migration to Oracle - should Postgres become
an issue. I feel more confident about being able to migrate from
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I should think
about.
Show quoted text
On 4/30/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
Tony Lausin wrote:
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?Yes Slashdot runs MySQL, however what Tom said was that MySQL will fall
over under any heavy *concurrent-write* scenario. Concurrent-write is
the operative word in that sentence. Slashdot by it's very nature reads
from the database far far more than it writes. The only writes to the
database are things like a new story and user submitted comments, both
of with are small in comparison to the number of reads from the database.Matt
Oops! tonylausin@gmail.com ("Tony Lausin") was seen spray-painting on a wall:
Ahh. I see the point more clearly now. Perhaps the best strategy for
me is to press on with Postgres until the project is at a profitable
enough stage to merit a migration to Oracle - should Postgres become
an issue. I feel more confident about being able to migrate from
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I should think
about.
The other plausible option as a "possibly inexpensive" alternative to
PostgreSQL or MySQL(tm) is Firebird, which used to be Borland
Interbase...
They lost a couple of developers to MySQL AB, which may present some
problems, eventually...
--
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://linuxfinances.info/info/wp.html
"Tooltips are the proof that icons don't work."
-- Stefaan A. Eeckels
The open source offerings of ingres, ingres R3, runs the Wire section of our business
Allan
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I
should think
about.
The material contained in this email may be confidential, privileged or copyrighted. If you are not the intended recipient, use, disclosure or copying of this information is prohibited. If you have received this document in error, please advise the sender and delete the document. Neither OneSteel nor the sender accept responsibility for any viruses contained in this email or any attachments.
Import Notes
Resolved by subject fallback
* Tony Lausin (tonylausin@gmail.com) wrote:
Ahh. I see the point more clearly now. Perhaps the best strategy for
me is to press on with Postgres until the project is at a profitable
enough stage to merit a migration to Oracle - should Postgres become
an issue. I feel more confident about being able to migrate from
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I should think
about.
You may want to read up on MySQL's licenseing system if you havn't
already as well. It's not necessairly what you expect... Basically, if
you don't want to open source (ie: GPL-compatible) the code which uses
MySQL you might need to buy a license from them. Check their website
for a more detailed (though not necessairly any clearer) discussion of
the issues.
Thanks,
Stephen
Just to round out the suggestions, if I remember correctly, the OP
mentioned something about chat. For entirely dynamic, disposable data,
perhaps a reliable database isn't what is called for at all. If the power
shuts off, it may not matter that some chat log is lost.
I suggest to use the right tool for the job and PostgreSQL doesn't claim
to cover all the bases.
-M
Thanks for the suggestions, everyone. This is what I have so far:
I require an open source DBMS because I want to avoid being locked
into a particular vendor, especially one that will send me a big bill.
That leaves me with MySQL, PostgreSQL (my preference), Firebird, and
possibly Berkeley DB (if I'm willing to part with SQL). I posted a
message on comp.databases.berkeley looking for real world examples
where Berkeley is being used in a heavily accessed CMS, but have
gotten no reply. Firebird is looking very attractive too. At the
moment, a commercial license for MySQL with all the support bells and
whistles is an impossibility since I'm already paying for other
requirements. At this time, I can't immediately choose to GPL my app.
I learned that even PostgreSQL's UTF-8 comes with caveats. I've also
learned that MySQL will choke on heavy concurrent writes. Unlike in my
Slashdot.org and catch27.com examples, my site would get as many
writes as it does reads - all with heavy concurrency (a lot of
inserts, deletes, and updates).
There won't be any chat functions, but it's not like I'd store those
in an DB anyway; however there will be message posts. If you think of
the project as being a hybrid of online magazine and web-based
BBS/social networking, then you can imagine the different data types I
will be using. There will be some number crunching, but it's mostly
UTF-8 encoded English text with scattered non-English words thrown in.
There will also be user-submitted pictures (JPEGs, GIFs, PNGs).
I'm content to stick with PostgreSQL providing that it can handle
heavy concurrent writes. I'll explore Firebird more too. Wayne
Conrad's comment about his own use of PostgreSQL has me pretty
optimistic, as does the Proximity multimedia archival case study at
postgresql.org. That FeedLounge.com blog Chris linked was definitely
an eye-opener.
Regards,
Anthony
On Sun, 2006-04-30 at 14:32, Tony Lausin wrote:
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.regards, tom lane
Hi Tom,
That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?
Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.
On Sun, 2006-04-30 at 14:32, Tony Lausin wrote:
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.regards, tom lane
Hi Tom,
That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.
I think slashdot uses memcache...
http://www.danga.com/memcached/users.bml
I would also read this about mysql's table locking:
http://dev.mysql.com/doc/refman/4.1/en/table-locking.html
Specifically, regarding myisam tables:
"Table locking enables many threads to read from a table at the same time,
but if a thread wants to write to a table, it must first get exclusive
access. During the update, all other threads that want to access this
particular table must wait until the update is done."
It doesn't take very many writes before this *really* becomes a problem.
We're implementing memcache at work to help with this issue...
-philip
On Mon, 2006-05-01 at 12:08, Philip Hallstrom wrote:
On Sun, 2006-04-30 at 14:32, Tony Lausin wrote:
[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.regards, tom lane
Hi Tom,
That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.I think slashdot uses memcache...
I was under the impression that they also created a lot of static text
for pages that are older than x number minutes or days, with updates to
those pages becoming further apart as the page for older.
I would also read this about mysql's table locking:
http://dev.mysql.com/doc/refman/4.1/en/table-locking.html
Specifically, regarding myisam tables:
"Table locking enables many threads to read from a table at the same time,
but if a thread wants to write to a table, it must first get exclusive
access. During the update, all other threads that want to access this
particular table must wait until the update is done."It doesn't take very many writes before this *really* becomes a problem.
We're implementing memcache at work to help with this issue...
Yeah, table level locking doesn't really scale well.
That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.I think slashdot uses memcache...
I was under the impression that they also created a lot of static text
for pages that are older than x number minutes or days, with updates to
those pages becoming further apart as the page for older.
They very well could. I don't know anything beyond what that page says...
I would also read this about mysql's table locking:
http://dev.mysql.com/doc/refman/4.1/en/table-locking.html
Specifically, regarding myisam tables:
"Table locking enables many threads to read from a table at the same time,
but if a thread wants to write to a table, it must first get exclusive
access. During the update, all other threads that want to access this
particular table must wait until the update is done."It doesn't take very many writes before this *really* becomes a problem.
We're implementing memcache at work to help with this issue...Yeah, table level locking doesn't really scale well.
Which is ironic since the top of that page says: "For large tables, table
locking is much better than row locking for most applications..."
Which, just right off the bat, doesn't make much sense... oh well.