Is PostgreSQL an easy choice for a large CMS?

Started by Tony Lausinalmost 20 years ago28 messagesgeneral
Jump to latest
#1Tony Lausin
tonylausin@gmail.com

Hello all,

I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com

PostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site. I'm not
really a fan of MySQL, but I'll consider it if it truly is the better
choice in this case. I just don't understand how it would be. I'm
thinking this is solely in reference to VACUUM. Even with autovacuum
suport, I tend to agree there is at least one handicap.

I could really use some enlightenment on just where PostgreSQL fits in
a single-server, highly-trafficked web site serving mostly text,
pictures and possibly streaming media.

Regards,

Anthony

#2David Goodenough
david.goodenough@btconnect.com
In reply to: Tony Lausin (#1)
Re: Is PostgreSQL an easy choice for a large CMS?

On Sunday 30 April 2006 12:01, Tony Lausin wrote:

Hello all,

I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com

PostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site. I'm not
really a fan of MySQL, but I'll consider it if it truly is the better
choice in this case. I just don't understand how it would be. I'm
thinking this is solely in reference to VACUUM. Even with autovacuum
suport, I tend to agree there is at least one handicap.

I could really use some enlightenment on just where PostgreSQL fits in
a single-server, highly-trafficked web site serving mostly text,
pictures and possibly streaming media.

Regards,

Anthony

Very odd. I had always heard that MySql (at least originally) was a
"quick and dirty" database, easy to use, not fully standards compliant,
and not enterprise grade. Postgresql on the other hand was always
the heavyweight, standards compliant, enterprise db, which was more
difficult to use and set up but much more resilient. Postgresql has been
getting more UI support (often seen as a user friendly bonus) and
things like autovacuum support so that it is easier to use out of the box,
and MySql has been gaining standards compliance and resilience.

Funny how perceptions can differ.

David

#3Chris
dmagick@gmail.com
In reply to: Tony Lausin (#1)
Re: Is PostgreSQL an easy choice for a large CMS?

On 4/30/06, Tony Lausin <tonylausin@gmail.com> wrote:

Hello all,

I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com

PostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site. I'm not
really a fan of MySQL, but I'll consider it if it truly is the better
choice in this case. I just don't understand how it would be. I'm
thinking this is solely in reference to VACUUM. Even with autovacuum
suport, I tend to agree there is at least one handicap.

I could really use some enlightenment on just where PostgreSQL fits in
a single-server, highly-trafficked web site serving mostly text,
pictures and possibly streaming media.

http://people.planetpostgresql.org/xzilla/index.php?/archives/151-Sean-Chittenden-on-RubyOnRails-Podcast.html

http://www.postgresql.org/about/casestudies/

http://www.postgresql.org/about/users

are all good places to start.

TBH it depends a lot on your data and how you structure it. I wrote a
small tute on how to get rid of left-join type queries and use
triggers to keep count(*) type queries to a minimum..

http://www.designmagick.com/article/36/Forum-Project/Database-Design-Issues

It's not always possible, but there are ways to minimize count(*),
min(field), max(field) type queries where postgresql isn't able to
optimize fully due to mvcc issues.

--
Postgresql & php tutorials
http://www.designmagick.com/

#4Tomi NA
hefest@gmail.com
In reply to: Tony Lausin (#1)
Re: Is PostgreSQL an easy choice for a large CMS?

On 4/30/06, Tony Lausin <tonylausin@gmail.com> wrote:

Hello all,

I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com

In my opinion, postgresql is not the way to go when building a cMS
simply because of the way it handles strings.
A CMS should be language/region agnostic i.e. supporting any chosen
locale subset, rather then just one locale, as postgresql does. You
can throw UTF-8 at the problem and it will enable data
storage/retrieval, but you'll still be stuck with incorrect text
ordering/sorting in any locale but the default.
Truth be told, I don't know of a single RDBMS which handles this issue
gracefully, but with postgresql, your basically stuck with a single
language/locale.
On the other hand, if you don't need multilang support, you'll find
that postgresql is generally great to program and use.

Tomislav

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tony Lausin (#1)
Re: Is PostgreSQL an easy choice for a large CMS?

"Tony Lausin" <tonylausin@gmail.com> writes:

I'm working on a CMS which requires an open source database capable of
handling hundreds of thousands of users simultaneously, with a high
rate of database writes, and without buckling. We're talking somewhere
between nerve.com/catch27.com and xanga.com/friendster.com

PostgreSQL is a personal favorite of mine, and my gut instinct is that
it's the best choice for a large scale CMS serving many users;
however, I'm getting antsy. I keep getting suggestions that Postgres
is really only suited to small and medium projects, and that I should
be looking at MySQL for a large scale database drive site.

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

regards, tom lane

#6Tony Lausin
tonylausin@gmail.com
In reply to: David Goodenough (#2)
Re: Is PostgreSQL an easy choice for a large CMS?

Very odd. I had always heard that MySql (at least originally) was a
"quick and dirty" database, easy to use, not fully standards compliant,
and not enterprise grade. Postgresql on the other hand was always
the heavyweight, standards compliant, enterprise db, which was more
difficult to use and set up but much more resilient. Postgresql has been
getting more UI support (often seen as a user friendly bonus) and
things like autovacuum support so that it is easier to use out of the box,
and MySql has been gaining standards compliance and resilience.

Funny how perceptions can differ.

Hi David,

I think it's very odd too, cause that's the exact same perception of
MySQL I have. My experience with MySQL is really limited to my
Wordpress blog, and that means I've never actually designed a database
with it. I wonder how much of this is just marketing and hype - either
on the side of MySQL or the side of Oracle.

#7Tony Lausin
tonylausin@gmail.com
In reply to: Tomi NA (#4)
Re: Is PostgreSQL an easy choice for a large CMS?

In my opinion, postgresql is not the way to go when building a cMS
simply because of the way it handles strings.
A CMS should be language/region agnostic i.e. supporting any chosen
locale subset, rather then just one locale, as postgresql does. You
can throw UTF-8 at the problem and it will enable data
storage/retrieval, but you'll still be stuck with incorrect text
ordering/sorting in any locale but the default.
Truth be told, I don't know of a single RDBMS which handles this issue
gracefully, but with postgresql, your basically stuck with a single
language/locale.
On the other hand, if you don't need multilang support, you'll find
that postgresql is generally great to program and use.

Tomislav

Hello Tomislav,

In my case, UTF-8 is a must. The site contextually will be directed to
North Americans, although I do anticipate that many users will from
time to time insert text in their own native languages. I myself
certainly may use non-latin characters from time to time (cyrillic),
so I find UTF-8 to be the best common bridge.

Regards,

Anthony

#8Tony Lausin
tonylausin@gmail.com
In reply to: Tom Lane (#5)
Re: Is PostgreSQL an easy choice for a large CMS?

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

regards, tom lane

Hi Tom,

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Regards,

Anthony

#9Matthew T. O'Connor
matthew@zeut.net
In reply to: Tony Lausin (#8)
Re: Is PostgreSQL an easy choice for a large CMS?

Tony Lausin wrote:

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Yes Slashdot runs MySQL, however what Tom said was that MySQL will fall
over under any heavy *concurrent-write* scenario. Concurrent-write is
the operative word in that sentence. Slashdot by it's very nature reads
from the database far far more than it writes. The only writes to the
database are things like a new story and user submitted comments, both
of with are small in comparison to the number of reads from the database.

Matt

#10Robby Russell
robby.lists@planetargon.com
In reply to: Tony Lausin (#7)
Re: Is PostgreSQL an easy choice for a large CMS?

On Apr 30, 2006, at 12:23 PM, Tony Lausin wrote:

On the other hand, if you don't need multilang support, you'll find
that postgresql is generally great to program and use.

Tomislav

Hello Tomislav,

In my case, UTF-8 is a must. The site contextually will be directed to
North Americans, although I do anticipate that many users will from
time to time insert text in their own native languages. I myself
certainly may use non-latin characters from time to time (cyrillic),
so I find UTF-8 to be the best common bridge.

Regards,

Anthony

Agreed, UTF-8 works fairly well. We just launched a site that has
translated content for ~20 languages using a programming language
that doesn't natively support unicode (Ruby... but jcode helps us
out) and it works great. This is all running on PostgreSQL and we
haven't seen any hiccups or complaints from people about strings not
displaying right.

I don't see why PostgreSQL wouldn't work for what you're working on...

-Robby

Robby Russell
Founder & Executive Director

PLANET ARGON, LLC
Ruby on Rails Development, Consulting & Hosting

www.planetargon.com
www.robbyonrails.com

+1 503 445 2457
+1 877 55 ARGON [toll free]
+1 815 642 4968 [fax]
#11Tony Lausin
tonylausin@gmail.com
In reply to: Matthew T. O'Connor (#9)
Re: Is PostgreSQL an easy choice for a large CMS?

Ahh. I see the point more clearly now. Perhaps the best strategy for
me is to press on with Postgres until the project is at a profitable
enough stage to merit a migration to Oracle - should Postgres become
an issue. I feel more confident about being able to migrate from
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I should think
about.

Show quoted text

On 4/30/06, Matthew T. O'Connor <matthew@zeut.net> wrote:

Tony Lausin wrote:

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Yes Slashdot runs MySQL, however what Tom said was that MySQL will fall
over under any heavy *concurrent-write* scenario. Concurrent-write is
the operative word in that sentence. Slashdot by it's very nature reads
from the database far far more than it writes. The only writes to the
database are things like a new story and user submitted comments, both
of with are small in comparison to the number of reads from the database.

Matt

#12Chris Browne
cbbrowne@acm.org
In reply to: Tony Lausin (#1)
Re: Is PostgreSQL an easy choice for a large CMS?

Oops! tonylausin@gmail.com ("Tony Lausin") was seen spray-painting on a wall:

Ahh. I see the point more clearly now. Perhaps the best strategy for
me is to press on with Postgres until the project is at a profitable
enough stage to merit a migration to Oracle - should Postgres become
an issue. I feel more confident about being able to migrate from
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I should think
about.

The other plausible option as a "possibly inexpensive" alternative to
PostgreSQL or MySQL(tm) is Firebird, which used to be Borland
Interbase...

They lost a couple of developers to MySQL AB, which may present some
problems, eventually...
--
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://linuxfinances.info/info/wp.html
"Tooltips are the proof that icons don't work."
-- Stefaan A. Eeckels

#13Harvey, Allan AC
HarveyA@OneSteel.com
In reply to: Chris Browne (#12)
Re: Is PostgreSQL an easy choice for a large CMS?

The open source offerings of ingres, ingres R3, runs the Wire section of our business

http://www.onesteel.com

Allan

Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I
should think
about.

The material contained in this email may be confidential, privileged or copyrighted. If you are not the intended recipient, use, disclosure or copying of this information is prohibited. If you have received this document in error, please advise the sender and delete the document. Neither OneSteel nor the sender accept responsibility for any viruses contained in this email or any attachments.

#14Stephen Frost
sfrost@snowman.net
In reply to: Tony Lausin (#11)
Re: Is PostgreSQL an easy choice for a large CMS?

* Tony Lausin (tonylausin@gmail.com) wrote:

Ahh. I see the point more clearly now. Perhaps the best strategy for
me is to press on with Postgres until the project is at a profitable
enough stage to merit a migration to Oracle - should Postgres become
an issue. I feel more confident about being able to migrate from
Postgres than from MySQL. I am financing this myself. hence the
apprehension about the cost. Is there another contender I should think
about.

You may want to read up on MySQL's licenseing system if you havn't
already as well. It's not necessairly what you expect... Basically, if
you don't want to open source (ie: GPL-compatible) the code which uses
MySQL you might need to buy a license from them. Check their website
for a more detailed (though not necessairly any clearer) discussion of
the issues.

Thanks,

Stephen

#15A.M.
agentm@themactionfaction.com
In reply to: Stephen Frost (#14)
Re: Is PostgreSQL an easy choice for a large CMS?

Just to round out the suggestions, if I remember correctly, the OP
mentioned something about chat. For entirely dynamic, disposable data,
perhaps a reliable database isn't what is called for at all. If the power
shuts off, it may not matter that some chat log is lost.

I suggest to use the right tool for the job and PostgreSQL doesn't claim
to cover all the bases.

-M

#16Tony Lausin
tonylausin@gmail.com
In reply to: A.M. (#15)
Re: Is PostgreSQL an easy choice for a large CMS?

Thanks for the suggestions, everyone. This is what I have so far:

I require an open source DBMS because I want to avoid being locked
into a particular vendor, especially one that will send me a big bill.
That leaves me with MySQL, PostgreSQL (my preference), Firebird, and
possibly Berkeley DB (if I'm willing to part with SQL). I posted a
message on comp.databases.berkeley looking for real world examples
where Berkeley is being used in a heavily accessed CMS, but have
gotten no reply. Firebird is looking very attractive too. At the
moment, a commercial license for MySQL with all the support bells and
whistles is an impossibility since I'm already paying for other
requirements. At this time, I can't immediately choose to GPL my app.

I learned that even PostgreSQL's UTF-8 comes with caveats. I've also
learned that MySQL will choke on heavy concurrent writes. Unlike in my
Slashdot.org and catch27.com examples, my site would get as many
writes as it does reads - all with heavy concurrency (a lot of
inserts, deletes, and updates).

There won't be any chat functions, but it's not like I'd store those
in an DB anyway; however there will be message posts. If you think of
the project as being a hybrid of online magazine and web-based
BBS/social networking, then you can imagine the different data types I
will be using. There will be some number crunching, but it's mostly
UTF-8 encoded English text with scattered non-English words thrown in.
There will also be user-submitted pictures (JPEGs, GIFs, PNGs).

I'm content to stick with PostgreSQL providing that it can handle
heavy concurrent writes. I'll explore Firebird more too. Wayne
Conrad's comment about his own use of PostgreSQL has me pretty
optimistic, as does the Proximity multimedia archival case study at
postgresql.org. That FeedLounge.com blog Chris linked was definitely
an eye-opener.

Regards,

Anthony

#17Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Tony Lausin (#8)
Re: Is PostgreSQL an easy choice for a large CMS?

On Sun, 2006-04-30 at 14:32, Tony Lausin wrote:

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

regards, tom lane

Hi Tom,

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.

#18Philip Hallstrom
postgresql@philip.pjkh.com
In reply to: Scott Marlowe (#17)
Re: Is PostgreSQL an easy choice for a large CMS?

On Sun, 2006-04-30 at 14:32, Tony Lausin wrote:

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

regards, tom lane

Hi Tom,

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.

I think slashdot uses memcache...

http://www.danga.com/memcached/users.bml

I would also read this about mysql's table locking:

http://dev.mysql.com/doc/refman/4.1/en/table-locking.html

Specifically, regarding myisam tables:

"Table locking enables many threads to read from a table at the same time,
but if a thread wants to write to a table, it must first get exclusive
access. During the update, all other threads that want to access this
particular table must wait until the update is done."

It doesn't take very many writes before this *really* becomes a problem.
We're implementing memcache at work to help with this issue...

-philip

#19Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Philip Hallstrom (#18)
Re: Is PostgreSQL an easy choice for a large CMS?

On Mon, 2006-05-01 at 12:08, Philip Hallstrom wrote:

On Sun, 2006-04-30 at 14:32, Tony Lausin wrote:

[ rotfl... ] MySQL will fall over under any heavy concurrent-write
scenario. It's conceivable that PG won't do what you need either,
but if not I'm afraid you're going to be forced into Oracle or one
of the other serious-money DBs.

regards, tom lane

Hi Tom,

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.

I think slashdot uses memcache...

http://www.danga.com/memcached/users.bml

I was under the impression that they also created a lot of static text
for pages that are older than x number minutes or days, with updates to
those pages becoming further apart as the page for older.

I would also read this about mysql's table locking:

http://dev.mysql.com/doc/refman/4.1/en/table-locking.html

Specifically, regarding myisam tables:

"Table locking enables many threads to read from a table at the same time,
but if a thread wants to write to a table, it must first get exclusive
access. During the update, all other threads that want to access this
particular table must wait until the update is done."

It doesn't take very many writes before this *really* becomes a problem.
We're implementing memcache at work to help with this issue...

Yeah, table level locking doesn't really scale well.

#20Philip Hallstrom
postgresql@philip.pjkh.com
In reply to: Scott Marlowe (#19)
Re: Is PostgreSQL an easy choice for a large CMS?

That's a scary idea - being forced into Oracle or Sybase. Isn't
Slashdot.org still running strongly off of MySQL?

Depends on how you define strongly. Slashdot has a LOT of code in place
to cache the content so it never has to hit the database directly.
Basically, every X seconds, the data creating the site is ripped outta
the database and produced as static content so that the writes and reads
don't clobber each other. And it still takes a pretty big and fast
machine to handle the load.

I think slashdot uses memcache...

http://www.danga.com/memcached/users.bml

I was under the impression that they also created a lot of static text
for pages that are older than x number minutes or days, with updates to
those pages becoming further apart as the page for older.

They very well could. I don't know anything beyond what that page says...

I would also read this about mysql's table locking:

http://dev.mysql.com/doc/refman/4.1/en/table-locking.html

Specifically, regarding myisam tables:

"Table locking enables many threads to read from a table at the same time,
but if a thread wants to write to a table, it must first get exclusive
access. During the update, all other threads that want to access this
particular table must wait until the update is done."

It doesn't take very many writes before this *really* becomes a problem.
We're implementing memcache at work to help with this issue...

Yeah, table level locking doesn't really scale well.

Which is ironic since the top of that page says: "For large tables, table
locking is much better than row locking for most applications..."

Which, just right off the bat, doesn't make much sense... oh well.

#21Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Philip Hallstrom (#20)
#22Tony Lausin
tonylausin@gmail.com
In reply to: Scott Marlowe (#21)
#23Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tony Lausin (#8)
#24Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tony Lausin (#16)
#25Tony Lausin
tonylausin@gmail.com
In reply to: Jim Nasby (#23)
#26Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tony Lausin (#25)
#27Wes James
comptekki@gmail.com
In reply to: Tony Lausin (#1)
#28Tony Lausin
tonylausin@gmail.com
In reply to: Jim Nasby (#23)