PostgreSQL - case studies
Hello all,
Apologies for the long mail.
I work for a company that provides solutions mostly on a Java/Oracle
platform. Recently we moved one of our products to PostgreSQL. The main
reason was PostgreSQL's GIS capabilities and the inability of government
departments (especially road/traffic) to spend a lot of money on such
projects. This product is used to record details about accidents and
related analysis (type of road, when/why etc.) with maps. Fortunately, even
in India, an accident reporting application does not have to handle many
tps :). So, I can't say PostgreSQL's performance was really tested in
this case.
Later, I load-tested one screen of one of our products with JMeter. We
tried it with Oracle, DB2, PostgreSQL and Ingres, and PostgreSQL easily
outperformed the rest. We tried a transaction mix with 20+ SELECTs,
updates, deletes and a few inserts.
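For anyone curious, that kind of statement mix can be sketched roughly as below. This is an illustration only: the table and column names are invented (they are not our actual schema), and a real run would push each statement through JMeter's JDBC sampler or a driver such as psycopg2 rather than just building the list.

```python
import random

# Hypothetical statement templates -- invented for illustration,
# not the actual product schema.
READ_SQL = "SELECT * FROM accidents WHERE id = %s"
WRITE_SQL = {
    "update": "UPDATE accidents SET severity = %s WHERE id = %s",
    "delete": "DELETE FROM accident_notes WHERE id = %s",
    "insert": "INSERT INTO accident_notes (accident_id, note) VALUES (%s, %s)",
}

def build_transaction_mix(n_selects=20, n_updates=1, n_deletes=1,
                          n_inserts=2, seed=None):
    """Return a shuffled list of (kind, sql) pairs for one transaction:
    a read-heavy mix of 20+ SELECTs plus a handful of writes."""
    rng = random.Random(seed)
    mix = [("select", READ_SQL)] * n_selects
    mix += [("update", WRITE_SQL["update"])] * n_updates
    mix += [("delete", WRITE_SQL["delete"])] * n_deletes
    mix += [("insert", WRITE_SQL["insert"])] * n_inserts
    rng.shuffle(mix)  # interleave reads and writes within the transaction
    return mix
```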
After a really good experience with the database, I subscribed to all the
PostgreSQL groups (my previous experience is all-Oracle) and, reading these
mails, I realized that many organizations are using plain, 'not customized'
PostgreSQL for databases that handle critical applications. Since there
is no company trying to 'sell' PostgreSQL, many of us are not aware of
such cases.
Could some of you please share some info on such scenarios - where you are
supporting/designing/developing databases that run into at least a few
hundred GBs of data (I know, that is small by today's standards)?
I went through
http://www.postgresql.org/about/casestudies/
and felt those are a bit old. I am sure PostgreSQL has matured a lot more
since the days when those case studies were posted. I went through the
case studies at EnterpriseDB and similar vendors too. But those are
customized PostgreSQL servers.
I am looking more for 'first-hand' feedback.
Any feedback - a few sentences with the db size, tps, h/w necessary to
support that, acceptable down-time, type of application etc. - will be
greatly appreciated.
Our products are not of the blog/social networking type, but more of
on-line reservation type where half an hour down-time can lead to
significant revenue losses for customers.
Thank you,
Jayadevan
On Wed, Feb 10, 2010 at 9:39 AM, Jayadevan M
<Jayadevan.Maymala@ibsplc.com>wrote:
Any feedback - a few sentences with the db size, tps, h/w necessary to
support that, and acceptable down-time, type of application etc will be
greatly appreciated.
Our products are not of the blog/social networking type, but more of
on-line reservation type where half an hour down-time can lead to
significant revenue losses for customers.
Thank you,
Jayadevan
I don't have experience with DB sizes greater than 25-26 GB at the moment,
but Postgres surely has no problems handling my requirements. Mind you,
this was on a stock PostgreSQL server running on FreeBSD 7.x without any
optimizations. I didn't see any downtime apart from the ones I caused
myself. H/w config was dual-processor Xeon/8 GB RAM and a single 15K SAS
disk (146 GB). The rig was upgraded last week, but it ran fine for more
than 18 months.
That said, we run an application which generates around 170-200 transactions
per second (a mix of selects, inserts, updates and deletes). AFAIK, most of
us using Postgres are running some critical application or other where
downtime has a significant cost attached to it.
With regards
Amitabh
Quick note, please stick to text formatted email for the mailing list,
it's the preferred format.
On Tue, Feb 9, 2010 at 9:09 PM, Jayadevan M
<Jayadevan.Maymala@ibsplc.com> wrote:
Hello all,
Apologies for the long mail.
I work for a company that provides solutions mostly on a Java/Oracle platform. Recently we moved one of our products to PostgreSQL. The main reason was PostgreSQL's GIS capabilities and the inability of government departments (especially road/traffic) to spend a lot of money on such projects. This product is used to record details about accidents and related analysis (type of road, when/why etc.) with maps. Fortunately, even in India, an accident reporting application does not have to handle many tps :). So, I can't say PostgreSQL's performance was really tested in this case.
Later, I load-tested one screen of one of our products with JMeter. We tried it with Oracle, DB2, PostgreSQL and Ingres, and PostgreSQL easily outperformed the rest. We tried a transaction mix with 20+ SELECTs, updates, deletes and a few inserts.
Please note that benchmarking Oracle (and a few other commercial DBs)
and then publishing those results without Oracle's permission is
considered to be in breach of their license agreement. Yeah, another
wonderful aspect of using Oracle.
That said, and as someone who is not an Oracle licensee in any way,
this matches my experience that PostgreSQL is a match for Oracle, DB2,
and most other databases in the simple, single-db-on-commodity-hardware
scenario.
After a really good experience with the database, I subscribed to all the PostgreSQL groups (my previous experience is all-Oracle) and, reading these mails, I realized that many organizations are using plain, 'not customized' PostgreSQL for databases that handle critical applications. Since there is no company trying to 'sell' PostgreSQL, many of us are not aware of such cases.
Actually, there are several companies that sell PostgreSQL services, and
some that sell customized versions: Red Hat, Command Prompt, EnterpriseDB,
and so on.
Could some of you please share some info on such scenarios - where you are supporting/designing/developing databases that run into at least a few hundred GBs of data (I know, that is small by today's standards)?
There are other instances of folks on the list sharing this kind of
info; you can find them by searching the archives. I've used PostgreSQL
for about 10 years, for anywhere from a few megabytes to hundreds of
gigabytes, and all kinds of applications.
Where I currently work, we have a main data store for a web app that is
about 180 GB and growing, running on three servers with Slony
replication. We handle somewhere in the range of 10k to 20k queries
per minute (a mix of roughly 90% reads to 10% writes). Peak load can
reach 30k or more requests per minute.
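To put those per-minute figures into per-second terms, here is a quick back-of-the-envelope conversion (the 90/10 read/write split is the approximate figure stated above, nothing more precise):

```python
def per_second(queries_per_minute, read_fraction=0.90):
    """Convert a per-minute query rate into approximate per-second
    read/write rates, assuming the stated ~90/10 read/write split."""
    qps = queries_per_minute / 60.0
    return {
        "qps": round(qps, 1),
        "reads_per_sec": round(qps * read_fraction, 1),
        "writes_per_sec": round(qps * (1.0 - read_fraction), 1),
    }

# 10k-20k queries/minute is roughly 167-333 queries/second overall;
# the 30k/minute peak is about 500 queries/second.
steady_low = per_second(10_000)
steady_high = per_second(20_000)
peak = per_second(30_000)
```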
The two big servers that handle this load are dual quad-core Opteron
2.1GHz machines with 32 GB RAM and 16 15krpm SAS drives, configured as
2 in RAID-1 for the OS and pg_xlog, 2 hot spares, and 12 in a RAID-10 for
the main data. The HW RAID controller is the Areca 1680, which is mostly
stable, except for the occasional (once a year or so) hang problem,
which has been described, and which Areca has assured me they are
working on.
Our total downtime due to database outages in the last year or so has
been 10 to 20 minutes, and that was due to a RAID card driver bug that
hits us about once every 300 to 400 days. The majority of the down
time has been waiting for our hosting provider to hit the big red
switch and restart the main server.
Our other PostgreSQL servers provide the search facility, with a db size
of around 300 GB, and statistics at around 1 TB.
I am sure PostgreSQL has matured a lot more since the days when those case studies were posted. I went through the case studies at EnterpriseDB and similar vendors too. But those are customized PostgreSQL servers.
Not necessarily. They sell support more than anything, and the
majority of the customization is not for stability but for additional
features, such as MPP queries or replication etc.
The real issue you run into is that many people don't want to tip
their hand that they are using PostgreSQL, because it is a competitive
advantage. It's inexpensive, capable, and relatively easy to use. If
your competitor is convinced that Oracle or MS SQL Server, with $240k in
licensing each year, is the best choice, and you're whipping them with
PostgreSQL, the last thing you want is for them to figure that out and
switch.
On 10/02/2010 6:49, Scott Marlowe wrote:
[...]
Following on from that subject, there are many apps around the world that
are using PostgreSQL for their business.
We are planning the design and deployment of a large PostgreSQL
cluster for DWH-ODS-BI apps.
We are documenting everything so that the information can later be
published in the PostgreSQL case studies section.
We are using Slony-I for replication, PgBouncer for connection
pooling, Heartbeat for monitoring and fault detection, and CentOS and
FreeBSD as the OS base.
The pg_xlog directory is on a RAID-1 and the main data on a RAID-10.
Do you have any recommendations?
Note: Does anyone have an MPP query implementation for PostgreSQL that can
be shared?
Regards
--
--------------------------------------------------------------------------------
"To be truly great, one must stand with the people, not above them."
Montesquieu
Ing. Marcos Luís Ortíz Valmaseda
PostgreSQL System DBA && DWH -- BI Apprentice
Centro de Tecnologías de Almacenamiento y Análisis de Datos (CENTALAD)
Universidad de las Ciencias Informáticas
Linux User # 418229
-- PostgreSQL --
"TIP 4: Don't 'kill -9' the postmaster"
http://www.postgresql-es.org
http://www.postgresql.org
http://www.planetpostgresql.org
-- DWH + BI --
The Data WareHousing Institute
http://www.tdwi.org
http://www.tdwi.org/cbip
---------------------------------------------------------------------------------
Jayadevan M <Jayadevan.Maymala@ibsplc.com> wrote:
Could some of you please share some info on such scenarios- where
you are supporting/designing/developing databases that run into at
least a few hundred GBs of data (I know, that is small by today's
standards)?
I'm a database administrator for the Wisconsin Courts. We've got
about 200 PostgreSQL database clusters on about 100 servers spread
across the state. Databases range from tiny (few MB) to 1.3 TB.
Check out this for more info:
http://www.pgcon.org/2009/schedule/events/129.en.html
I hope that helps. If you have any particular questions not
answered by the above, just ask.
-Kevin
* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:
Could some of you please share some info on such scenarios- where
you are supporting/designing/developing databases that run into at
least a few hundred GBs of data (I know, that is small by today's
standards)?
Just saw this, so figured I'd comment:
tsf=> \l+
List of databases
Name | Owner | Encoding | Collation | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+-------------+-------------+----------------------------+---------+-------------+---------------------------
beac | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres | 1724 GB | pg_default |
Doesn't look very pretty, but the point is that it's 1.7 TB. There are a
few other smaller databases on that system too. PG handles it quite
well, though this one is primarily used for data-mining.
Thanks,
Stephen
Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:
Could some of you please share some info on such scenarios- where
you are supporting/designing/developing databases that run into at
least a few hundred GBs of data (I know, that is small by today's
standards)?
At NuevaSync we use PG in a one-database-per-server design, with our own
replication system between cluster nodes. The largest node has more than
200 GB online.
This is an OLTP-type workload.