PostgreSQL versus MySQL for GPS Data
Hello,
I'm currently developing a program for centralizing the vehicle fleet GPS
information -http://openggd.sourceforge.net-, written in C++.
The database should have these requirements:
- The schema for this kind of data consists of several fields -latitude,
longitude, time, speed, etc.-, none of which is a text field.
- The database should also create a table for every truck -around 100
trucks-.
- There won't be more than 86400 * 365 rows per table -one GPS position
every second over one year, about 31.5 million rows-.
- There won't be more than 10 simultaneous read-only queries.
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
Thanks in advance
Juan Karlos.
On Tue, Mar 17, 2009 at 12:25:08PM +0100, Juan Pereira wrote:
I'm currently developing a program for centralizing the vehicle fleet GPS
information -http://openggd.sourceforge.net-, written in C++. The database
should have these requirements:
...
- The database should also create a table for every truck -around 100
trucks-.
Why? This smells like a design problem.
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Hi Juan,
First of all, congratulations on your project :)
We, at MADEIRA GPS, use PostgreSQL and PostGIS as the cornerstone of our
fleet management solution and have tens of *millions* of records in a single
vehicle history table without any visible performance problem (we do,
however, clean it every year).
A thought, however, regarding your plans for gps data acquisition/storage:
every second... isn't that a bit too much?
We, for most of our customers, offer minute-by-minute tracking and, this is
important, *optimize* the vehicles' history table when writing data into it
by means of comparing the data from the last record - i.e. if the info is the
same *don't* write it! This will surely save you space ;-)
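For illustration, that optimization can live in the database itself as a
BEFORE INSERT trigger. A minimal sketch (the truck_positions table and its
columns are invented for the example, not MADEIRA's actual schema):

CREATE OR REPLACE FUNCTION skip_unchanged_fix() RETURNS trigger AS $$
DECLARE
    last_rec truck_positions%ROWTYPE;
BEGIN
    -- Fetch the most recent stored fix for this vehicle.
    SELECT * INTO last_rec
      FROM truck_positions
     WHERE truck_id = NEW.truck_id
     ORDER BY ts DESC
     LIMIT 1;
    -- If nothing has changed since the last fix, suppress the row:
    -- returning NULL from a BEFORE trigger skips the INSERT.
    IF FOUND AND last_rec.lat = NEW.lat
             AND last_rec.lon = NEW.lon
             AND last_rec.speed = NEW.speed THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER truck_positions_dedup
    BEFORE INSERT ON truck_positions
    FOR EACH ROW EXECUTE PROCEDURE skip_unchanged_fix();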
About simultaneous queries:
Last we checked we had ~200 of them with PGSQL still pumping at full
speed... ;-)
As a final note, IMHO, PGSQL/PostGIS is better than MySQL for a number of
reasons:
- proven robustness
- tight integration with PostGIS
- large user base (an always friendly bunch willing to help out each
other ;-) )
- ...
Regards,
Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam
On Tuesday 17 March 2009 11:25:08 am Juan Pereira wrote:
Hello,
I'm currently developing a program for centralizing the vehicle fleet GPS
information -http://openggd.sourceforge.net-, written in C++. The database
should have these requirements:
- The schema for this kind of data consists of several fields -latitude,
longitude, time, speed, etc.-, none of which is a text field.
- The database should also create a table for every truck -around 100
trucks-.
- There won't be more than 86400 * 365 rows per table -one GPS position
every second over one year-.
- There won't be more than 10 simultaneous read-only queries.
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
Thanks in advance
Juan Karlos.
Juan Pereira wrote:
- The database should also create a table for every truck -around 100
trucks-.
Why?
That's a rather clumsy design that makes it really hard to get aggregate
data across the fleet or do many interesting queries.
You're almost always better off using a single table with a composite
primary key like (truckid, datapointid) or whatever. If you'll be doing
lots of queries that focus on individual vehicles and expect performance
issues then you could partition the table by truckid, so you actually do
land up with one table per truck, but transparently accessible via table
inheritance so you can still query them all together.
Read up on PostgreSQL's table partitioning features.
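A minimal sketch of that layout (all names invented for the example):

-- Parent table holding the common definition; note the composite key.
CREATE TABLE gps_fix (
    truck_id integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    lat      double precision NOT NULL,
    lon      double precision NOT NULL,
    speed    real,
    PRIMARY KEY (truck_id, ts)
);

-- One child per truck. The CHECK constraint is what lets the planner
-- skip irrelevant children when constraint_exclusion is enabled.
CREATE TABLE gps_fix_truck_1 (
    CHECK (truck_id = 1)
) INHERITS (gps_fix);

-- Queries against the parent transparently cover all children:
SELECT max(speed) FROM gps_fix WHERE truck_id = 1;

One caveat worth knowing: indexes and primary keys are not inherited, so
each child table needs its own.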
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
As you can imagine, PostgreSQL.
My main reasons are that in a proper transactional environment (ie
you're not using scary MyISAM tables) Pg is *much* better about handling
concurrent load, particularly concurrent activity by readers and writers.
Pg's table partitioning support is also an ideal fit for your application.
--
Craig Ringer
On Tue, Mar 17, 2009 at 7:47 AM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
Juan Pereira wrote:
- The database should also create a table for every truck -around 100
trucks-.
Why?
That's a rather clumsy design that makes it really hard to get aggregate
data across the fleet or do many interesting queries.
You're almost always better off using a single table with a composite
primary key like (truckid, datapointid) or whatever. If you'll be doing
lots of queries that focus on individual vehicles and expect performance
issues then you could partition the table by truckid, so you actually do
land up with one table per truck, but transparently accessible via table
inheritance so you can still query them all together.
Read up on PostgreSQL's table partitioning features.
If there is little/no reason to span queries across trucks, then the
OP's approach is OK, better even than standard table partitioning. I agree
though
that a single table approach is best unless 1) the table has to scale
to really, really large sizes or 2) there is a lot of churn on the
data (lots of bulk inserts and deletes).
merlin
Juan,
* Juan Pereira (juankarlos.openggd@gmail.com) wrote:
- The schema for this kind of data consists of several fields -latitude,
longitude, time, speed, etc.-, none of which is a text field.
I would think you might want *some* text fields for vehicle
identification, as a separate table about trucks.
- The database should also create a table for every truck -around 100
trucks-.
As mentioned elsewhere, you're probably fine with 1 table, but if it
becomes a problem you can always partition it up and have one view
across all of them (make sure to set up your constraints correctly and
enable constraint_exclusion if you go this route). You could then
have, say, 10 tables, with 10 trucks in each.
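A rough sketch of that arrangement (names invented; only two of the ten
group tables shown):

CREATE TABLE fixes_00_09 (
    truck_id integer NOT NULL CHECK (truck_id BETWEEN 0 AND 9),
    ts       timestamptz NOT NULL,
    lat      double precision,
    lon      double precision
);
CREATE TABLE fixes_10_19 (
    truck_id integer NOT NULL CHECK (truck_id BETWEEN 10 AND 19),
    ts       timestamptz NOT NULL,
    lat      double precision,
    lon      double precision
);
-- ...and so on for the remaining groups.

CREATE VIEW all_fixes AS
    SELECT * FROM fixes_00_09
    UNION ALL
    SELECT * FROM fixes_10_19;

SET constraint_exclusion = on;
-- With exclusion on, the planner only scans fixes_00_09 here:
SELECT * FROM all_fixes WHERE truck_id = 7;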
- There won't be more than 86400 * 365 rows per table -one GPS position
every second over one year-.
As mentioned, you might want to eliminate duplicate entries; no sense
storing information that can be trivially derived.
- There won't be more than 10 simultaneous read-only queries.
While this is good to know, I kind of doubt it's accurate, and more
important is the number of simultaneous writers. I'm assuming 100, but
is that correct?
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
Given the list you posted to, I would say you're likely to get a lot of
PostgreSQL recommendations. Assuming you posted something similar to a
MySQL list, I would recommend that you not pick a solution based on the
number of responses you get but rather what you're most comfortable with
and understand best. If there is a learning curve either way, I think
PostgreSQL would be the best solution. If you're thinking about what to
have your application support, you might consider trying to support
both. Doing that from the beginning is usually best since you'll
develop your system at a high enough level to mitigate the problems
(syntax differences, performance differences, etc) between the
databases.
As an aside, and I don't know where the MySQL community is on this, but
we have the US Census TIGER Shapefile data set loaded into PostgreSQL
with PostGIS, with a geocoder that works with it. We should have a
complete packaged solution for loading it, indexing, etc, soon. That's
a fairly large, free, data set of all streets, addresses, etc, in the
US with lat/long information.
Thanks,
Stephen
Merlin,
I agree though
that a single table approach is best unless 1) the table has to scale
to really, really large sizes or 2) there is a lot of churn on the
data (lots of bulk inserts and deletes).
while agreeing, an additional question: could you please express
"really, really large" in other units, like gigabytes or number of
rows (with average row length in bytes, of course)?
That is: what table size would you or anybody actually consider really,
really large?
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
LASIK good, steroids bad?
Craig Ringer wrote:
You're almost always better off using a single table with a composite
primary key like (truckid, datapointid) or whatever. If you'll be doing
lots of queries that focus on individual vehicles and expect performance
issues then you could partition the table by truckid, so you actually do
land up with one table per truck, but transparently accessible via table
inheritance so you can still query them all together.
Quite interesting!
The main reason we thought of using a table per truck was concurrent
load: if there are 100 trucks trying to write to the same table, the
performance might be worse than with 100 tables, because the table is
blocked for other queries while the writing process is running, isn't it?
My main reasons are that in a proper transactional environment (ie
you're not using scary MyISAM tables) Pg is *much* better about handling
concurrent load, particularly concurrent activity by readers and writers.
2009/3/17 Craig Ringer <craig@postnewspapers.com.au>
Quite interesting again.
Thank you for your answers
Juan Karlos
Juan Pereira wrote:
Craig Ringer wrote:
You're almost always better off using a single table with a composite
primary key like (truckid, datapointid) or whatever. If you'll be doing
lots of queries that focus on individual vehicles and expect performance
issues then you could partition the table by truckid, so you actually do
land up with one table per truck, but transparently accessible via table
inheritance so you can still query them all together.
Quite interesting!
The main reason we thought of using a table per truck was concurrent
load: if there are 100 trucks trying to write to the same table, the
performance might be worse than with 100 tables, because the table is
blocked for other queries while the writing process is running, isn't it?
Wow, you are carrying around a lot of MySQL baggage with you. ;-)
You should probably read this:
http://www.postgresql.org/docs/8.3/static/mvcc-intro.html
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Tue, Mar 17, 2009 at 8:25 AM, Juan Pereira
<juankarlos.openggd@gmail.com> wrote:
Craig Ringer wrote:
You're almost always better off using a single table with a composite
primary key like (truckid, datapointid) or whatever. If you'll be doing
lots of queries that focus on individual vehicles and expect performance
issues then you could partition the table by truckid, so you actually do
land up with one table per truck, but transparently accessible via table
inheritance so you can still query them all together.
Quite interesting!
The main reason we thought of using a table per truck was concurrent
load: if there are 100 trucks trying to write to the same table, the
performance might be worse than with 100 tables, because the table is
blocked for other queries while the writing process is running, isn't it?
Using MySQL has a tendency to teach people bad habits, and this
assumption would be one of them. :)
Stephen Frost wrote:
As mentioned, you might want to eliminate duplicate entries; no sense
storing information that can be trivially derived.
It's pretty easy to do that with a trigger - and you can add a degree of
noise correction too, so that "wobble" in GPS position doesn't get
recorded - you only log changes of more than a certain distance.
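One possible shape for that trigger, along the same lines as the
duplicate-skipping sketch earlier in the thread (names invented; a crude
planar distance approximation is used here, though with PostGIS you could
use its proper distance functions instead):

CREATE OR REPLACE FUNCTION skip_gps_wobble() RETURNS trigger AS $$
DECLARE
    prev     truck_positions%ROWTYPE;
    min_move CONSTANT double precision := 10.0;  -- metres; tune to GPS noise
BEGIN
    SELECT * INTO prev
      FROM truck_positions
     WHERE truck_id = NEW.truck_id
     ORDER BY ts DESC
     LIMIT 1;
    -- Planar approximation: one degree of latitude is roughly 111 km.
    IF FOUND AND
       sqrt(pow((NEW.lat - prev.lat) * 111000.0, 2)
          + pow((NEW.lon - prev.lon) * 111000.0 * cos(radians(prev.lat)), 2))
       < min_move THEN
        RETURN NULL;  -- moved less than the threshold: treat as wobble
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;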
--
Craig Ringer
Juan,
* Juan Pereira (juankarlos.openggd@gmail.com) wrote:
The main reason we thought of using a table per truck was concurrent
load: if there are 100 trucks trying to write to the same table, the
performance might be worse than with 100 tables, because the table is
blocked for other queries while the writing process is running, isn't it?
That assumption is incorrect with regard to PostgreSQL, as you'll find
if you go through the other links suggested. Writing to a table does
not require a table-level write lock in PostgreSQL.
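A quick way to convince yourself is to open two psql sessions (table and
values invented, matching the earlier truck_positions sketch):

-- Session 1: start a transaction and insert, but don't commit yet.
BEGIN;
INSERT INTO truck_positions VALUES (1, now(), 40.4168, -3.7038, 62.5);

-- Session 2, concurrently: readers are not blocked, and other writers
-- can insert into the same table at the same time.
SELECT count(*) FROM truck_positions;
INSERT INTO truck_positions VALUES (2, now(), 41.3874, 2.1686, 55.0);

-- Session 1:
COMMIT;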
Thanks,
Stephen
On Mar 17, 2009, at 4:47 AM, Craig Ringer wrote:
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
As you can imagine, PostgreSQL.
My main reasons are that in a proper transactional environment (ie
you're not using scary MyISAM tables) Pg is *much* better about
handling
concurrent load, particularly concurrent activity by readers and
writers.
Actually, following this comment it should be noted that if you were
to choose MySQL you'd pretty much be making a decision to *not* use
transactions at all. The reason is that while InnoDB does support
MySQL's geometry data types, it does *not* support indexes on geometry
columns; only MyISAM does, and MyISAM does not support transactions.
Call me old fashioned if you like, but I like my data to have
integrity ;)
Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
Harald Armin Massa, 17.03.2009 15:00:
That is: what table size would you or anybody consider really, really
large actually?
I recently attended an Oracle training by Tom Kyte and he said (partially
joking, though) that a database is only large when its size is measured in
terabytes :)
So really, really large would mean something like 100 petabytes
My personal opinion is that a "large" database has more than ~10 million rows in more than ~10 tables.
Thomas
On Tue, 2009-03-17 at 17:44 +0100, Thomas Kellerer wrote:
Harald Armin Massa, 17.03.2009 15:00:
That is: what table size would you or anybody actually consider really,
really large?
I recently attended an Oracle training by Tom Kyte and he said (partially
joking, though) that a database is only large when its size is measured in
terabytes :)
So really, really large would mean something like 100 petabytes
My personal opinion is that a "large" database has more than ~10 million rows in more than ~10 tables.
It entirely depends on workload and hardware.
Joshua D. Drake
Thomas
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Tue, Mar 17, 2009 at 10:00 AM, Harald Armin Massa <chef@ghum.de> wrote:
Merlin,
I agree though
that a single table approach is best unless 1) the table has to scale
to really, really large sizes or 2) there is a lot of churn on the
data (lots of bulk inserts and deletes).
while agreeing, an additional question: could you please express
"really, really large" in other units, like gigabytes or number of
rows (with average row length in bytes, of course)?
That is: what table size would you or anybody actually consider really,
really large?
A good rule of thumb for large is table size > working RAM. Huge
(really large) is 10x RAM.
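To apply that rule of thumb you can ask PostgreSQL for a table's on-disk
size directly and compare it with the machine's RAM (table name is just an
example):

-- Table plus its indexes and TOAST data, in human-readable form:
SELECT pg_size_pretty(pg_total_relation_size('gps_fix'));

Back-of-the-envelope for the original poster's numbers: 86400 * 365 is
about 31.5 million rows per truck per year; assuming roughly 60 bytes per
row (tuple header plus a few numeric columns), that is on the order of
2 GB per truck, so a combined table for 100 trucks would sit firmly in
"huge" territory by this rule on typical hardware.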
merlin
On Tue, Mar 17, 2009 at 05:44:48PM +0100, Thomas Kellerer wrote:
So really, really large would mean something like 100 petabytes
My personal opinion is that a "large" database has more than ~10 million
rows in more than ~10 tables.
Surely anything like "large" or "small" is a relative measure that
depends on personal experience. Because this mailing list is such
a diverse group I'm not sure if they'd ever be particularly useful
descriptions. If you're talking with a more cohesive group or you've
already defined what you're talking about then maybe--i.e. this database
is larger than that one, and so on.
I'd suggest we try and not describe things as small or large and just
use simple and unambiguous numeric descriptions; i.e. I'm expecting to
have a couple of tables with 10 to 100 million rows and the remaining 10
to 20 supporting tables having a few hundred rows.
I wouldn't expect row counts to be more accurate than an order of
magnitude, or table counts more accurate than a factor of two.
That's my two cents anyway!
--
Sam http://samason.me.uk/
This is a question for Juan: have you asked the MySQL mailing list? What
do they say about this?
On Tue, Mar 17, 2009 at 9:05 AM, Erik Jones <ejones@engineyard.com> wrote:
On Mar 17, 2009, at 4:47 AM, Craig Ringer wrote:
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
As you can imagine, PostgreSQL.
My main reasons are that in a proper transactional environment (ie
you're not using scary MyISAM tables) Pg is *much* better about handling
concurrent load, particularly concurrent activity by readers and writers.
Actually, following this comment it should be noted that if you were to
choose MySQL you'd pretty much be making a decision to *not* use
transactions at all. The reason is that while InnoDB does support
MySQL's geometry data types, it does *not* support indexes on geometry
columns; only MyISAM does, and MyISAM does not support transactions.
Call me old fashioned if you like, but I like my data to have integrity ;)
Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
--
- John L Cheng
At 12:05 AM 3/18/2009, Erik Jones wrote:
On Mar 17, 2009, at 4:47 AM, Craig Ringer wrote:
The question is: Which DBMS do you think is the best for this kind of
application? PostgreSQL or MySQL?
As you can imagine, PostgreSQL.
My main reasons are that in a proper transactional environment (ie
you're not using scary MyISAM tables) Pg is *much* better about
handling
concurrent load, particularly concurrent activity by readers and
writers.
Actually, following this comment it should be noted that if you were
to choose MySQL you'd pretty much be making a decision to *not* use
transactions at all. The reason is that while InnoDB does support
MySQL's geometry data types, it does *not* support indexes on geometry
columns; only MyISAM does, and MyISAM does not support transactions.
Call me old fashioned if you like, but I like my data to have
integrity ;)
Interesting, didn't know that.
But that's what I don't like about MySQL.
On the "brochure" they've got all the ticks on the feature
checkboxes. So the Bosses and CxOs think it's great.
But then you find out (often the hard way) that many star features
are mutually incompatible.
Link.
At 10:00 PM 3/17/2009, Harald Armin Massa wrote:
Merlin,
I agree though
that a single table approach is best unless 1) the table has to scale
to really, really large sizes or 2) there is a lot of churn on the
data (lots of bulk inserts and deletes).
while agreeing, an additional question: could you please express
"really, really large" in other units, like gigabytes or number of
rows (with average row length in bytes, of course)?
That is: what table size would you or anybody consider really, really
large?
Tiny: fits in CPU cache
Small: fits in RAM
Big: multiples of RAM.
Large: (size / storage bandwidth) is measured in minutes.
Huge: (size / storage bandwidth) is measured in hours.
Humungous: (size / storage bandwidth) is in days or larger units.
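To put illustrative numbers on that scheme: a 500 GB table on storage that
scans at about 200 MB/s takes roughly 2500 seconds, i.e. around 42 minutes,
to read end to end, so it lands in the "Large" bucket; on the same disks a
table would have to grow into the tens of terabytes before
(size / storage bandwidth) is measured in days and it qualifies as
"Humungous".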
That said, the active working set might be a lot smaller than the
table, in which case you might prefer to use the size of the working
set (except when you are doing stuff like full backups or restores).
Link.