Using Postgres to store high volume streams of sensor readings
Hello all!
I would like to ask some advice about the following problem
(related to the Dehems project: http://www.dehems.eu/ ):
* there are some clients; (the clients are in fact households;)
* each device has a number of sensors (about 10), and not all the
clients have the same sensors; also sensors might appear and disappear
dynamically; (the sensors are appliances;)
* for each device and each sensor a reading is produced (about every
6 seconds); (the values could be power consumptions;)
* I would like to store the following data: (client, sensor,
timestamp, value);
* the usual queries are:
* for a given client (and sensor), and time interval, I need
the min, max, and avg of the values;
* for a given time interval (and sensor), I need min, max, and
avg of the values;
* other statistics;
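For concreteness, the usual queries above could be sketched in SQL against the benchmark schema given below (the client / sensor ids and the interval bounds are placeholders):

```sql
-- min, max, avg for one client and one sensor over a time interval
SELECT min(value), max(value), avg(value)
  FROM sds_benchmark_data
 WHERE client = 42 AND sensor = 3
   AND timestamp BETWEEN 1000000 AND 2000000;

-- the same, for one sensor across all clients
SELECT min(value), max(value), avg(value)
  FROM sds_benchmark_data
 WHERE sensor = 3
   AND timestamp BETWEEN 1000000 AND 2000000;
```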
Currently I'm benchmarking the following storage solutions for this:
* Hypertable (http://www.hypertable.org/) -- which has a good insert
rate (about 250k inserts / s), but a slow read rate (about 150k reads /
s); (the aggregates are computed manually, as Hypertable does not
support queries other than scanning; (in fact min and max are easy,
being the first / last key in the ordered set, but avg must be done
by a sequential scan);)
* BerkeleyDB -- a quite OK insert rate (about 50k inserts / s), but
a fabulous read rate (about 2M reads / s); (the same issue with
aggregates;)
* Postgres -- which behaves quite poorly (see below)...
* MySQL -- next to be tested;
So what can I do / how could I optimize the use of Postgres for this usage?
(I'm aware that there could be optimizations for this problem
(like computing the aggregates in memory and storing only these
aggregates at 10-minute, or other, intervals), but I want to have the
full history (for data mining tasks, for example);)
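For reference, the 10-minute pre-aggregation being ruled out here could be sketched as follows (the 600-second bucket width is an assumption, since the timestamps are plain integers):

```sql
-- collapse raw readings into 10-minute (600-second) buckets
SELECT client, sensor,
       (timestamp / 600) * 600 AS bucket_start,
       min(value), max(value), avg(value), count(*) AS readings
  FROM sds_benchmark_data
 GROUP BY client, sensor, (timestamp / 600) * 600;
```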
I would also like to present how the Postgres benchmark was implemented:
* test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500GB,
SCSI, RAID 0, 8GB RAM, 2 x Intel Xeon 5160 x 2 cores (3GHz, 4MB L2);
* Postgres version: 8.3.3;
* database schema:
create table sds_benchmark_data (
    client int4 not null,
    sensor int4 not null,
    timestamp int4 not null,
    value int4
);
alter table sds_benchmark_data add primary key (client, sensor, timestamp);
create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);
* postgresql.conf (the default values, I will list only what has
been changed):
max_connections = 20
shared_buffers = 24MB
work_mem = 64MB
maintenance_work_mem = 256MB
fsync = off
checkpoint_segments = 32
effective_cache_size = 1024MB
* inserts are done like this:
* generated 100 million readings by using the following rule:
* client is randomly chosen between 0 and 10 thousand;
* sensor is randomly chosen between 0 and 10;
* the timestamp is always increasing by one;
* the insert is done in batches of 500 thousand inserts (I've
also tried 5, 25, 50 and 100 thousand without big impact);
* the batch inserts are done through COPY sds_benchmark_data
FROM STDIN through libpq (by using UNIX (local) sockets);
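The batched COPY described above might look roughly like this (a sketch; the column list is implied by the schema, and the tab-separated row shown is a made-up example):

```sql
-- one batch is streamed over the wire protocol as:
COPY sds_benchmark_data (client, sensor, timestamp, value) FROM STDIN;
-- ...followed by up to 500 thousand tab-separated rows, e.g.
--   1234    5    1700000001    657
-- ...terminated with a line containing only:  \.
-- (through libpq this is driven with PQputCopyData / PQputCopyEnd)
```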
What have I observed / tried:
* I've tested without the primary key and the index, and the
results were the best for inserts (600k inserts / s), but the
reads worked extremely slowly (due to the lack of indexing);
* with only the index (or only the primary key) the insert rate is
good at the start (for the first 2 million readings), but then drops to
about 200 inserts / s;
So could someone point out where I'm wrong, or what I can do to
optimize Postgres for this particular task?
Thanks for your help,
Ciprian Dorin Craciun.
P.S.: I want to publish the benchmark results after they are
done, and I want to squeeze as much power out of Postgres as possible.
On Fri, Nov 21, 2008 at 02:50:45PM +0200, Ciprian Dorin Craciun wrote:
[Ciprian's original post quoted in full; snipped]
For this case I would use RRDTool: http://oss.oetiker.ch/rrdtool/
Regards,
Gerhard
On Fri, Nov 21, 2008 at 2:55 PM, Gerhard Heift
<ml-postgresql-20081012-3518@gheift.de> wrote:
[Gerhard's reply, including the quoted original post and a PGP signature; snipped]
Hi Gerhard, I know about RRDTool, but it has some limitations:
* I must know in advance the number of sensors;
* I must create a file for each client (and what if I have 10 thousand clients?);
* I have a limited amount of history;
* (I'm not sure about this one, but I think that) I must insert
each data point by executing a command;
* and also I cannot replicate (distribute) it easily;
Or have you used RRDTool in a similar context as mine? Do you have
some benchmarks?
Ciprian.
On 21 Lis, 13:50, ciprian.crac...@gmail.com ("Ciprian Dorin Craciun")
wrote:
[the problem description quoted from the original post; snipped]
How many devices do you expect?
As I understand it, the expected insert rate is more or less:
no. of devices * no. of sensors (about 10)
every 6 seconds. Let's assume you have 100 devices; that means 1000
inserts per 6 s = about 166 inserts per second.
[the insert procedure and observations quoted from the original post; snipped]
Try periodically executing REINDEX on your index, and ANALYZE on
your table. To be honest, it should not influence inserts, but it will
influence selects.
Michal Szymanski
http://blog.szymanskich.net
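Spelled out against the benchmark schema posted earlier, Michal's suggestion would be (a sketch):

```sql
REINDEX INDEX sds_benchmark_data__client_sensor__index;
ANALYZE sds_benchmark_data;
```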
you'll have to provide us with some sort of test-case to get some answers,
please. (set of scripts, queries, etc).
On Fri, Nov 21, 2008 at 3:18 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
you'll have to provide us with some sort of test-case to get some answers,
please. (set of scripts, queries, etc).
Below is the content of my original post. Inside I mention
exactly the way the benchmark was conducted.
In short the data is inserted by using COPY sds_benchmark_data
from STDIN, in batches of 500 thousand data points.
I'll also paste the important part here:
* Postgres version: 8.3.3;
* database schema:

create table sds_benchmark_data (
    client int4 not null,
    sensor int4 not null,
    timestamp int4 not null,
    value int4
);
alter table sds_benchmark_data add primary key (client, sensor, timestamp);
create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);

* postgresql.conf (the default values, I will list only what has
been changed):

max_connections = 20
shared_buffers = 24MB
work_mem = 64MB
maintenance_work_mem = 256MB
fsync = off
checkpoint_segments = 32
effective_cache_size = 1024MB

* inserts are done like this:
* generated 100 million readings by using the following rule:
* client is randomly chosen between 0 and 10 thousand;
* sensor is randomly chosen between 0 and 10;
* the timestamp is always increasing by one;
* the insert is done in batches of 500 thousand inserts (I've
also tried 5, 25, 50 and 100 thousand without big impact);
* the batch inserts are done through COPY sds_benchmark_data
FROM STDIN through libpq (by using UNIX (local) sockets);
Unfortunately I don't know what more information to give...
Thanks,
Ciprian Dorin Craciun.
On Fri, Nov 21, 2008 at 2:50 PM, Ciprian Dorin Craciun
<ciprian.craciun@gmail.com> wrote:
[the original post quoted in full; snipped]
see, I am afraid of the part where it says "randomly", because you probably
used random(), which isn't the fastest thing on earth :)
Ciprian Dorin Craciun wrote:
[............]
So what can I do / how could I optimize the use of Postgres for this usage?
Hello, here you have some comments that will probably help you to get
more from this test machine ......
* test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500Gb,
SCSI, RAID 0, 8Gb RAM, 2 x Intel Xeon 5160 x 2 core (3Ghz, 4Mb L2);
* Postgres version: 8.3.3;
(RAID 0 is never a good thing with databases if you don't have another
redundant system that can be used to restore your data or if you want to
minimize your downtime.)
Putting the database transaction logs ($PGDATA/pg_xlog) on their own
dedicated disk resource will probably increase write performance.
create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);
You don't need this index if the primary key is (client, sensor, timestamp).
shared_buffers = 24MB
I would increase this to 25% of your RAM. 2GB in the test machine (if it
is a dedicated postgres server). It will help read-rate.
You will probably have to increase kernel.shmmax and kernel.shmall in
/etc/sysctl.conf (linux)
fsync = off
Do you have the results with this on?
checkpoint_segments = 32
I would increase this to 128-256 if you work with large write loads
(several GB of data). $PGDATA/pg_xlog would use some extra disk if you
change this value.
effective_cache_size = 1024MB
50% of your RAM. 4GB in the test machine (if it is a dedicated postgres
server). It will probably help read-rate.
In addition, I would try changing these parameters as well:
wal_buffers = 64
random_page_cost = 2.0
In general, faster and more disks in a RAID 1+0 / 0+1 will help write
performance. autovacuum should be on.
regards.
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
On Fri, Nov 21, 2008 at 02:50:45PM +0200, Ciprian Dorin Craciun wrote:
[the benchmark results for Hypertable, BerkeleyDB, Postgres, and MySQL quoted; snipped]
It's not quite what you're asking for; but have you checked out any
of the databases that have resulted from the StreamSQL research? The
Borealis engine (http://www.cs.brown.edu/research/borealis/public/)
looks like the most recent development, but I'm not sure how you are
with academic-style code. I've never used it before, but it sounds as
though it was designed for exactly your sort of problem.
Sam
On Fri, Nov 21, 2008 at 3:29 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
see, I am afraid of the part where it says "randomly", because you probably
used random(), which isn't the fastest thing on earth :)
I can assure you this is not the problem... The other storage
engines work quite well, and also the generation speed is somewhere at
30 million records / second, which is 100x greater than the speed
achieved by the fastest store I've tested so far...
Ciprian.
You might want to look into how OpenNMS uses RRDTool. It is able to handle
a huge number of nodes by queuing inserts into the RRDs and using JRobin.
I'm not sure if it is a great solution for what you are looking for, but
I've found its performance scales quite well. I'm getting well over 500
updates per second using JRobin and an NFS disk. I'm sure I could do better,
but my application is limited by Hibernate. Each of my files stores an
MRTG's worth of data and keeps the average and max of four points in 136k.
Here is an iostat -dmx 1 of the rrd update (two consecutive samples):
Device:  rrqm/s  wrqm/s    r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
xvda       7.00   33.00  122.00  2093.00   0.48   8.34      8.16    150.70  69.92   0.45  100.00
xvda     493.00   26.00  248.00   540.00   2.84   2.21     13.13     43.70  55.42   1.26   99.60
My big problem with RRD is not being able to query it like you can a
database and it tending not to keep exact values. Oh, and it being pretty
cryptic.
Sorry to clutter up the list with RRD stuff. I just thought it might be
pertinent to Ciprian.
On Fri, Nov 21, 2008 at 8:03 AM, Ciprian Dorin Craciun <
ciprian.craciun@gmail.com> wrote:
[the earlier exchange with Gerhard about RRDTool quoted in full; snipped]
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
"Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes:
In short the data is inserted by using COPY sds_benchmark_data
from STDIN, in batches of 500 thousand data points.
Not sure if it applies to your real use-case, but if you can try doing
the COPY from a local file instead of across the network link, it
might go faster. Also, as already noted, drop the redundant index.
regards, tom lane
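Tom's local-file variant might look roughly like this (the path is a placeholder; the server-side file form of COPY reads the file as the server process and requires superuser privileges):

```sql
-- hypothetical local file with tab-separated columns,
-- as produced by the benchmark's data generator
COPY sds_benchmark_data (client, sensor, timestamp, value)
    FROM '/tmp/sds_benchmark_data.tsv';
```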
Thanks for your info! Please see below...
On Fri, Nov 21, 2008 at 4:14 PM, Rafael Martinez
<r.m.guerrero@usit.uio.no> wrote:
Ciprian Dorin Craciun wrote:
[............]
So what can I do / how could I optimize the use of Postgres for this usage?
Hello, here you have some comments that will probably help you to get
more from this test machine ......
* test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500GB,
SCSI, RAID 0, 8GB RAM, 2 x Intel Xeon 5160 x 2 cores (3GHz, 4MB L2);
* Postgres version: 8.3.3;
(RAID 0 is never a good thing with databases if you don't have another
redundant system that can be used to restore your data or if you want to
minimize your downtime.)
Putting the database transaction logs ($PGDATA/pg_xlog) on their own
dedicated disk resource will probably increase write performance.
Unfortunately this is a test machine shared with other projects,
and I can't change (for now) the disk setup... When I have a
dedicated machine I'll do this... For now, no...
create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);
You don't need this index if the primary key is (client, sensor, timestamp).
Both the primary key and the index are listed here because when
I tested I switched between them... (Almost the same behaviour
with only the primary key, or with only the index)...
shared_buffers = 24MB
I would increase this to 25% of your RAM. 2GB in the test machine (if it
is a dedicated postgres server). It will help read-rate.
You will probably have to increase kernel.shmmax and kernel.shmall in
/etc/sysctl.conf (linux)
Modified it.
fsync = off
Do you have the results with this on?
Doesn't help at all... I guess the problem is with the index building...
checkpoint_segments = 32
I would increase this to 128-256 if you work with large write loads
(several GB of data). $PGDATA/pg_xlog would use some extra disk if you
change this value.
Updated it to 256.
effective_cache_size = 1024MB
50% of your RAM. 4GB in the test machine (if it is a dedicated postgres
server). It will probably help read-rate.
Updated it to 4096MB.
In addition, I will try to change these parameters also:
wal_buffers = 64
random_page_cost = 2.0
Currently wal_buffers is 64kB, I've set it to 64MB???
random_page_cost was 4.0, decreased it to 2.0???
In general, faster and more disks in a RAID 1+0 / 0+1 will help write
performance. autovacuum should be on.
regards.
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
So after the updates, the results were better, but still under 1k
inserts / second...
Thanks again for your info!
On Fri, Nov 21, 2008 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes:
In short the data is inserted by using COPY sds_benchmark_data
from STDIN, in batches of 500 thousand data points.
Not sure if it applies to your real use-case, but if you can try doing
the COPY from a local file instead of across the network link, it
might go faster. Also, as already noted, drop the redundant index.
regards, tom lane
Hi!
It won't be that difficult to use a local file (now I'm using the
same computer), but will it really make a difference? (I mean have you
seen such issues?)
Thanks,
Ciprian Craciun.
"Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes:
On Fri, Nov 21, 2008 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Not sure if it applies to your real use-case, but if you can try doing
the COPY from a local file instead of across the network link, it
might go faster. Also, as already noted, drop the redundant index.
It won't be that difficult to use a local file (now I'm using the
same computer), but will it really make a difference?
Yes. I'm not sure how much, but there is nontrivial protocol overhead.
regards, tom lane
On Fri, Nov 21, 2008 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes:
On Fri, Nov 21, 2008 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Not sure if it applies to your real use-case, but if you can try doing
the COPY from a local file instead of across the network link, it
might go faster. Also, as already noted, drop the redundant index.
It won't be that difficult to use a local file (now I'm using the
same computer), but will it really make a difference?
Yes. I'm not sure how much, but there is nontrivial protocol overhead.
regards, tom lane
Ok, I have tried it, and no improvements... (There is also the
drawback that I must run the inserts as the superuser...)
Ciprian Craciun.
On Fri, 21 Nov 2008, Tom Lane wrote:
Not sure if it applies to your real use-case, but if you can try doing
the COPY from a local file instead of across the network link, it
might go faster.
The fact that the inserts are reported as fast initially but slow as the
table and index size grow means it's probably a disk bottleneck rather
than anything related to the client itself. If the network overhead was
the issue, I wouldn't expect it to start fast like that. Ditto for
concerns about the random function being slow. Either of those might
speed up the initial, fast period a bit, but they're not going to impact
the later slowdown.
Ciprian, the main interesting piece of data to collect here is a snapshot
of a few sample lines from the output of "vmstat 1" during the initial,
fast loading section versus the slower period. I think what's happening
to you is that maintaining the index blocks on the disk is turning into
increasingly random I/O as the size of the table grows, and your disks
just can't keep up with that. What I'd expect is that initially the
waiting for I/O "wa" figure will be low, but it will creep up constantly
and at some point spike up hard after the working set of data operated on
exceeds memory.
The fact that PostgreSQL performs badly here compared to the more
lightweight databases you've used isn't that much of a surprise. There's
a fair amount of overhead for the write-ahead log and the MVCC
implementation in the database, and your application is suffering from all
that robustness overhead but not really gaining much of a benefit from it.
The main things that help in this sort of situation are increases in
shared_buffers and checkpoint_segments, so that more database information
is stored in RAM for longer rather than being pushed to disk too quickly,
but what Rafael suggested already got you most of the possible improvement
here. You might get an extra bit of boost by adjusting the index
FILLFACTOR upwards (default is 90, if you're never updating you could try
100). I doubt that will be anything but a minor incremental improvement
though.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
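Greg's FILLFACTOR suggestion, sketched against the benchmark schema (this storage-parameter syntax is available since 8.2):

```sql
-- pack index pages completely; only sensible if rows are never updated,
-- since any later update would force immediate page splits
CREATE INDEX sds_benchmark_data__client_sensor__index
    ON sds_benchmark_data (client, sensor)
    WITH (fillfactor = 100);
```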
On Fri, Nov 21, 2008 at 7:42 PM, Ciprian Dorin Craciun
<ciprian.craciun@gmail.com> wrote:
[the exchange with Tom Lane about COPY from a local file quoted; snipped]
On second thought, the protocol overhead is not important...
Because if I don't use indexes, I obtain 600k inserts / second... (So
the test was useless... :) But I learn from my mistakes...)
Ciprian.
On Fri, Nov 21, 2008 at 7:45 PM, Greg Smith <gsmith@gregsmith.com> wrote:
[Greg's analysis of the index-maintenance I/O bottleneck quoted in full; snipped]
Ok. Silly question: how do I run "vmstat 1"?
The problem is indeed the indexes... So if we analyze the insert
patterns: client ids are randomly distributed, sensor ids the same,
and the index is created on top of these two; it means that
(probabilistically speaking) after 100 thousand inserts (10 thousand
clients and 10 sensors), all the index pages would be dirty...
Indeed I could postpone the flush by using bigger and bigger
memory, but this doesn't help for 100 million records...
[Greg's comments on WAL / MVCC overhead and the FILLFACTOR suggestion quoted; snipped]
About the fillfactor, on the contrary, I think I should set it
lower (as the index fills very quickly)... I've set it to 10% and it
behaves a little better than with (10)... I'll run a full 100 million
test to see where it breaks...
Thanks,
Ciprian Craciun.
On Fri, 21 Nov 2008, Sam Mason wrote:
It's not quite what you're asking for; but have you checked out any
of the databases that have resulted from the StreamSQL research?
A streaming database approach is in fact ideally suited to handling this
particular problem. Looking at the original request here:
* for a given client (and sensor), and time interval, I need the min,
max, and avg of the values;
* for a given time interval (and sensor), I need min, max, and avg of
the values;
The most efficient way possible to compute these queries is to buffer the
full interval worth of data in memory as the values are being inserted,
compute these aggregates once the time window for the interval has ended,
then write a summarized version of the data. Doing that sort of thing and
then operating on the aggregated data, rather than maintaining a bulky
index covering every single row, is exactly the sort of thing a good
streaming database would handle for you.
I can't comment on the current state of Borealis. But as the original
focus of the streaming database research that spawned Truviso where I work
was accelerating data capture from sensor networks, I know this general
style of approach is quite beneficial here. For example,
http://telegraph.cs.berkeley.edu/telegraphcq/v2.1/ is an open-source
implementation from that research that's integrated with an older version
of PostgreSQL. If you look at the "windowed aggregates" example there, it
shows what a streaming query similar to the requirements here would look
like: an average and other statistics produced on a per-interval basis.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD