Querying a Large Partitioned DB
Team Amazing,
I am building a massive database for storing the syslogs of a room of
servers. The database gets about 25 million entries a day, and need to keep
them for 180 days. So the total size of the database will be about 4.5
billion records.
I need to be able to do full text searches on the message field, and of
course, it needs to be reasonably fast.
The table is partitioned daily and has this structure:
syslog=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+-------
public | systemevents | table | pgsql
public | systemevents_032909 | table | pgsql
public | systemevents_033009 | table | pgsql
public | systemevents_033109 | table | pgsql
public | systemevents_040109 | table | pgsql
public | systemevents_040209 | table | pgsql
public | systemevents_040309 | table | pgsql
public | systemevents_040409 | table | pgsql
public | systemevents_040509 | table | pgsql
public | systemevents_040609 | table | pgsql
public | systemevents_040709 | table | pgsql
public | systemevents_040909 | table | pgsql
public | systemevents_041009 | table | pgsql
(13 rows)
syslog=# \d systemevents
Table "public.systemevents"
Column | Type | Modifiers
--------------------+-----------------------------+-----------
message | character varying |
facility | integer |
fromhost | character varying(80) |
priority | integer |
devicereportedtime | timestamp without time zone |
receivedat | timestamp without time zone |
infounitid | integer |
syslogtag | character varying(80) |
message_index_col | tsvector |
Rules:
systemevents_insert_032909 AS
ON INSERT TO systemevents
WHERE new.devicereportedtime > '2009-03-28 23:59:59'::timestamp without
time zone AND new.devicereportedtime <= '2009-03-29 23:59:59'::timestamp
without time zone DO INSTEAD INSERT INTO systemevents_032909 (message,
facility, fromhost, priority, devicereportedtime, receivedat, infounitid,
syslogtag, message_index_col)
...... [there are rules like that for each partition]
My typical query looks like this:
SELECT * FROM SystemEvents WHERE message_index_col @@ to_tsquery('english',
'Term') LIMIT 25 OFFSET 0;
Here is an explain analyze:
Limit (cost=2422393.00..2422393.06 rows=25 width=153) (actual
time=93363.496..93363.610 rows=25 loops=1)
-> Sort (cost=2422393.00..2422933.05 rows=216019 width=153) (actual
time=93363.490..93363.532 rows=25 loops=1)
Sort Key: public.systemevents.devicereportedtime
Sort Method: top-N heapsort Memory: 22kB
-> Result (cost=0.00..2416297.10 rows=216019 width=153) (actual
time=20567.267..93362.574 rows=163 loops=1)
-> Append (cost=0.00..2415217.01 rows=216019 width=153)
(actual time=20567.244..93361.582 rows=163 loops=1)
-> Seq Scan on systemevents (cost=0.00..1750240.39
rows=30891 width=153) (actual time=20567.238..91580.249 rows=24 loops=1)
Filter: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040309
systemevents (cost=1168.86..54860.45 rows=15253 width=152) (actual
time=82.429..275.589 rows=20 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040309 (cost=0.00..1165.04 rows=15253 width=0) (actual
time=50.029..50.029 rows=20 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040409
systemevents (cost=1038.56..52300.49 rows=14601 width=147) (actual
time=68.006..68.006 rows=0 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040409 (cost=0.00..1034.91 rows=14601 width=0) (actual
time=67.999..67.999 rows=0 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040509
systemevents (cost=1055.06..52482.72 rows=14644 width=150) (actual
time=63.257..63.257 rows=0 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040509 (cost=0.00..1051.40 rows=14644 width=0) (actual
time=63.251..63.251 rows=0 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040609
systemevents (cost=1842.50..88135.00 rows=24506 width=152) (actual
time=117.747..355.043 rows=34 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040609 (cost=0.00..1836.37 rows=24506 width=0) (actual
time=92.079..92.079 rows=34loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040709
systemevents (cost=1844.72..89127.11 rows=24790 width=152) (actual
time=114.387..262.360 rows=24 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040709 (cost=0.00..1838.52 rows=24790 width=0) (actual
time=84.848..84.848 rows=24loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_032909
systemevents (cost=320.54..17254.18 rows=4841 width=142) (actual
time=67.808..67.810 rows=1 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_032909 (cost=0.00..319.33 rows=4841 width=0) (actual
time=56.044..56.044 rows=1 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033009
systemevents (cost=1556.24..75179.65 rows=20931 width=149) (actual
time=77.644..335.360 rows=43 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_033009 (cost=0.00..1551.01 rows=20931 width=0) (actual
time=72.454..72.454 rows=43loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_033109
systemevents (cost=1892.97..92637.60 rows=25806 width=149) (actual
time=86.468..86.856 rows=4 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_033109 (cost=0.00..1886.52 rows=25806 width=0) (actual
time=70.397..70.397 rows=4 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040109
systemevents (cost=1395.47..66260.67 rows=18430 width=149) (actual
time=85.711..177.369 rows=12 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040109 (cost=0.00..1390.86 rows=18430 width=0) (actual
time=67.481..67.481 rows=12 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Heap Scan on systemevents_040209
systemevents (cost=1619.92..76738.75 rows=21326 width=150) (actual
time=89.065..89.067 rows=1 loops=1)
Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
-> Bitmap Index Scan on
systemevents_msg_idx_040209 (cost=0.00..1614.58 rows=21326 width=0) (actual
time=73.229..73.229 rows=1 loops=1)
Index Cond: (message_index_col @@
'''funkju'''::tsquery)
Total runtime: 93364.070 ms
(49 rows)
Can you give me any tips and suggestions about how to speed this up?
It seems like a smart query planner would understand the rules and know that
it should search in the last partitions first, since it is ordering by
device reported time.
Thanks!
justin
Justin Funk <funkju@iastate.edu> writes:
Can you give me any tips and suggestions about how to speed this up?
Use fewer partitions --- 180 is a lot. Maybe weekly partitioning would
be about right.
regards, tom lane
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Apr 10, 2009, at 10:15 AM, Justin Funk wrote:
Team Amazing,
I am building a massive database for storing the syslogs of a room of
servers. The database gets about 25 million entries a day, and need
to keep
them for 180 days. So the total size of the database will be about 4.5
billion records.I need to be able to do full text searches on the message field, and
of
course, it needs to be reasonably fast.
You could use pg-pool II or your own middleware to execute the search
query in parallel across all the partitions (maybe not all 180 at
once, though).
Cheers,
M
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.10 (Darwin)
iEYEARECAAYFAknfaVAACgkQqVAj6JpR7t65DQCgsN51pMWoY8WXyxss6cXRPHug
4h8An2IufbKuhrw4fyki4gBbjrkkQD0M
=5PRb
-----END PGP SIGNATURE-----
On Fri, 2009-04-10 at 09:15 -0500, Justin Funk wrote:
I need to be able to do full text searches on the message field, and
of course, it needs to be reasonably fast.The table is partitioned daily and has this structure:
My typical query looks like this:
SELECT * FROM SystemEvents WHERE message_index_col @@
to_tsquery('english', 'Term') LIMIT 25 OFFSET 0;
The partitioning does nothing to improve your typical query.
Loop through the tables from first to last until you have returned 25
records. That way you won't have to wait to search every table.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support