Which partition scheme makes sense for my time based IoT-datagrams?
Hi all,
I have a table storing datagrams from some IoT-devices, with one
datagram per device per day most likely for around 75'000 devices
currently. I want to test query performance with a partitioned table
and am interested in the following queries mostly:
* querying arbitrary datagrams by their ID
* querying datagrams being X old based on some timestamp
* 15 minutes
* 15 days
* 15 months
My app isn't prepared to e.g. migrate things in the background,
instead I would like to have Postgres deal with all apsects as
transparent as possible. At least for the next few years, things
worked without partitions in the past as well.
Therefore I thought of simply partitioning by RANGE of the timestamp
when the datagram has been received and create individual partitions
per half a year. That means that in most cases only the most current
1 partition needs to be queried, with the last 3 in many other cases.
Other approaches I've read on this list were e.g. partitioning the
most current months individually and afterwards moving rows to some
other "archive"-like table. Besides of the lack of the necessary
infrastructure, in my use case in theory users need to be able to ask
for the last 15 months at some arbitrary point in history at any time.
I wouldn't like to deal with different tables or alike in my app.
My approach would result in 24 tables already, with 2 more per year.
Does that for itself sound bad already? Is that a limit the planner
can deal with most likely or do I don't even need to care for hundreds
or partitions?
If partitioned by timestamp, how are lookups by ID performed? Is that
a sequential scan on all partitions, e.g. with using available indexes
per partition?
Is there some way to RANGE by timestamp and ID, by keeping the
half-year tables only? So that the planner knows easier which
partition to look at in case of IDs only? Or is that simply like
querying an ID-index of each partition?
The following is what I have currently, extra verbose to support
Postgres 10:
CREATE TABLE datagram
(
id bigserial NOT NULL,
src_re integer NOT NULL,
src_clt integer NOT NULL,
src_meter integer NOT NULL,
captured_at timestamp with time zone NOT NULL,
captured_rssi smallint NOT NULL,
oms_status smallint NOT NULL,
oms_enc bytea,
oms_dec bytea
) PARTITION BY RANGE (captured_at);
CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
[...]
ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT pk_datagram_y1970_h1 PRIMARY KEY (id);
ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT ck_datagram_y1970_h1_oms_data_avail CHECK (oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
ALTER TABLE datagram_y1970_h1 ADD CONSTRAINT fk_datagram_y1970_h1_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT pk_datagram_y1970_h2 PRIMARY KEY (id);
ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT ck_datagram_y1970_h2_oms_data_avail CHECK (oms_enc IS NOT NULL OR oms_dec IS NOT NULL);
ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_re FOREIGN KEY (src_re) REFERENCES real_estate (id);
ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_clt FOREIGN KEY (src_clt) REFERENCES collector (id);
ALTER TABLE datagram_y1970_h2 ADD CONSTRAINT fk_datagram_y1970_h2_src_meter FOREIGN KEY (src_meter) REFERENCES meter (id);
[...]
CREATE INDEX idx_datagram_y1970_h1_for_time_window ON datagram_y1970_h1 USING btree (src_meter, captured_at DESC);
CREATE INDEX idx_datagram_y1970_h2_for_time_window ON datagram_y1970_h2 USING btree (src_meter, captured_at DESC);
[...]
An example query condition is the following, while "P5D" can simply be
"P15M" or alike instead.
WHERE (
"real_estate"."id" IN ([...])
AND "meter"."id" IN ([...])
AND "datagram"."captured_at" BETWEEN (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) - CAST('P5D' AS INTERVAL)) AND (CAST('2020-08-28T10:34:32.855+02:00' AS TIMESTAMP WITH TIME ZONE) + CAST('P0D' AS INTERVAL))
)
Thanks for sharing your opinions!
Mit freundlichen Grüßen
Thorsten Schöning
--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: Thorsten.Schoening@AM-SoFT.de
Web: http://www.AM-SoFT.de/
Telefon: 05151- 9468- 0
Telefon: 05151- 9468-55
Fax: 05151- 9468-88
Mobil: 0178-8 9468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.
Mit freundlichen Grüßen
Thorsten Schöning
Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de
AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0
Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80
CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1
Büro Dallgow-Döberitz
Tel: 03322 507 020
Büro Kloster Lehnin
Tel: 033207 566 530
PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0
Büro Neuruppin
Tel: 033932 606 090
ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410
Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0
Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7
Büro Liebenwalde
Tel: 033054 810 00
HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97
Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6
Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0
MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3
Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska
On Wed, 2021-02-10 at 16:09 +0100, Thorsten Schöning wrote:
I have a table storing datagrams from some IoT-devices, with one
datagram per device per day most likely for around 75'000 devices
currently. I want to test query performance with a partitioned table
and am interested in the following queries mostly:* querying arbitrary datagrams by their ID
* querying datagrams being X old based on some timestamp
* 15 minutes
* 15 days
* 15 months
You want to query ALL individual datagrams from the past 15 months??
I guess not, and that you want some aggregation. For that, materialized
views might be interesting.
Therefore I thought of simply partitioning by RANGE of the timestamp
when the datagram has been received and create individual partitions
per half a year. That means that in most cases only the most current
1 partition needs to be queried, with the last 3 in many other cases.My approach would result in 24 tables already, with 2 more per year.
Does that for itself sound bad already? Is that a limit the planner
can deal with most likely or do I don't even need to care for hundreds
or partitions?
24 partitions is harmless. I think that a partition per half year might
not be enough, but it depends on the queries.
You should use PostgreSQL v12 or better.
One point you didn't touch, but should consider, is getting rid of old data.
You will want to do that at some point, or at least you want to keep only
aggregates of the data for historical statistical data.
That would be very easy if you partitoin correctly, and it would keep the
number of partitions in a manageable range.
If partitioned by timestamp, how are lookups by ID performed? Is that
a sequential scan on all partitions, e.g. with using available indexes
per partition?
They will scan all partitions. You have to be prepared that most queries
will become at least slightly slower with partitioning. That is
expected.
Is there some way to RANGE by timestamp and ID, by keeping the
half-year tables only? So that the planner knows easier which
partition to look at in case of IDs only? Or is that simply like
querying an ID-index of each partition?
Sure, you can partition by ID (ranges; or perhaps hash partitioning,
if you only ever query for a single ID) and subpartition by timestamp.
This will speed up the queries by ID, since only a single partition
will be searched. You could then almost be as fast as with a single
monolitic table. Don't forget that the speed of an index scan does
not (or almost not) depend on the size of the table.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Guten Tag Laurenz Albe,
am Mittwoch, 10. Februar 2021 um 17:45 schrieben Sie:
They will scan all partitions. You have to be prepared that most queries
will become at least slightly slower with partitioning. That is
expected.
Does "most queries" address thos efor IDs scanning all partitions or
those time-based as well? In the end, I'm trying to improve query
performance by reducing the size of indexes, number of rows etc. per
table using partitions. :-)
The docs read like my use case would fit to partitioning as well and
I've already tried all sorts of indexing on the unpartitioned table to
improve some of my example queries with not much luck.
Mit freundlichen Grüßen
Thorsten Schöning
--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: Thorsten.Schoening@AM-SoFT.de
Web: http://www.AM-SoFT.de/
Telefon: 05151- 9468- 0
Telefon: 05151- 9468-55
Fax: 05151- 9468-88
Mobil: 0178-8 9468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.
Mit freundlichen Grüßen
Thorsten Schöning
Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de
AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0
Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80
CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1
Büro Dallgow-Döberitz
Tel: 03322 507 020
Büro Kloster Lehnin
Tel: 033207 566 530
PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0
Büro Neuruppin
Tel: 033932 606 090
ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410
Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0
Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7
Büro Liebenwalde
Tel: 033054 810 00
HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97
Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6
Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0
MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3
Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska
On Wed, 2021-02-10 at 18:59 +0100, Thorsten Schöning wrote:
They will scan all partitions. You have to be prepared that most queries
will become at least slightly slower with partitioning. That is
expected.Does "most queries" address thos efor IDs scanning all partitions or
those time-based as well? In the end, I'm trying to improve query
performance by reducing the size of indexes, number of rows etc. per
table using partitions. :-)
The size of the index will barely affect its performance.
But having to figure out which partitions to scan takes planning time.
Very few queries become notably faster through partitioning.
Mostly those are aggregates and joins that match the partitioning scheme,
and sequential scans that include the partitioning key.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On 2021-02-10 16:09:30 +0100, Thorsten Schöning wrote:
I have a table storing datagrams from some IoT-devices, with one
datagram per device per day most likely for around 75'000 devices
currently. I want to test query performance with a partitioned table
and am interested in the following queries mostly:* querying arbitrary datagrams by their ID
* querying datagrams being X old based on some timestamp
* 15 minutes
If you get one datagram per day, how it is useful to query all datagrams
received in the last 15 minutes? That's either a random sample (if
devices report at random times) or empty (if they all report at midnight
and it isn't just after midnight).
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Guten Tag Peter J. Holzer,
am Donnerstag, 11. Februar 2021 um 15:03 schrieben Sie:
If you get one datagram per day, how it is useful to query all datagrams
received in the last 15 minutes?[...]
There's a special installation mode during which users are setting up
their IoT-devices. During that mode datagrams might be received more
often, while newer ones are still deleting older ones. 15 minutes is
simply the default value in the corresponding UI to check if anything
has been received properly, but users might send+check more often.
15 days is some different default value regarding monitoring
IoT-devices, if they have been received at all and stuff like that. 15
months again is a value users are interested in, because they need
one measuring value per month most likely.
And 3 times 15 simply "looks" good as well... :-)
Mit freundlichen Grüßen
Thorsten Schöning
--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
E-Mail: Thorsten.Schoening@AM-SoFT.de
Web: http://www.AM-SoFT.de/
Telefon: 05151- 9468- 0
Telefon: 05151- 9468-55
Fax: 05151- 9468-88
Mobil: 0178-8 9468-04
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska
Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.
Mit freundlichen Grüßen
Thorsten Schöning
Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de
AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister für IT und TK
AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0
Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80
CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1
Büro Dallgow-Döberitz
Tel: 03322 507 020
Büro Kloster Lehnin
Tel: 033207 566 530
PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0
Büro Neuruppin
Tel: 033932 606 090
ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410
Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0
Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7
Büro Liebenwalde
Tel: 033054 810 00
HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97
Bitstore IT - Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6
Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0
MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3
Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska