Choosing an index on partitioned tables.

Started by Tim Uckunover 4 years ago12 messagesgeneral
Jump to latest
#1Tim Uckun
timuckun@gmail.com

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

#2Brent Wood
Brent.Wood@niwa.co.nz
In reply to: Tim Uckun (#1)
Re: Choosing an index on partitioned tables.

Hi Tim,

I've had good success with TimescaleDB for large timesries databases (40b readings).
https://www.timescale.com/

You turn your timestamp table into a Timescale hypertable and it looks after the indexing and partitioning automatically, with the table accessed like a normal postgres table, but very quickly.

It also adds some SQL functions to add a bit of time based query functionality.

Cheers

Brent Wood

Principal Technician, Fisheries
NIWA
DDI: +64 (4) 3860529

________________________________
From: Tim Uckun <timuckun@gmail.com>
Sent: Tuesday, September 7, 2021 15:44
To: pgsql-general <pgsql-general@postgresql.org>
Subject: Choosing an index on partitioned tables.

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

[https://www.niwa.co.nz/static/niwa-2018-horizontal-180.png] <https://www.niwa.co.nz&gt;
Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529

National Institute of Water & Atmospheric Research Ltd (NIWA)
301 Evans Bay Parade Hataitai Wellington New Zealand
Connect with NIWA: niwa.co.nz<https://www.niwa.co.nz&gt; Facebook<https://www.facebook.com/nzniwa&gt; LinkedIn<https://www.linkedin.com/company/niwa&gt; Twitter<https://twitter.com/niwa_nz&gt; Instagram<https://www.instagram.com/niwa_science&gt;
To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems

#3Tim Uckun
timuckun@gmail.com
In reply to: Brent Wood (#2)
Re: Choosing an index on partitioned tables.

Hi Brent.

I looked at timescaledb. It does make partitioning on date ranges
automatic which is awesome and as you said it does add a couple of extra
features on top of postgres but their cloud offering are much more
expensive than buying a generic postgres instance from AWS. A generic
t3.medium on amazon costs ~90 dollars per month in singapore and ~140 per
month from timescale.

Is it really worth the extra expense?

On Tue, Sep 7, 2021 at 4:06 PM Brent Wood <Brent.Wood@niwa.co.nz> wrote:

Show quoted text

Hi Tim,

I've had good success with TimescaleDB for large timesries databases (40b
readings).
https://www.timescale.com/

You turn your timestamp table into a Timescale hypertable and it looks
after the indexing and partitioning automatically, with the table accessed
like a normal postgres table, but very quickly.

It also adds some SQL functions to add a bit of time based query
functionality.

Cheers

Brent Wood

Principal Technician, Fisheries
NIWA
DDI: +64 (4) 3860529

------------------------------
*From:* Tim Uckun <timuckun@gmail.com>
*Sent:* Tuesday, September 7, 2021 15:44
*To:* pgsql-general <pgsql-general@postgresql.org>
*Subject:* Choosing an index on partitioned tables.

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

<https://www.niwa.co.nz&gt;
Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529

National Institute of Water & Atmospheric Research Ltd (NIWA)
301 Evans Bay Parade Hataitai Wellington New Zealand
*Connect with NIWA:* niwa.co.nz <https://www.niwa.co.nz&gt; Facebook
<https://www.facebook.com/nzniwa&gt; LinkedIn
<https://www.linkedin.com/company/niwa&gt; Twitter
<https://twitter.com/niwa_nz&gt; Instagram
<https://www.instagram.com/niwa_science&gt; To ensure compliance with legal
requirements and to maintain cyber security standards, NIWA's IT systems
are subject to ongoing monitoring, activity logging and auditing. This
monitoring and auditing service may be provided by third parties. Such
third parties can access information transmitted to, processed by and
stored on NIWA's IT systems

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tim Uckun (#1)
Re: Choosing an index on partitioned tables.

On Tue, 2021-09-07 at 15:44 +1200, Tim Uckun wrote:

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column  As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

If you don't need to speed up searches by "id", you could define
the primary key on (timestamp_col, id), which can be used to speed
up searches by the timestamp column without defining an extra index.

I would choose a B-tree index for the metrics column.
With the B-tree deduplication feature added in v13, the index will
be small, and I doubt that hash indexes would perform much better.

If there is a dominant value, you could consider a partial index
that excludes that value.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Brent Wood (#2)
Re: Choosing an index on partitioned tables.

On Tue, 2021-09-07 at 04:06 +0000, Brent Wood wrote:

From: Tim Uckun <timuckun@gmail.com>

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column  As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

I've had good success with TimescaleDB for large timesries databases (40b readings).

That has nothing to do with indexing, and I would think twice to install
an invasive extension like that and add a dependency on third-party code,
just because I want to partition a table.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#6Tim Uckun
timuckun@gmail.com
In reply to: Laurenz Albe (#4)
Re: Choosing an index on partitioned tables.

Thanks!

That's great about the Btree deduplication feature in 13.

Show quoted text

On Tue, Sep 7, 2021 at 7:21 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2021-09-07 at 15:44 +1200, Tim Uckun wrote:

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

If you don't need to speed up searches by "id", you could define
the primary key on (timestamp_col, id), which can be used to speed
up searches by the timestamp column without defining an extra index.

I would choose a B-tree index for the metrics column.
With the B-tree deduplication feature added in v13, the index will
be small, and I doubt that hash indexes would perform much better.

If there is a dominant value, you could consider a partial index
that excludes that value.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#7Tim Uckun
timuckun@gmail.com
In reply to: Laurenz Albe (#5)
Re: Choosing an index on partitioned tables.

To be fair Timescale also adds some other features which might be
useful. For example they add some SQL enhancements like last value
and auto maintaining materialized views and such. The automatic
management of partitions is also pretty big plus in my opinion. You
can get some of the equivalent functionality by writing stored procs
and deploying pg_cron but it's nice to have those things built in.

It's open source so you can just deploy their docker image which I did
for development but in the end I wanted to try and do the same thing
in plain jane postgres.

Show quoted text

On Tue, Sep 7, 2021 at 7:24 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2021-09-07 at 04:06 +0000, Brent Wood wrote:

From: Tim Uckun <timuckun@gmail.com>

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

I've had good success with TimescaleDB for large timesries databases (40b readings).

That has nothing to do with indexing, and I would think twice to install
an invasive extension like that and add a dependency on third-party code,
just because I want to partition a table.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#8Magnus Hagander
magnus@hagander.net
In reply to: Tim Uckun (#7)
Re: Choosing an index on partitioned tables.

On Tue, Sep 7, 2021 at 10:51 AM Tim Uckun <timuckun@gmail.com> wrote:

To be fair Timescale also adds some other features which might be
useful. For example they add some SQL enhancements like last value
and auto maintaining materialized views and such. The automatic
management of partitions is also pretty big plus in my opinion. You
can get some of the equivalent functionality by writing stored procs
and deploying pg_cron but it's nice to have those things built in.

If you want automatic partition management, look at pg_partman. No
need to write your own procs and deploy with cron.

And FWIW, in reference to the discussions about AWS, it is supported on RDS.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#9Tim Uckun
timuckun@gmail.com
In reply to: Magnus Hagander (#8)
Re: Choosing an index on partitioned tables.

It's weird that it's supported on AWS and many other providers but not
in the official docker images.

Show quoted text

On Tue, Sep 7, 2021 at 9:16 PM Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Sep 7, 2021 at 10:51 AM Tim Uckun <timuckun@gmail.com> wrote:

To be fair Timescale also adds some other features which might be
useful. For example they add some SQL enhancements like last value
and auto maintaining materialized views and such. The automatic
management of partitions is also pretty big plus in my opinion. You
can get some of the equivalent functionality by writing stored procs
and deploying pg_cron but it's nice to have those things built in.

If you want automatic partition management, look at pg_partman. No
need to write your own procs and deploy with cron.

And FWIW, in reference to the discussions about AWS, it is supported on RDS.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#10Magnus Hagander
magnus@hagander.net
In reply to: Tim Uckun (#9)
Re: Choosing an index on partitioned tables.

On Tue, Sep 7, 2021 at 11:52 AM Tim Uckun <timuckun@gmail.com> wrote:

It's weird that it's supported on AWS and many other providers but not
in the official docker images.

That'd be something to talk to the docker people about I guess? There
are no official docker images published by *postgresql*. (And of
course, AWS or Azure or whomever do whatever they want, but I assume
they're including pg_partman because it's a very popular extension)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#11Tim Uckun
timuckun@gmail.com
In reply to: Magnus Hagander (#10)
Re: Choosing an index on partitioned tables.

There is an image marked as official: https://hub.docker.com/_/postgres

I presumed this was maintained by the postgres team.

Show quoted text

On Tue, Sep 7, 2021 at 9:59 PM Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Sep 7, 2021 at 11:52 AM Tim Uckun <timuckun@gmail.com> wrote:

It's weird that it's supported on AWS and many other providers but not
in the official docker images.

That'd be something to talk to the docker people about I guess? There
are no official docker images published by *postgresql*. (And of
course, AWS or Azure or whomever do whatever they want, but I assume
they're including pg_partman because it's a very popular extension)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#12Magnus Hagander
magnus@hagander.net
In reply to: Tim Uckun (#11)
Re: Choosing an index on partitioned tables.

On Tue, Sep 7, 2021 at 12:15 PM Tim Uckun <timuckun@gmail.com> wrote:

There is an image marked as official: https://hub.docker.com/_/postgres

I presumed this was maintained by the postgres team.

It is official *docker*, just not official *postgresql*. If you click
their "maintained by" link you get to
https://github.com/docker-library/postgres which clearly explains what
"type of official" it is.

That said, as long as you use the debian based version of their
container, it should be trivial to add any extension that's supported
on Debian, which definitely includes pg_partman.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/