RE: Handling time series data with PostgreSQL

Started by Jayaramover 5 years ago7 messagesgeneral
Jump to latest
#1Jayaram
jairamcbe@gmail.com

Dear All,

I'm Jayaram S, oracle DBA. Currently we are planning to develop a stock
market based application which deals 80% of data with time data. We are in
the process of choosing the right database for the requirement especially
for time series data. After all multiple investigations, I found PostgreSQL
with timescaleDB works better than other DBs.

But still I'm new to PGSQL and we wanted only open source technology to
handle our requirements. It will be helpful to me if anyone can suggest
implementing the time series concepts in PostgreSQL database.
It's better if I can get proper docs or links with explanation.

Thanks in advance.,

--

*Thanks & Regards,Jayaram S,Banglore.India.*

#2Adalberto Caccia
adacaccia@gmail.com
In reply to: Jayaram (#1)
Re: Handling time series data with PostgreSQL

Hi,
TimescaleDB as a Postgresql extension has been used in my firm for two
years now, I've recently managed to upgrade it from pg10 to pg12 and from
discrete VM's to Kubernetes as well.
Frankly speaking, being new to TimescaleDB at that time, I've found it easy
to manage, easy to scale (it's 100% compatible with pg replication,
unfortunately not the logical one, yet...), easy to install, easy to
upgrade... what else?
From a developer's perspective, it just adds "superpowers" to ordinary PG
tables, all under the hood. On disk, it features a "chunked" layout, where
each chunk belongs to a definite "time" range; and of course the "time"
column on which to index time data is just passed as a parameter to the
call to TimescaleDB, for each table on which you need such power.
At the moment, we're also using it for time aggregate calculations, but
only for the coarse ones (30m --> 1h and 1h --> 1 day), while we're still
handling the finer ones (1s --> 1m and so on) in Kafka+Flink, which is a
common scenario for a streaming data platform, anyway.

Regards,
Adalberto

Il giorno mar 6 ott 2020 alle ore 11:47 Jayaram <jairamcbe@gmail.com> ha
scritto:

Show quoted text

Dear All,

I'm Jayaram S, oracle DBA. Currently we are planning to develop a stock
market based application which deals 80% of data with time data. We are in
the process of choosing the right database for the requirement especially
for time series data. After all multiple investigations, I found PostgreSQL
with timescaleDB works better than other DBs.

But still I'm new to PGSQL and we wanted only open source technology to
handle our requirements. It will be helpful to me if anyone can suggest
implementing the time series concepts in PostgreSQL database.
It's better if I can get proper docs or links with explanation.

Thanks in advance.,

--

*Thanks & Regards,Jayaram S,Banglore.India.*

#3Jayaram
jairamcbe@gmail.com
In reply to: Adalberto Caccia (#2)
Re: Handling time series data with PostgreSQL

Hi Adalberto,

Awesome.!! Thanks for your reply.

So, Do we need the timescaleDB as mandatory to handle time series data? Is
there any way to handle hourly to days,months,yearly data with PGSQL alone
without timescale addon?
Ours is a new project and we are unsure about whether we should have both
timescaleDB and PGSQL or PGSQL alone is capable of handling this time
series data by tuning the right indexes.etc..

What we are planning is to start with PGSQL alone (For handling
hourly,daily,monthly,weekly,yearly calculations) and later when we add
seconds and minutes, we will include the timescaleDB. Is this the right
approach?
Is PGSQL alone capable of meeting this requirement?

Thank you in advance.

Regards,
Jayaram S.

On Wed, Oct 7, 2020 at 1:34 AM Adalberto Caccia <adacaccia@gmail.com> wrote:

Hi,
TimescaleDB as a Postgresql extension has been used in my firm for two
years now, I've recently managed to upgrade it from pg10 to pg12 and from
discrete VM's to Kubernetes as well.
Frankly speaking, being new to TimescaleDB at that time, I've found it
easy to manage, easy to scale (it's 100% compatible with pg replication,
unfortunately not the logical one, yet...), easy to install, easy to
upgrade... what else?
From a developer's perspective, it just adds "superpowers" to ordinary PG
tables, all under the hood. On disk, it features a "chunked" layout, where
each chunk belongs to a definite "time" range; and of course the "time"
column on which to index time data is just passed as a parameter to the
call to TimescaleDB, for each table on which you need such power.
At the moment, we're also using it for time aggregate calculations, but
only for the coarse ones (30m --> 1h and 1h --> 1 day), while we're still
handling the finer ones (1s --> 1m and so on) in Kafka+Flink, which is a
common scenario for a streaming data platform, anyway.

Regards,
Adalberto

Il giorno mar 6 ott 2020 alle ore 11:47 Jayaram <jairamcbe@gmail.com> ha
scritto:

Dear All,

I'm Jayaram S, oracle DBA. Currently we are planning to develop a stock
market based application which deals 80% of data with time data. We are in
the process of choosing the right database for the requirement especially
for time series data. After all multiple investigations, I found PostgreSQL
with timescaleDB works better than other DBs.

But still I'm new to PGSQL and we wanted only open source technology to
handle our requirements. It will be helpful to me if anyone can suggest
implementing the time series concepts in PostgreSQL database.
It's better if I can get proper docs or links with explanation.

Thanks in advance.,

--

*Thanks & Regards,Jayaram S,Banglore.India.*

--

*Thanks & Regards,Jayaram S,Banglore.Mobile: 91-7760951366.*

#4Stephen Frost
sfrost@snowman.net
In reply to: Jayaram (#3)
Re: Handling time series data with PostgreSQL

Greetings,

* Jayaram (jairamcbe@gmail.com) wrote:

So, Do we need the timescaleDB as mandatory to handle time series data? Is
there any way to handle hourly to days,months,yearly data with PGSQL alone
without timescale addon?

Certainly there is and a lot of people do it- what isn't clear is what
it is you feel is missing from PG when it comes to handling time series
data..? Generally speaking there's concerns about PG's ability to
handle lots of partitions (which comes from there being very large
amounts of data being stored), but v12 and v13 have made great
improvements in that area and it's not nearly an issue any longer (and
performs better in quite a few cases than extensions).

Ours is a new project and we are unsure about whether we should have both
timescaleDB and PGSQL or PGSQL alone is capable of handling this time
series data by tuning the right indexes.etc..

Partitioning and index tuning in PG (look at using BRIN if you haven't
already...) is important when you get to larger data volumes.

Thanks,

Stephen

#5Mark Johnson
remi9898@gmail.com
In reply to: Stephen Frost (#4)
Re: Handling time series data with PostgreSQL

I think the OP may be referring to Oracle's Temporal Validity feature.
This type of feature has yet to be implemented in PostgreSQL (see
https://www.postgresql.org/docs/13/unsupported-features-sql-standard.html
item T181).

Temporal Validity allows you to add a time dimension to any table, and only
display rows of data that are valid for the requested time period.
Oracle's implementation of Temporal Validity uses the PERIOD FOR clause in
CREATE TABLE, ALTER TABLE, and SELECT statements as illustrated below:

CREATE TABLE EMPLOYEE
(
ID NUMBER PRIMARY KEY,
TAX_ID VARCHAR2(10),
HIRE_DATE TIMESTAMP,
TERM_DATE TIMESTAMP,
PERIOD FOR EMP_VALID_TIME (HIRE_DATE, TERM_DATE)
);

SELECT * FROM EMPLOYEE
VERSIONS PERIOD FOR EMP_VALID_TIME
BETWEEN TO_TIMESTAMP('06-OCT-2013', 'DD-MON-YYYY')
AND TO_TIMESTAMP('31-OCT-2013', 'DD-MON-YYYY');

ID TAX_ID HIRE_DATE TERM_DATE
----- ---------- ------------------------------
------------------------------
1 123456789 06-OCT-13 12.00.00.000000 AM 07-NOV-15 12.00.00.000000 AM
2 222456789 07-OCT-13 12.00.00.000000 AM
4 444004444
5 505050505 30-OCT-13 12.00.00.000000 AM 31-OCT-13 12.00.00.000000 AM
6 666999666 30-SEP-13 12.00.00.000000 AM 31-DEC-13 12.00.00.000000 AM

The above feature requires Oracle 12 or higher. SQL Server 2016 and later
also support it. In earlier releases of each DBMS we tried to accomplish
the same by adding pairs of timestamp columns to each table and then
writing our own code to handle row filtering. Partitioning isn't needed.
Certainly partitioning by range could be used, but it would still require
some manual efforts.

-Mark

On Wed, Oct 7, 2020 at 10:41 AM Stephen Frost <sfrost@snowman.net> wrote:

Show quoted text

Greetings,

* Jayaram (jairamcbe@gmail.com) wrote:

So, Do we need the timescaleDB as mandatory to handle time series data?

Is

there any way to handle hourly to days,months,yearly data with PGSQL

alone

without timescale addon?

Certainly there is and a lot of people do it- what isn't clear is what
it is you feel is missing from PG when it comes to handling time series
data..? Generally speaking there's concerns about PG's ability to
handle lots of partitions (which comes from there being very large
amounts of data being stored), but v12 and v13 have made great
improvements in that area and it's not nearly an issue any longer (and
performs better in quite a few cases than extensions).

Ours is a new project and we are unsure about whether we should have both
timescaleDB and PGSQL or PGSQL alone is capable of handling this time
series data by tuning the right indexes.etc..

Partitioning and index tuning in PG (look at using BRIN if you haven't
already...) is important when you get to larger data volumes.

Thanks,

Stephen

#6Stephen Frost
sfrost@snowman.net
In reply to: Mark Johnson (#5)
Re: Handling time series data with PostgreSQL

Greetings,

* Mark Johnson (remi9898@gmail.com) wrote:

I think the OP may be referring to Oracle's Temporal Validity feature.

Perhaps, but that's not the only way to manage time series data.

[ ... ] In earlier releases of each DBMS we tried to accomplish
the same by adding pairs of timestamp columns to each table and then
writing our own code to handle row filtering. Partitioning isn't needed.
Certainly partitioning by range could be used, but it would still require
some manual efforts.

I've found that using the range data types can work quite will, with
overlaps queries, to manage time-series data instead of using pairs of
timestamp columns. With range data types you can also create exclusion
constraints to ensure that you don't end up introducing overlapping
ranges.

Either way require adjusting your queries though, no? And inserting and
maintaining the data..? I can appreciate wanting to be standards
compliant but this specific use-case doesn't really provide much
justification for using this particular feature. Perhaps there are
better ones.

Thanks,

Stephen

#7Jayaram
jairamcbe@gmail.com
In reply to: Stephen Frost (#6)
Re: Handling time series data with PostgreSQL

Thank you Stephen, Mark for your inputs. I will go through the
documentation for time series data best practices in PGSQL.

Regards,
Jayaram S.

On Thu, Oct 8, 2020 at 1:57 AM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Mark Johnson (remi9898@gmail.com) wrote:

I think the OP may be referring to Oracle's Temporal Validity feature.

Perhaps, but that's not the only way to manage time series data.

[ ... ] In earlier releases of each DBMS we tried to accomplish
the same by adding pairs of timestamp columns to each table and then
writing our own code to handle row filtering. Partitioning isn't needed.
Certainly partitioning by range could be used, but it would still require
some manual efforts.

I've found that using the range data types can work quite will, with
overlaps queries, to manage time-series data instead of using pairs of
timestamp columns. With range data types you can also create exclusion
constraints to ensure that you don't end up introducing overlapping
ranges.

Either way require adjusting your queries though, no? And inserting and
maintaining the data..? I can appreciate wanting to be standards
compliant but this specific use-case doesn't really provide much
justification for using this particular feature. Perhaps there are
better ones.

Thanks,

Stephen

--

*Thanks & Regards,Jayaram S,Banglore.Mobile: 91-7760951366.*