Re: Pulling data from Postgres DB table for every 5 seconds.

Started by github kranabout 7 years ago15 messagesgeneral
Jump to latest
#1github kran
githubkran@gmail.com
Show quoted text

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB
of DB size. In this DB we have a table PRODUCT_INFO with around 1 million
rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real
time for every 5 seconds from the DB ( Table mentioned above) and send it
to IOT topic whenever an event occurs for a product. ( event is any new
product information or change in the existing
product information.).

This table has few DML operations in real time either INSERT or UPDATE
based on the productId. ( Update whenever there is a change in the product
information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS
Aurora POSTGRES DB and used by clients.

*UseCase*
We dont want clients to pull the data for every 5 seconds from DB but
rather provide a service which can fetch the data from DB in real time and
push the data to IOT topic by pulling data for every 5 seconds from DB.

*Questions*
1) How can I get information by pulling from the DB every 5 seconds
without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5
seconds. Does POSTGRES has any other options apart from TRIGGER ?.

Any ideas would be helpful.

Thanks !!
GithubKran

#2Rob Sargent
robjsargent@gmail.com
In reply to: github kran (#1)

On Jan 9, 2019, at 10:02 AM, github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around 1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing
product information.).

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients.

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.

Any ideas would be helpful.

Thanks !!
GithubKran

There is DML event trapping. You don’t poll every 5seconds you react immediately to each event (with trigger or event). From the trigger perspective you probably have everything you need to update IOT with addition searching.

#3Ron
ronljohnsonjr@gmail.com
In reply to: github kran (#1)

On 1/9/19 11:02 AM, github kran wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4
TB of DB size. In this DB we have a table PRODUCT_INFO with around  1
million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in
real time for every 5 seconds from the DB

"the data".  All 1GB every 5 seconds?

( Table mentioned above) and send it to IOT topic whenever an event
occurs for a product. ( event is any new product information or change
in the existing
product information.).

This table has few DML operations in real time either INSERT or UPDATE
based on the productId. ( Update whenever there is a change in the
product information and INSERT when a record doesnt exists for that
product).

We have REST API's built in the backend pulling data from this backend
RDS Aurora POSTGRES DB and used by clients.

*_UseCase_*
We dont want clients to pull the data for every 5 seconds from DB but
rather provide a service which can fetch the data from DB in real time
and push the data to IOT topic by pulling data for every 5 seconds
from DB.

Or just a tiny subset every 5 seconds?

*_Questions_*
1) How can I get information by pulling from the DB every 5 seconds
without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every
5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.

Any ideas would be helpful.

Thanks !!
GithubKran

--
Angular momentum makes the world go 'round.

#4github kran
githubkran@gmail.com
In reply to: Rob Sargent (#2)

Thanks for your reply Rob. Reading the below documentation link says the
EVENT trigger is only supported for DDL commands. Is it not correct ?.

*1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
<https://www.postgresql.org/docs/9.6/event-trigger-definition.html&gt; *
(An event trigger fires whenever the event with which it is associated
occurs in the database in which it is defined. Currently, the only
supported events are ddl_command_start, ddl_command_end, table_rewrite and
sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?.
Does it slow down if we are reading the data using the API when we have a
trigger in place ?.

Ron- Its a tiny subset of 1 GB Data for every 5 seconds but not on the
entire data.

Thanks !!.

On Wed, Jan 9, 2019 at 11:10 AM Rob Sargent <robjsargent@gmail.com> wrote:

Show quoted text

On Jan 9, 2019, at 10:02 AM, github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB
of DB size. In this DB we have a table PRODUCT_INFO with around 1 million
rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in
real time for every 5 seconds from the DB ( Table mentioned above) and send
it to IOT topic whenever an event occurs for a product. ( event is any new
product information or change in the existing
product information.).

This table has few DML operations in real time either INSERT or UPDATE
based on the productId. ( Update whenever there is a change in the product
information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend
RDS Aurora POSTGRES DB and used by clients.

*UseCase*
We dont want clients to pull the data for every 5 seconds from DB but
rather provide a service which can fetch the data from DB in real time and
push the data to IOT topic by pulling data for every 5 seconds from DB.

*Questions*
1) How can I get information by pulling from the DB every 5 seconds
without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5
seconds. Does POSTGRES has any other options apart from TRIGGER ?.

Any ideas would be helpful.

Thanks !!
GithubKran

There is DML event trapping. You don’t poll every 5seconds you react
immediately to each event (with trigger or event). From the trigger
perspective you probably have everything you need to update IOT with
addition searching.

#5Mark Fletcher
markf@corp.groups.io
In reply to: github kran (#1)

On Wed, Jan 9, 2019 at 9:02 AM github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB
of DB size. In this DB we have a table PRODUCT_INFO with around 1 million
rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in
real time for every 5 seconds from the DB ( Table mentioned above) and send
it to IOT topic whenever an event occurs for a product. ( event is any new
product information or change in the existing
product information.).

It's unclear whether you want to do table scans or if you're just looking
for changes to the database. If you're looking just for changes, consider
implementing something using logical replication. We have a logical
replication system set up to stream changes from the database into an
elastic search cluster, and it works great.

Mark

#6Rob Sargent
robjsargent@gmail.com
In reply to: github kran (#4)

On 1/9/19 10:21 AM, github kran wrote:

Thanks for your reply Rob. Reading the below documentation link says
the EVENT trigger is only supported for DDL commands. Is it not correct ?.

_1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html _
(An event trigger fires whenever the event with which it is associated
occurs in the database in which it is defined. Currently, the only
supported events are ddl_command_start, ddl_command_end,
table_rewrite and sql_drop. Support for additional events may be added
in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the
table ?. Does it slow down if we are reading the data using the API
when we have a trigger in place ?.

Ah, right you are.  Are triggers off the table?  You would want to write
the trigger function in some (trusted?) language with access to the outside

#7github kran
githubkran@gmail.com
In reply to: Mark Fletcher (#5)

Mark - We are currently on 9.6 version of postgres and cant use this
feature of logical replication.Answering to your question we are looking
for any changes in the data related to a specific table ( changes like any
update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product
entity).
Any other alternatives ?.

On Wed, Jan 9, 2019 at 11:24 AM Mark Fletcher <markf@corp.groups.io> wrote:

Show quoted text

On Wed, Jan 9, 2019 at 9:02 AM github kran <githubkran@gmail.com> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4
TB of DB size. In this DB we have a table PRODUCT_INFO with around 1
million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in
real time for every 5 seconds from the DB ( Table mentioned above) and send
it to IOT topic whenever an event occurs for a product. ( event is any new
product information or change in the existing
product information.).

It's unclear whether you want to do table scans or if you're just looking
for changes to the database. If you're looking just for changes, consider
implementing something using logical replication. We have a logical
replication system set up to stream changes from the database into an
elastic search cluster, and it works great.

Mark

#8github kran
githubkran@gmail.com
In reply to: Rob Sargent (#6)

Rob - It's a Java based application. We dont have triggers yet on the
table and is trigger a only option in 9.6 version ?.

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:

Show quoted text

On 1/9/19 10:21 AM, github kran wrote:

Thanks for your reply Rob. Reading the below documentation link says the
EVENT trigger is only supported for DDL commands. Is it not correct ?.

*1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
<https://www.postgresql.org/docs/9.6/event-trigger-definition.html&gt; *
(An event trigger fires whenever the event with which it is associated
occurs in the database in which it is defined. Currently, the only
supported events are ddl_command_start, ddl_command_end, table_rewrite
and sql_drop. Support for additional events may be added in future
releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table
?. Does it slow down if we are reading the data using the API when we have
a trigger in place ?.

Ah, right you are. Are triggers off the table? You would want to write
the trigger function in some (trusted?) language with access to the outside

#9Rob Sargent
robjsargent@gmail.com
In reply to: github kran (#8)

On Jan 9, 2019, at 11:11 AM, github kran <githubkran@gmail.com> wrote:

Rob - It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com <mailto:robjsargent@gmail.com>> wrote:

On 1/9/19 10:21 AM, github kran wrote:

Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html <https://www.postgresql.org/docs/9.6/event-trigger-definition.html&gt;
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_start, ddl_command_end, table_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.

Ah, right you are. Are triggers off the table? You would want to write the trigger function in some (trusted?) language with access to the outside

(Custom here is to “bottom post”)

Have you tried triggers and found them to have too much impact on total system? I can’t see them being more expensive than looking for changes every 5 seconds. If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed. I would have the trigger write to queue and have something else using the queue to talk to IOT piece.

Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?

#10Mark Fletcher
markf@corp.groups.io
In reply to: github kran (#7)

On Wed, Jan 9, 2019 at 10:10 AM github kran <githubkran@gmail.com> wrote:

Mark - We are currently on 9.6 version of postgres and cant use this
feature of logical replication.Answering to your question we are looking
for any changes in the data related to a specific table ( changes like any
update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product
entity).
Any other alternatives ?.

The feature was added in 9.4 (I think). We are on 9.6 and it works great.

Not sure about RDS Aurora, however.

Mark

#11github kran
githubkran@gmail.com
In reply to: Rob Sargent (#9)

On Wed, Jan 9, 2019 at 12:26 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Jan 9, 2019, at 11:11 AM, github kran <githubkran@gmail.com> wrote:

Rob - It's a Java based application. We dont have triggers yet on the
table and is trigger a only option in 9.6 version ?.

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <robjsargent@gmail.com> wrote:

On 1/9/19 10:21 AM, github kran wrote:

Thanks for your reply Rob. Reading the below documentation link says the
EVENT trigger is only supported for DDL commands. Is it not correct ?.

*1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
<https://www.postgresql.org/docs/9.6/event-trigger-definition.html&gt; *
(An event trigger fires whenever the event with which it is associated
occurs in the database in which it is defined. Currently, the only
supported events are ddl_command_start, ddl_command_end, table_rewrite
and sql_drop. Support for additional events may be added in future
releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table
?. Does it slow down if we are reading the data using the API when we have
a trigger in place ?.

Ah, right you are. Are triggers off the table? You would want to write
the trigger function in some (trusted?) language with access to the outside

(Custom here is to “bottom post”)

Have you tried triggers and found them to have too much impact on total
system? I can’t see them being more expensive than looking for changes
every 5 seconds. If your hardware can scan 1T that quickly then I suspect
your trigger will not be noticed. I would have the trigger write to queue
and have something else using the queue to talk to IOT piece.

Failing that, perhaps your java (server-side?) app making the changes can
be taught to emit the necessary details to IOT-thingy?

*Sure will try with a trigger and send data to a Queue and gather data for
every 5 seconds reading off from the queue and send to IOT topic. Already
we have a queue where every message inserted or updated on the table is
sent to a queue but **that is lot of data we are gathering. We want to
rather minimize collecting data from DB.*

#12github kran
githubkran@gmail.com
In reply to: Mark Fletcher (#10)

On Wed, Jan 9, 2019 at 12:36 PM Mark Fletcher <markf@corp.groups.io> wrote:

On Wed, Jan 9, 2019 at 10:10 AM github kran <githubkran@gmail.com> wrote:

Mark - We are currently on 9.6 version of postgres and cant use this
feature of logical replication.Answering to your question we are looking
for any changes in the data related to a specific table ( changes like any
update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product
entity).
Any other alternatives ?.

The feature was added in 9.4 (I think). We are on 9.6 and it works great.

Not sure about RDS Aurora, however.

Mark

Mark - just curious to know on the logical replication. Do you think I can
use it for my use case where i need to publish data to a subscriber when
there is a change in the data updated for a row or any new inserts
happening on the table. Intention
is to send this data in Json format by collecting this modified data in
real time to a subscriber.

Tahanks
Kran

#13Mark Fletcher
markf@corp.groups.io
In reply to: github kran (#12)

On Wed, Jan 9, 2019 at 12:58 PM github kran <githubkran@gmail.com> wrote:

Mark - just curious to know on the logical replication. Do you think I can
use it for my use case where i need to publish data to a subscriber when
there is a change in the data updated for a row or any new inserts
happening on the table. Intention
is to send this data in Json format by collecting this modified data in
real time to a subscriber.

From what you've said, it's a great use case for that feature. The one

thing to note is that you will have to code up a logical replication
client. If I can do it, pretty much anyone can, but it might take some time
to get things right. I wrote about some of what I found when developing our
client a year ago here:
https://wingedpig.com/2017/09/20/streaming-postgres-changes/

We ended up just using the included test output plugin that comes with the
postgresql distribution. And we didn't end up streaming to Kafka or
anything else first. We just take the data and insert it into our
elasticsearch cluster directly as we get it.

Mark

#14Martín Marqués
martin@2ndquadrant.com
In reply to: Mark Fletcher (#13)

El 9/1/19 a las 20:22, Mark Fletcher escribió:

On Wed, Jan 9, 2019 at 12:58 PM github kran <githubkran@gmail.com
<mailto:githubkran@gmail.com>> wrote:

Mark - just curious to know on the logical replication. Do you think
I can use it for my use case where i need to publish data to a
subscriber when there is a change in the data updated for a row or
any new inserts happening on the table. Intention
is to send this data in Json format by collecting this modified data
in real time to a subscriber.

From what you've said, it's a great use case for that feature. The one
thing to note is that you will have to code up a logical replication
client. If I can do it, pretty much anyone can, but it might take some
time to get things right. I wrote about some of what I found when
developing our client a year ago
here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/

We ended up just using the included test output plugin that comes with
the postgresql distribution. And we didn't end up streaming to Kafka or
anything else first. We just take the data and insert it into our
elasticsearch cluster directly as we get it.

I realy doubt that would work. Aurora doesn't have WALs, so how would
you be able to decode the transactions?

AFAIU, you can't use logical decoding on Aurora. Maybe you should be
asking at the Aurora support channel.

Regards,

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15Tony Shelver
tshelver@gmail.com
In reply to: github kran (#1)

I am not familiar with Aurora, but..

What something like https://github.com/subzerocloud/pg-amqp-bridge?

Set up a message queue in Postgres, which calls into AMPQ (RabbitMQ) to
send a message for consumption by one or more clients.

This provides a function that can be called from a trigger to send the
message.

After that, you have all the goodness of a standards-based open source MQ
platform to distribute your data / notifications.

On Wed, 9 Jan 2019 at 19:02, github kran <githubkran@gmail.com> wrote:

Show quoted text

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB
of DB size. In this DB we have a table PRODUCT_INFO with around 1 million
rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in
real time for every 5 seconds from the DB ( Table mentioned above) and send
it to IOT topic whenever an event occurs for a product. ( event is any new
product information or change in the existing
product information.).

This table has few DML operations in real time either INSERT or UPDATE
based on the productId. ( Update whenever there is a change in the product
information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend
RDS Aurora POSTGRES DB and used by clients.

*UseCase*
We dont want clients to pull the data for every 5 seconds from DB but
rather provide a service which can fetch the data from DB in real time and
push the data to IOT topic by pulling data for every 5 seconds from DB.

*Questions*
1) How can I get information by pulling from the DB every 5 seconds
without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5
seconds. Does POSTGRES has any other options apart from TRIGGER ?.

Any ideas would be helpful.

Thanks !!
GithubKran