How to make an efficient design for CDC and bookmarking

Started by Lok P · almost 2 years ago · 2 messages · general
#1 Lok P
loknath.73@gmail.com

Hello,
My understanding is that replication tools normally rely on the database
transaction logs to find the CDC/delta changes (insert/update/delete) for
tables and then move those delta changes to the target system/database,
whatever the source database may be (it might be open-source Postgres or
AWS RDS). I have never done it myself, but I assume that manually scanning
the DB logs is not easy, and access to the DB logs might not be granted
for security reasons either. I hope my understanding is correct here.
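
(For what it's worth, I have read that this log-based approach is what
Postgres exposes as "logical decoding", though I have never tried it
myself; the slot name below is just a placeholder.)

    -- requires wal_level = logical and replication privileges on the source
    SELECT pg_create_logical_replication_slot('cdc_slot', 'test_decoding');

    -- each call returns the insert/update/delete changes decoded from the WAL
    -- since the previous call, so the slot itself acts as the bookmark
    SELECT * FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);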

Thus, in the absence of such a replication tool (perhaps because of the
additional cost associated with it), if someone wants to find the delta
changes (insert/update/delete) in a database as efficiently as possible
and move them to the target database in a continuous data streaming setup,
I can only think of the option below...

i.e. maintaining audit columns like create_timestamp/update_timestamp in
every source table so that they can be used to get the CDC for insert and
update statements and also for bookmarking. But to find the delta for the
deletes, there is not much option other than creating row-level triggers
on the base table which populate another audit table with the deleted rows
(roughly like the sketch below), and this is going to crawl if we get a
lot of deletes (in the millions) on the source tables.
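
(A rough sketch of what I mean; the table and column names are only
placeholders.)

    -- audit columns used for insert/update CDC and for bookmarking
    -- (update_timestamp maintained by the application or an update trigger)
    ALTER TABLE orders
        ADD COLUMN create_timestamp timestamptz NOT NULL DEFAULT now(),
        ADD COLUMN update_timestamp timestamptz NOT NULL DEFAULT now();

    -- side table that collects the deleted rows
    CREATE TABLE orders_delete_audit (
        order_id   bigint,
        deleted_at timestamptz NOT NULL DEFAULT now()
    );

    -- row-level trigger that copies every deleted row into the audit table
    CREATE FUNCTION capture_order_delete() RETURNS trigger AS $$
    BEGIN
        INSERT INTO orders_delete_audit (order_id) VALUES (OLD.order_id);
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_orders_delete
        AFTER DELETE ON orders
        FOR EACH ROW EXECUTE FUNCTION capture_order_delete();

    -- the extract job then bookmarks on the last timestamp it has moved,
    -- e.g. (:last_bookmark is whatever value was stored on the previous run)
    SELECT * FROM orders WHERE update_timestamp > :last_bookmark;
    SELECT * FROM orders_delete_audit WHERE deleted_at > :last_bookmark;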

I want to know from the experts whether there is any other way to make
this manual CDC and bookmarking more efficient for such continuous delta
data movement scenarios.

Regards
Lok

#2 yudhi s
learnerdatabase99@gmail.com
In reply to: Lok P (#1)
Re: How to make an efficient design for CDC and bookmarking

If you are worried about the audit trigger and the additional audit table
to hold all the rows, then you could add an extra flag, say deleted_flag,
to your base table, and when a delete happens you just update that flag
through your code instead of physically deleting the record from the base
table (if you want to avoid the performance impact of the row trigger on
the base tables). This will make the main table bulkier, but it will have
less impact on your incoming data load to the base or main tables.
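
(A minimal sketch of the idea, assuming the base table already carries the
update_timestamp audit column you mentioned; names are only illustrative.)

    -- soft-delete flag instead of a delete trigger
    ALTER TABLE orders
        ADD COLUMN deleted_flag boolean NOT NULL DEFAULT false;

    -- the application marks a row as deleted rather than removing it
    UPDATE orders
       SET deleted_flag = true,
           update_timestamp = now()
     WHERE order_id = :id;

    -- the same bookmark query then covers inserts, updates and deletes
    SELECT * FROM orders WHERE update_timestamp > :last_bookmark;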
