Incremental aggregate/rollup strategy advice
I'm researching strategies for incrementally updating aggregate/rollup
tables. The problem is how to do so without double-counting changes or
skipping any. I know enough about concurrency issues to ask the
question, but do *not* know enough about the features and details of
Postgres' concurrency management to figure out a 100% reliable solution
without some help. And, with concurrency-related stuff, you're either 100%
right or you're buggy.
And *thanks in advance* to anyone who can help out. I'm not good at writing
short :( I've tried to put in enough detail to get to the point, which is
"how do I find unprocessed records without missing any."
Okay, the setup is that we've got a lot of tables where we would like to do
incremental aggregates. To simplify things, mostly these are INSERT-only
tables, sometimes UPDATE, not worrying about DELETE yet. A couple of
strategies I'd like to avoid:
* Full queries will take too long and will scale poorly, so a MATERIALIZED
VIEW is unappealing. That leaves rollup tables, since it's possible to
update them incrementally.
* We may have multiple aggregates off the same base data, and may change
them over time. So, putting some kind of flag field in the source table
doesn't really fit.
* I was thinking about a posting/diff/delta/audit-like table, but that's a
pretty "heavy" solution. You need some kind of AFTER INSERT/UPDATE trigger
to push over the data that's needed to update the aggregates. Which, again,
means the source table needs to know what aggregations are going to take
place. Plus, it's just a ton of churn and extra data...when all of the
necessary data exists in the source table already.
* I saw one strategy that looks good from the folks at CitusData:
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/
Briefly, they use a bigserial counter which, I guess, is not
transaction-bound, so that record insertions get a chronological stamp: 1,
2, 3, etc. This is a design familiar to me from other environments and is
sometimes called a "concurrency ID." In our case, we need to support UPDATE
as well, so I don't think the sequence idea will work (?) To make this more
concrete, here's a simplified table with source data:
CREATE TABLE "error_report" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We've got
distributed sources, so UUIDs for IDs.
"error_name" text NOT NULL DEFAULT false, -- Something we'll
summarize by.
"facility_id" uuid NOT NULL DEFAULT NULL, -- Something we'll
summarize by.
"error_dts" timestamptz NOT NULL DEFAULT NULL, -- Set on the
source machine in UTC
"last_updated_dts" timestamptz NOT NULL DEFAULT NULL); -- Set on Postgres
after INSERT or UPDATE.
The idea is that you have a stable number line, whether as a number or a timestamp.
We use timestamptz and store everything in UTC. Otherwise, it's the same
basic idea as what the CitusData folks said: You have an ever-increasing
number line so that you can mark where you've processed to. This way, you
can fetch unprocessed rows without missing any, without a flag field in the
source table, and without an audit table/change queue of any kind. I've
simplified the timestamps below for legibility to spell this out, as it's
the crux of my question about Postgres specifics. And, just pretend that
these rows are all on page 0...I've faked ctid values to make the rows
easier to keep track of.
ctid last_updated_dts
(0,1) 2018-09-25 05:53:00
(0,2) 2018-09-25 05:54:00
(0,3) 2018-09-25 05:55:00
(0,4) 2018-09-26 02:23:00
(0,5) 2018-09-26 03:14:00
(0,6) 2018-09-26 03:15:00
(0,7) 2018-09-28 05:10:00
(0,8) 2018-09-28 05:14:00
(0,9) 2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00
You need a small utility table to hold details about which records you've
aggregated or processed.
CREATE TABLE "rollup_status" (
"id" uuid NOT NULL DEFAULT extensions.gen_random_uuid(), -- We use UUIDs,
not necessary here, but it's what we use.
"rollup_name" text NOT NULL DEFAULT false,
"last_processed_dts" timestamptz NOT NULL DEFAULT NULL); -- Marks the last
timestamp processed.
Now imagine that I've got a rollup_status record
rollup_name last_processed_dts
error_name_counts 2018-09-26 02:23:00
If I search for rows that were modified after the "processed until", I get
these:
ctid last_updated_dts
(0,5) 2018-09-26 03:14:00
(0,6) 2018-09-26 03:15:00
(0,7) 2018-09-28 05:10:00
(0,8) 2018-09-28 05:14:00
(0,9) 2018-09-28 05:15:00
(0,10) 2018-09-28 05:15:00
And update the max(last_updated_dts) in the rollup_status record:
rollup_name last_processed_dts
error_name_counts 2018-09-28 05:15:00
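Spelled out as SQL, one sweep might look something like the sketch below.
error_name_counts is a hypothetical rollup table of mine (one row per
error_name), the FOR UPDATE on the rollup_status row is there to keep two
sweeps of the same rollup from racing each other, and the delta logic
assumes INSERT-only traffic (handling UPDATEs needs more than a count):

CREATE TABLE error_name_counts (
    error_name  text PRIMARY KEY,
    error_count bigint NOT NULL DEFAULT 0);

BEGIN;

WITH mark AS (
    -- Lock this rollup's bookkeeping row so concurrent sweeps serialize.
    SELECT last_processed_dts
      FROM rollup_status
     WHERE rollup_name = 'error_name_counts'
       FOR UPDATE
), fresh AS (
    -- Everything modified since the last sweep.
    SELECT error_name,
           count(*)              AS n,
           max(last_updated_dts) AS max_dts
      FROM error_report, mark
     WHERE last_updated_dts > mark.last_processed_dts
     GROUP BY error_name
), upsert AS (
    -- Fold the deltas into the rollup table.
    INSERT INTO error_name_counts AS c (error_name, error_count)
    SELECT error_name, n FROM fresh
    ON CONFLICT (error_name)
    DO UPDATE SET error_count = c.error_count + EXCLUDED.error_count
)
-- Advance the high-water mark to the newest timestamp we processed.
UPDATE rollup_status
   SET last_processed_dts = (SELECT max(max_dts) FROM fresh)
 WHERE rollup_name = 'error_name_counts'
   AND EXISTS (SELECT 1 FROM fresh);

COMMIT;

-- NOTE: this is only safe if no transaction can later commit a row whose
-- last_updated_dts is at or before the new mark -- which is exactly the
-- question below.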
So, I got a chunk of the timeline, recorded how far I went, and processed
those records. The beauty part of this technique, if I can get it
implemented correctly, is that this doesn't have to block new records.
While I'm processing those 5 (or 5K), new records can be added onto the end
of error_report and, so long as they have a timestamp later than
rollup_status.last_processed_dts, I'll find them on the next sweep. And if
I want to add a different rollup? There's no need to do _anything_ to the
error_report table. Imagine two new rows have come into the error_report
table:
(0,11) 2018-09-28 05:17:00
(0,12) 2018-09-28 05:18:00
The new rollup runs, processes all 12 rows, and now there are two
rollup_status entries:
rollup_name last_processed_dts
error_name_counts 2018-09-28 05:15:00
facility_avgs 2018-09-28 05:18:00
Same timeline, different uses, different last-position-processed stored in
rollup_status.last_processed_dts.
Okay, this finally brings me to the Postgres-specific question:
*How do I make sure that transactions can't commit with timestamps earlier
than my recorded high-water mark?*
I've been digging into this and have seen and considered a few things, but
I'm not sure which, if any, will work.
* Use an AFTER trigger with a transition table (cool feature!) to stamp
last_updated_dts with clock_timestamp(). That function in particular, as
I've now learned that most timestamp functions in a trigger return the
transaction _start_ time, not the _commit_ time. I need the transaction
commit time.
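To make the stamping concrete, here's a sketch of what I mean (the function
and trigger names are mine). As far as I can tell, transition tables in an
AFTER trigger are read-only, so the stamp itself has to be applied in a
plain BEFORE row trigger:

CREATE OR REPLACE FUNCTION stamp_last_updated()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- clock_timestamp() keeps advancing during a transaction, unlike
    -- now(), which is pinned to the transaction start time.
    NEW.last_updated_dts := clock_timestamp();
    RETURN NEW;
END;
$$;

CREATE TRIGGER error_report_stamp
    BEFORE INSERT OR UPDATE ON error_report
    FOR EACH ROW
    EXECUTE FUNCTION stamp_last_updated();

Even this stamps the row before commit, though, so a long-running
transaction can still commit rows whose timestamps are in the past. Which
brings us to the next idea.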
* pg_xact_commit_timestamp which, if I understand it correctly, is sort of
an additional, invisible system column that stores the transaction commit
timestamp as a timestamptz. That sounds perfectly matched to my
requirements but:
-- I've failed to find _any_ detailed discussion of this feature, which is
unusual with Postgres.
-- I'm not 100% sure it will work. Do I need some kind of lock notification
or something while doing the aggregate to button things down?
-- I'm not wild about invisible columns as they're obscure to anyone else.
-- I'm not wild about bolting an extra 8 bytes onto every row in every
table. And can this field be indexed? I think it must be.
-- I've seen mention that the timestamps aren't retained.
I don't need the timestamp as such, it's just meant to order things along
the number line. I'll put a BRIN index on error_report.last_updated_dts
(and similar fields in other tables). I don't think I would need an index on
the pg_xact_commit_timestamp value, and it's not directly supported. But if
it is necessary, I guess you could build one with a cast AT TIME ZONE to
convert the value to something immutable for the index engine.
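In case it helps to spell it out, my current understanding is that it isn't
actually a column at all: you call pg_xact_commit_timestamp() on a row's
xmin, and it returns NULL unless track_commit_timestamp is on. A sketch of
how I imagine querying it:

-- Requires track_commit_timestamp = on (changing it needs a restart).
SELECT pg_xact_commit_timestamp(xmin) AS commit_ts, id, error_name
  FROM error_report
 WHERE pg_xact_commit_timestamp(xmin) > '2018-09-26 02:23:00+00'
 ORDER BY commit_ts;

-- The function isn't IMMUTABLE, so it can't go in an index expression;
-- the BRIN index I mentioned goes on the ordinary column instead:
CREATE INDEX error_report_last_updated_brin
    ON error_report USING brin (last_updated_dts);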
* The xmin system column. I don't think that this would work. While
transaction IDs always increase, they aren't going to commit in that order.
* The ctid system column. I've seen this mentioned, but I honestly don't
follow how you could use this to reliably detect all new and modified rows.
I will be *very grateful* for any help or suggestions. I'm out over my skis
on some of the details on Postgres specifics, so I *won't* take corrections
the wrong way.
> * Full queries will take too long and will scale poorly, so a MATERIALIZED
> VIEW is unappealing. That leaves rollup tables, since it's possible to
> update them incrementally.
F.Y.I. There is a proposal to implement incremental updates of
MATERIALIZED VIEWs. It is still a WIP patch, but it currently supports
count and sum.
/messages/by-id/20190628195620.c306e3003a83bb85a12f54c5@sraoss.co.jp
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo,
Thank you for your response; I have followed the discussion on Hackers with
interest. I hope that your efforts are a great success! In my case, I need
to find a solution available in shipping versions of Postgres. But, since
you've joined in, I'm curious: What is the advantage of a materialized view
over a real table? It seems like the update semantics and mechanics are
more straightforward with a table.
There are a couple of extensions that might help you:
PipelineDB [1]: their “Continuous Views” could be useful. A big caveat here
is that PipelineDB’s future is uncertain [2], though.
I haven’t used it myself, but it looks like you could put triggers onto your
existing tables to insert data into PipelineDB “Streams”, then build
whatever continuous views are needed around those streams.
The other is TimescaleDB [3], which has “Continuous Aggregates”, but they
are fairly new and currently have some limitations.
Steve.
[1]: https://www.pipelinedb.com/
[2]: https://www.pipelinedb.com/blog/pipelinedb-is-joining-confluent
[3]: https://www.timescale.com/
Thanks Steven, nice suggestions. I should have mentioned that the
deployment setup is on RDS on PG 11.x, which rules out those extensions.
I've looked at TimescaleDB several times, and it looks pretty great.
I've now read through some of the archives from years back when
pg_xact_commit_timestamp was still in development, and I'm thinking it
might be the right solution. I'm still not clear on how long the timestamps
are retained, or what the overhead is. It sounds like multiple commits can
share exactly the same timestamp, but that's fine for me. So long as they're
never in the past, it doesn't matter how many timestamps are identical.
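If anyone else wants to experiment, my understanding is that enabling it
looks roughly like this. On RDS the track_commit_timestamp setting goes in
the DB parameter group rather than through ALTER SYSTEM, and a restart is
needed either way:

-- Self-managed Postgres; RDS uses the parameter group instead.
ALTER SYSTEM SET track_commit_timestamp = on;
-- ...restart the server, then verify:
SHOW track_commit_timestamp;
-- Rows committed before the setting was enabled return NULL:
SELECT pg_xact_commit_timestamp(xmin), id FROM error_report LIMIT 5;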
> Thank you for your response; I have followed the discussion on Hackers with
> interest. I hope that your efforts are a great success! In my case, I need
> to find a solution available in shipping versions of Postgres. But, since
> you've joined in, I'm curious: What is the advantage of a materialized view
> over a real table? It seems like the update semantics and mechanics are
> more straightforward with a table.
In my understanding, views and materialized views give users a more
flexible and easy way to access base tables. In an RDB, base tables are
usually heavily normalized, and it may not be easy for applications to
extract information from them. By defining views, apps have a convenient
and intuitive way to get information from the base tables.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp